Compressors

Altay Sansal

Oct 20, 2025

Dataset Compression

MDIO relies on numcodecs for data compression. It ships defaults for each compressor, chosen with opinionated and admittedly limited heuristics across various energy datasets. The compression can still be customized through the data models documented on this page.

Numcodecs is a project that provides a convenient interface to different compression libraries. We selected the Blosc and ZFP compressors for lossless and lossy compression of energy data.

Blosc

Blosc is a high-performance compressor optimized for binary data. It combines fast compression with a byte-shuffle filter that improves compression ratios, and it is particularly effective on numerical arrays in multi-threaded environments.

For more details about compression modes, see Blosc Documentation.
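To illustrate why a byte-shuffle filter helps on numerical arrays, here is a minimal sketch that uses only the Python standard library. It is not Blosc itself: `zlib` stands in for the compressor, and `byte_shuffle` / `byte_unshuffle` are hypothetical helper names written for this example.

```python
import struct
import zlib


def byte_shuffle(raw: bytes, typesize: int) -> bytes:
    """Regroup bytes so the k-th byte of every element is stored contiguously."""
    if typesize <= 0 or len(raw) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    return b"".join(raw[k::typesize] for k in range(typesize))


def byte_unshuffle(shuffled: bytes, typesize: int) -> bytes:
    """Invert byte_shuffle, restoring the original element layout."""
    if typesize <= 0 or len(shuffled) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    count = len(shuffled) // typesize
    out = bytearray(len(shuffled))
    for k in range(typesize):
        # Plane k holds the k-th byte of each element.
        out[k::typesize] = shuffled[k * count:(k + 1) * count]
    return bytes(out)


# 10,000 slowly varying 32-bit integers, packed little-endian.
values = list(range(10_000))
raw = struct.pack(f"<{len(values)}i", *values)

# zlib is a stand-in here; Blosc applies its own codecs after shuffling.
plain_size = len(zlib.compress(raw, 5))
shuffled_size = len(zlib.compress(byte_shuffle(raw, 4), 5))

print(f"raw={len(raw)} plain={plain_size} shuffled={shuffled_size}")
```

On smooth data like this, the high-order bytes of neighbouring values are nearly identical, so grouping them produces long repetitive runs that a general-purpose compressor handles well. The `typesize` argument plays the same role as the `typesize` field of the Blosc model: it is the element width the shuffle operates over.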

Blosc

Data Model for Blosc options.

ZFP

ZFP is a compression algorithm tailored for floating-point and integer arrays. It offers both lossy and lossless compression with customizable precision, and it is well suited to large scientific datasets where data fidelity must be balanced against compression ratio.

For more details about compression modes, see ZFP Documentation.
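As a rough illustration of the fidelity versus size trade-off, the sketch below estimates the payload size of a fixed-rate stream. It assumes ZFP's block layout of 4 values per dimension and a budget of `rate` bits per value; header bytes and word alignment are ignored, and `zfp_fixed_rate_nbytes` is a hypothetical helper, not part of MDIO or zfp.

```python
import math


def zfp_fixed_rate_nbytes(shape: tuple, rate: float) -> int:
    """Estimate the payload size of a fixed-rate stream, in bytes.

    Assumes each block of 4**ndim values takes rate * 4**ndim bits and that
    partial blocks at the array edges are padded to full blocks.
    """
    if not 1 <= len(shape) <= 4:
        raise ValueError("this sketch covers 1 to 4 dimensions")
    n_blocks = math.prod((n + 3) // 4 for n in shape)
    values_per_block = 4 ** len(shape)
    return math.ceil(n_blocks * values_per_block * rate / 8)


shape = (100, 100)                  # hypothetical 2D float32 array
raw_nbytes = math.prod(shape) * 4   # 4 bytes per float32 value
packed_nbytes = zfp_fixed_rate_nbytes(shape, rate=8.0)
ratio = raw_nbytes / packed_nbytes

print(f"raw={raw_nbytes} packed={packed_nbytes} ratio={ratio:.1f}")
```

With 8 bits per value against 32-bit floats, the estimate comes out to a 4:1 ratio. The fixed-precision and fixed-accuracy modes do not give a size that can be predicted this way, because their output depends on the data.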

ZFP

Data Model for ZFP options.

Model Reference

Blosc
pydantic model mdio.builder.schemas.compressors.Blosc

Data Model for Blosc options.

JSON schema:
{
   "title": "Blosc",
   "description": "Data Model for Blosc options.",
   "type": "object",
   "properties": {
      "name": {
         "default": "blosc",
         "description": "Name of the compressor.",
         "title": "Name",
         "type": "string"
      },
      "cname": {
         "$ref": "#/$defs/BloscCname",
         "default": "zstd",
         "description": "Compression algorithm name."
      },
      "clevel": {
         "default": 5,
         "description": "Compression level (integer 0\u20139)",
         "maximum": 9,
         "minimum": 0,
         "title": "Clevel",
         "type": "integer"
      },
      "shuffle": {
         "anyOf": [
            {
               "$ref": "#/$defs/BloscShuffle"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Shuffling mode before compression."
      },
      "typesize": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The size in bytes that the shuffle is performed over.",
         "title": "Typesize"
      },
      "blocksize": {
         "default": 0,
         "description": "The size (in bytes) of blocks to divide data before compression.",
         "title": "Blocksize",
         "type": "integer"
      }
   },
   "$defs": {
      "BloscCname": {
         "description": "Enum for compression library used by blosc.",
         "enum": [
            "lz4",
            "lz4hc",
            "blosclz",
            "zstd",
            "snappy",
            "zlib"
         ],
         "title": "BloscCname",
         "type": "string"
      },
      "BloscShuffle": {
         "description": "Enum for shuffle filter used by blosc.",
         "enum": [
            "noshuffle",
            "shuffle",
            "bitshuffle"
         ],
         "title": "BloscShuffle",
         "type": "string"
      }
   },
   "additionalProperties": false
}

field blocksize: int = 0

The size (in bytes) of blocks to divide data before compression.

field clevel: int = 5

Compression level (integer 0–9)

Constraints:
  • ge = 0
  • le = 9

field cname: BloscCname = BloscCname.zstd

Compression algorithm name.

field name: str = 'blosc'

Name of the compressor.

field shuffle: BloscShuffle | None = None

Shuffling mode before compression.

field typesize: int | None = None

The size in bytes that the shuffle is performed over.
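The defaults and constraints listed above can be mirrored in a small standard-library sketch. `BloscOptions` is a hypothetical stand-in written for illustration; it is not the MDIO pydantic model, and it only reproduces the checks visible in the JSON schema (the `cname` and `shuffle` enums and the 0 to 9 range on `clevel`).

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Allowed values copied from the BloscCname and BloscShuffle enums in the schema.
BLOSC_CNAMES = {"lz4", "lz4hc", "blosclz", "zstd", "snappy", "zlib"}
BLOSC_SHUFFLES = {"noshuffle", "shuffle", "bitshuffle"}


@dataclass(frozen=True)
class BloscOptions:
    """Hypothetical stand-in that mirrors the fields of the Blosc schema."""

    name: str = "blosc"
    cname: str = "zstd"
    clevel: int = 5
    shuffle: Optional[str] = None
    typesize: Optional[int] = None
    blocksize: int = 0

    def __post_init__(self) -> None:
        if self.cname not in BLOSC_CNAMES:
            raise ValueError(f"unknown cname: {self.cname!r}")
        if not 0 <= self.clevel <= 9:
            raise ValueError("clevel must be between 0 and 9")
        if self.shuffle is not None and self.shuffle not in BLOSC_SHUFFLES:
            raise ValueError(f"unknown shuffle: {self.shuffle!r}")


defaults = asdict(BloscOptions())
custom = BloscOptions(cname="lz4", clevel=9, shuffle="bitshuffle", typesize=4)

print(defaults)
print(asdict(custom))
```

Unknown keyword arguments raise a `TypeError` in this sketch, which loosely corresponds to `"additionalProperties": false` in the schema.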

ZFP
pydantic model mdio.builder.schemas.compressors.ZFP

Data Model for ZFP options.

JSON schema:
{
   "title": "ZFP",
   "description": "Data Model for ZFP options.",
   "type": "object",
   "properties": {
      "name": {
         "default": "zfp",
         "description": "Name of the compressor.",
         "title": "Name",
         "type": "string"
      },
      "mode": {
         "$ref": "#/$defs/ZFPMode"
      },
      "tolerance": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Fixed accuracy in terms of absolute error tolerance.",
         "title": "Tolerance"
      },
      "rate": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Fixed rate in terms of number of compressed bits per value.",
         "title": "Rate"
      },
      "precision": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Fixed precision in terms of number of uncompressed bits per value.",
         "title": "Precision"
      },
      "writeHeader": {
         "default": true,
         "description": "Encode array shape, scalar type, and compression parameters.",
         "title": "Writeheader",
         "type": "boolean"
      }
   },
   "$defs": {
      "ZFPMode": {
         "description": "Enum for ZFP algorithm modes.",
         "enum": [
            "fixed_rate",
            "fixed_precision",
            "fixed_accuracy",
            "reversible"
         ],
         "title": "ZFPMode",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "mode"
   ]
}

field mode: ZFPMode [Required]

field name: str = 'zfp'

Name of the compressor.

field precision: int | None = None

Fixed precision in terms of number of uncompressed bits per value.

field rate: float | None = None

Fixed rate in terms of number of compressed bits per value.

field tolerance: float | None = None

Fixed accuracy in terms of absolute error tolerance.

field writeHeader: bool = True

Encode array shape, scalar type, and compression parameters.
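The field descriptions suggest that each mode is driven by one parameter: `rate` for `fixed_rate`, `precision` for `fixed_precision`, and `tolerance` for `fixed_accuracy`. The sketch below checks a plain options dictionary under that assumption. The schema itself only marks `mode` as required, so the pairing check is an inference rather than documented MDIO behavior, and `check_zfp_options` is a hypothetical helper.

```python
# Parameter each mode is assumed to need; "reversible" takes none.
MODE_PARAMETER = {
    "fixed_rate": "rate",
    "fixed_precision": "precision",
    "fixed_accuracy": "tolerance",
    "reversible": None,
}

# Defaults copied from the ZFP JSON schema.
ZFP_DEFAULTS = {
    "name": "zfp",
    "tolerance": None,
    "rate": None,
    "precision": None,
    "writeHeader": True,
}


def check_zfp_options(options: dict) -> dict:
    """Return options with schema defaults filled in, or raise ValueError."""
    extra = set(options) - set(ZFP_DEFAULTS) - {"mode"}
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    if "mode" not in options:
        raise ValueError("mode is required")
    mode = options["mode"]
    if mode not in MODE_PARAMETER:
        raise ValueError(f"unknown mode: {mode!r}")
    filled = {**ZFP_DEFAULTS, **options}
    needed = MODE_PARAMETER[mode]
    # Assumed pairing: the mode's own parameter must be present.
    if needed is not None and filled[needed] is None:
        raise ValueError(f"mode {mode!r} expects {needed!r} to be set")
    return filled


config = check_zfp_options({"mode": "fixed_accuracy", "tolerance": 1e-3})
print(config)
```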


class mdio.builder.schemas.compressors.ZFPMode

Enum for ZFP algorithm modes.

FIXED_RATE = 'fixed_rate'
FIXED_PRECISION = 'fixed_precision'
FIXED_ACCURACY = 'fixed_accuracy'
REVERSIBLE = 'reversible'
property int_code: int

Return the integer code of ZFP mode.
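A string-valued enum with an integer-code property can be sketched as follows. The integer values are an assumption: they follow the `zfp_mode` enumeration of the zfp C library (fixed rate 2, fixed precision 3, fixed accuracy 4, reversible 5), and this page does not state which codes MDIO actually returns. `ZFPModeSketch` is a hypothetical name used to avoid confusion with the real class.

```python
from enum import Enum

# Assumed codes, taken from the zfp C library's zfp_mode enumeration.
_ZFP_MODE_CODES = {
    "fixed_rate": 2,
    "fixed_precision": 3,
    "fixed_accuracy": 4,
    "reversible": 5,
}


class ZFPModeSketch(str, Enum):
    """Illustrative stand-in for the ZFPMode enum."""

    FIXED_RATE = "fixed_rate"
    FIXED_PRECISION = "fixed_precision"
    FIXED_ACCURACY = "fixed_accuracy"
    REVERSIBLE = "reversible"

    @property
    def int_code(self) -> int:
        """Return the (assumed) integer code for this mode."""
        return _ZFP_MODE_CODES[self.value]


mode = ZFPModeSketch("fixed_accuracy")
print(mode.name, mode.int_code)
```

Keeping the string value as the serialized form and deriving the integer on demand lets the JSON schema stay readable while still providing the numeric code a lower-level library may expect.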