Create and Register a Custom Template
Altay Sansal
Oct 20, 2025
Warning
Most SEG-Y files correspond to standard seismic data types or field configurations. We recommend using the built-in templates from the registry whenever possible. Create a custom template only when your file is unusual and cannot be represented by existing templates. In many cases, you can simply customize the SEG-Y header byte mapping during ingestion without defining a new template.
In this tutorial we will walk through the Template Registry and show how to:
Discover available templates in the registry
Define and register your own template
Build a dataset model and convert it to an Xarray Dataset using your custom template
If this is your first time with MDIO, you may want to skim the Quickstart first.
What is a Template and a Template Registry?
A template defines how an MDIO dataset is structured: names of dimensions and coordinates, the default variable name, chunking hints, and attributes to be stored. Since many seismic datasets share common structures (e.g., 3D post-stack, 2D post-stack, pre-stack CDP/shot, etc.), MDIO ships with a pre-populated template registry and APIs to fetch or register templates.
Fetching a template from the registry returns a copy of the instance, so you can customize it freely without affecting other users of the registry.
from mdio.builder.template_registry import get_template
from mdio.builder.template_registry import get_template_registry
from mdio.builder.template_registry import list_templates
registry = get_template_registry()
registry # pretty HTML in notebooks
TemplateRegistry
| Template (16) | Default Var | Dimensions | Chunk Sizes | Coords |
|---|---|---|---|---|
| PostStack2DDepth | amplitude | cdp, depth | 1024×1024 | cdp_x, cdp_y |
| PostStack2DTime | amplitude | cdp, time | 1024×1024 | cdp_x, cdp_y |
| PostStack3DDepth | amplitude | inline, crossline, depth | 128×128×128 | cdp_x, cdp_y |
| PostStack3DTime | amplitude | inline, crossline, time | 128×128×128 | cdp_x, cdp_y |
| PreStackCdpAngleGathers2DDepth | amplitude | cdp, angle, depth | 16×64×1024 | cdp_x, cdp_y |
| PreStackCdpAngleGathers2DTime | amplitude | cdp, angle, time | 16×64×1024 | cdp_x, cdp_y |
| PreStackCdpAngleGathers3DDepth | amplitude | inline, crossline, angle, depth | 8×8×32×512 | cdp_x, cdp_y |
| PreStackCdpAngleGathers3DTime | amplitude | inline, crossline, angle, time | 8×8×32×512 | cdp_x, cdp_y |
| PreStackCdpOffsetGathers2DDepth | amplitude | cdp, offset, depth | 16×64×1024 | cdp_x, cdp_y |
| PreStackCdpOffsetGathers2DTime | amplitude | cdp, offset, time | 16×64×1024 | cdp_x, cdp_y |
| PreStackCdpOffsetGathers3DDepth | amplitude | inline, crossline, offset, depth | 8×8×32×512 | cdp_x, cdp_y |
| PreStackCdpOffsetGathers3DTime | amplitude | inline, crossline, offset, time | 8×8×32×512 | cdp_x, cdp_y |
| PreStackCocaGathers3DDepth | amplitude | inline, crossline, offset, azimuth, depth | 8×8×32×1×1024 | cdp_x, cdp_y |
| PreStackCocaGathers3DTime | amplitude | inline, crossline, offset, azimuth, time | 8×8×32×1×1024 | cdp_x, cdp_y |
| PreStackShotGathers2DTime | amplitude | shot_point, channel, time | 16×32×2048 | source_coord_x, source_coord_y, group_coord_x, group_coord_y |
| PreStackShotGathers3DTime | amplitude | shot_point, cable, channel, time | 8×1×128×2048 | source_coord_x, source_coord_y, group_coord_x, group_coord_y, gun |
We can also get the names of all registered templates as a plain list.
list_templates()
['PostStack2DTime',
'PostStack2DDepth',
'PostStack3DTime',
'PostStack3DDepth',
'PreStackCdpOffsetGathers3DTime',
'PreStackCdpOffsetGathers2DTime',
'PreStackCdpAngleGathers3DTime',
'PreStackCdpAngleGathers2DTime',
'PreStackCdpOffsetGathers3DDepth',
'PreStackCdpOffsetGathers2DDepth',
'PreStackCdpAngleGathers3DDepth',
'PreStackCdpAngleGathers2DDepth',
'PreStackCocaGathers3DTime',
'PreStackCocaGathers3DDepth',
'PreStackShotGathers2DTime',
'PreStackShotGathers3DTime']
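The copy-on-fetch behavior mentioned earlier can be pictured with a toy registry. This is a minimal sketch in plain Python, not MDIO's implementation: `ToyTemplate` and `ToyRegistry` are hypothetical names made up for illustration, and only the idea of returning a deep copy on every fetch is taken from the text above.

```python
import copy


class ToyTemplate:
    """Hypothetical stand-in for a dataset template (not an MDIO class)."""

    def __init__(self, name: str, chunk_shape: tuple[int, ...]) -> None:
        self.name = name
        self.chunk_shape = chunk_shape


class ToyRegistry:
    """Hypothetical registry that hands out a deep copy on every fetch."""

    def __init__(self) -> None:
        self._templates: dict[str, ToyTemplate] = {}

    def register(self, template: ToyTemplate) -> None:
        # Store a private copy so later edits by the caller do not leak in.
        self._templates[template.name] = copy.deepcopy(template)

    def get(self, name: str) -> ToyTemplate:
        # Return a copy so the caller can customize it freely.
        return copy.deepcopy(self._templates[name])

    def list_names(self) -> list[str]:
        return list(self._templates)


toy_registry = ToyRegistry()
toy_registry.register(ToyTemplate("PostStack3DTime", (128, 128, 128)))

mine = toy_registry.get("PostStack3DTime")
mine.chunk_shape = (64, 64, 64)  # changes only our copy

fresh = toy_registry.get("PostStack3DTime")
print(fresh.chunk_shape)  # (128, 128, 128)
```

Because each fetch is an independent copy, two parts of a program can tune the same named template differently without stepping on each other.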
Defining a Minimal Custom Template
To define a custom template, subclass AbstractDatasetTemplate and set:
_name: a public name for the template
_dim_names: names for each axis of your data variable (the last axis is the trace/time or trace/depth axis)
_physical_coord_names and _logical_coord_names: optional additional coordinate variables to store along the spatial grid
_load_dataset_attributes(): optional attributes stored at the dataset level
Below we create a special template that can hold an interval velocity field together with multiple anisotropy parameters for a depth seismic volume.
The dimensions, dimension coordinates, and non-dimension coordinates are created automatically by the base class. However, since we want more than one data variable, we override _add_variables to add them.
from mdio.builder.schemas import compressors
from mdio.builder.schemas.chunk_grid import RegularChunkGrid
from mdio.builder.schemas.chunk_grid import RegularChunkShape
from mdio.builder.schemas.dtype import ScalarType
from mdio.builder.schemas.v1.variable import VariableMetadata
from mdio.builder.templates.base import AbstractDatasetTemplate
class AnisotropicVelocityTemplate(AbstractDatasetTemplate):
    """A custom template that has unusual dimensions and coordinates."""

    def __init__(self, data_domain: str = "depth") -> None:
        super().__init__(data_domain)
        # Dimension order matters; the last dimension is the depth
        self._dim_names = ("inline", "crossline", self.trace_domain)
        # Additional coordinates: these are added on top of dimension coordinates
        self._physical_coord_names = ("cdp_x", "cdp_y")
        self._var_chunk_shape = (128, 128, 128)
        self._units = {}

    @property
    def _name(self) -> str:  # public name for the registry
        return "AnisotropicVelocity3DDepth"

    @property
    def _default_variable_name(self) -> str:  # name of the default data variable
        return "velocity"

    def _load_dataset_attributes(self) -> dict:
        return {"surveyType": "3D", "gatherType": "line"}

    def _add_variables(self) -> None:
        """Add the variables including default and extra."""
        for name in ["velocity", "epsilon", "delta"]:
            chunk_grid = RegularChunkGrid(configuration=RegularChunkShape(chunk_shape=self.full_chunk_shape))
            unit = self.get_unit_by_key(name)
            self._builder.add_variable(
                name=name,
                dimensions=self._dim_names,
                data_type=ScalarType.FLOAT32,
                compressor=compressors.Blosc(cname=compressors.BloscCname.zstd),
                coordinates=self.physical_coordinate_names,
                metadata=VariableMetadata(chunk_grid=chunk_grid, units_v1=unit),
            )
AnisotropicVelocityTemplate()
AnisotropicVelocityTemplate
Data Domain: depth
Default Variable: velocity
Default Variable Units: —
Dimensions (3)
| Name | Size | Chunk Sizes | Units | Spatial |
|---|---|---|---|---|
| inline | — | 128 | — | ✓ |
| crossline | — | 128 | — | ✓ |
| depth | — | 128 | — | |
Coordinates (2)
| Name | Type | Units |
|---|---|---|
| cdp_x | Physical | — |
| cdp_y | Physical | — |
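The override mechanism used above is an instance of the template-method pattern: the base class fixes the order of the build steps, and a subclass replaces only the step it cares about. The sketch below shows that pattern in plain Python under stated assumptions. `ToyBuilder`, `ToyBaseTemplate`, and `ToyVelocityTemplate` are hypothetical names, and the `build()` step order is an illustration, not a description of MDIO internals.

```python
from abc import ABC, abstractmethod


class ToyBuilder:
    """Hypothetical builder that only records what was added."""

    def __init__(self) -> None:
        self.dimensions: list[str] = []
        self.coordinates: list[str] = []
        self.variables: list[str] = []


class ToyBaseTemplate(ABC):
    """Hypothetical base class: build() fixes the order of the steps."""

    def __init__(self) -> None:
        self._dim_names: tuple[str, ...] = ()
        self._physical_coord_names: tuple[str, ...] = ()
        self._builder = ToyBuilder()

    @property
    @abstractmethod
    def _default_variable_name(self) -> str: ...

    def build(self) -> ToyBuilder:
        self._builder = ToyBuilder()  # start from a clean builder
        self._add_dimensions()
        self._add_coordinates()
        self._add_variables()
        return self._builder

    def _add_dimensions(self) -> None:
        self._builder.dimensions.extend(self._dim_names)

    def _add_coordinates(self) -> None:
        self._builder.coordinates.extend(self._physical_coord_names)

    def _add_variables(self) -> None:
        # Default behavior: a single data variable.
        self._builder.variables.append(self._default_variable_name)


class ToyVelocityTemplate(ToyBaseTemplate):
    def __init__(self) -> None:
        super().__init__()
        self._dim_names = ("inline", "crossline", "depth")
        self._physical_coord_names = ("cdp_x", "cdp_y")

    @property
    def _default_variable_name(self) -> str:
        return "velocity"

    def _add_variables(self) -> None:
        # Override: add the default variable plus two extra ones.
        for name in ("velocity", "epsilon", "delta"):
            self._builder.variables.append(name)


built = ToyVelocityTemplate().build()
print(built.dimensions)  # ['inline', 'crossline', 'depth']
print(built.variables)   # ['velocity', 'epsilon', 'delta']
```

The subclass never touches the dimension or coordinate steps, which is why a custom template can stay short even when it adds several variables.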
Registering the Custom Template
The registry returns a deep copy of the template on every fetch. To make the template discoverable by name, register it first, then retrieve it with get_template.
from mdio.builder.template_registry import register_template
register_template(AnisotropicVelocityTemplate())
print("Registered:", "AnisotropicVelocity3DDepth" in list_templates())
custom_template = get_template("AnisotropicVelocity3DDepth")
custom_template
Registered: True
AnisotropicVelocityTemplate
Data Domain: depth
Default Variable: velocity
Default Variable Units: —
Dimensions (3)
| Name | Size | Chunk Sizes | Units | Spatial |
|---|---|---|---|---|
| inline | — | 128 | — | ✓ |
| crossline | — | 128 | — | ✓ |
| depth | — | 128 | — | |
Coordinates (2)
| Name | Type | Units |
|---|---|---|
| cdp_x | Physical | — |
| cdp_y | Physical | — |
You can also set units at any time. For this demo we’ll set metric units. The spatial units will be inferred from the SEG-Y binary header during ingestion, but we can override them here. Ingestion will honor what is in the template.
from mdio.builder.schemas.v1.units import LengthUnitModel
from mdio.builder.schemas.v1.units import SpeedUnitModel
custom_template.add_units(
{
"depth": LengthUnitModel(length="m"),
"cdp_x": LengthUnitModel(length="m"),
"cdp_y": LengthUnitModel(length="m"),
"velocity": SpeedUnitModel(speed="m/s"),
}
)
custom_template
AnisotropicVelocityTemplate
Data Domain: depth
Default Variable: velocity
Default Variable Units: m/s
Dimensions (3)
| Name | Size | Chunk Sizes | Units | Spatial |
|---|---|---|---|---|
| inline | — | 128 | — | ✓ |
| crossline | — | 128 | — | ✓ |
| depth | — | 128 | m | |
Coordinates (2)
| Name | Type | Units |
|---|---|---|
| cdp_x | Physical | m |
| cdp_y | Physical | m |
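The precedence rule described above (units set on the template win over units inferred from the file) can be pictured as a simple dictionary merge. This is only an illustration of the rule, assuming plain strings for units; `resolve_units` is a hypothetical helper and does not reflect how MDIO implements the lookup.

```python
def resolve_units(inferred: dict[str, str], template: dict[str, str]) -> dict[str, str]:
    """Merge two unit mappings; entries set on the template take precedence."""
    # Later keys win in a dict merge, so template entries override inferred ones.
    return {**inferred, **template}


# Hypothetical values: pretend the file header says feet.
inferred_from_header = {"depth": "ft", "cdp_x": "ft", "cdp_y": "ft"}
set_on_template = {"depth": "m", "cdp_x": "m", "cdp_y": "m", "velocity": "m/s"}

resolved = resolve_units(inferred_from_header, set_on_template)
print(resolved)
# {'depth': 'm', 'cdp_x': 'm', 'cdp_y': 'm', 'velocity': 'm/s'}
```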
Changing chunk size (chunks) on an existing template
Often you will want to tweak the chunking strategy for performance. You can do this in two ways:
When defining a subclass, set a default in the constructor (e.g., self._var_chunk_shape = (...)).
On an existing template instance, assign to the full_chunk_shape property once you know your final dataset sizes (the tuple length must match the number of data dimensions).
Below is a tiny demo showing how to modify the chunk shape on a fetched template. We first build the
template with known sizes to satisfy validation, then update full_chunk_shape.
Note
In the SEG-Y to MDIO conversion workflow, MDIO infers the final grid shape from the SEG-Y headers. It’s
common to set or adjust full_chunk_shape right before calling segy_to_mdio, using the same sizes
you expect for the final array.
mdio_ds = custom_template.build_dataset(name="demo-only", sizes=(300, 500, 1001))
# pick smaller chunks than the full array for better parallelism and IO
custom_template.full_chunk_shape = (64, 64, 64)
print("Chunk shape set to:", custom_template.full_chunk_shape)
custom_template
Chunk shape set to: (64, 64, 64)
AnisotropicVelocityTemplate
Data Domain: depth
Default Variable: velocity
Default Variable Units: m/s
Dimensions (3)
| Name | Size | Chunk Sizes | Units | Spatial |
|---|---|---|---|---|
| inline | 300 | 64 | — | ✓ |
| crossline | 500 | 64 | — | ✓ |
| depth | 1,001 | 64 | m | |
Coordinates (2)
| Name | Type | Units |
|---|---|---|
| cdp_x | Physical | m |
| cdp_y | Physical | m |
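To get a feel for what a chunk-shape change means, the short sketch below counts chunks per dimension by ceiling division for the demo sizes. It is plain Python arithmetic and does not call MDIO. The helper name `chunk_grid_summary` is hypothetical, and the 4-byte item size assumes uncompressed float32 samples.

```python
import math


def chunk_grid_summary(sizes: tuple[int, ...], chunk_shape: tuple[int, ...], itemsize: int = 4) -> dict:
    """Summarize a regular chunk grid for an array of the given sizes."""
    if len(sizes) != len(chunk_shape):
        raise ValueError("chunk_shape must have one entry per data dimension")
    # Edge chunks are partial, so round up in every dimension.
    chunks_per_dim = tuple(math.ceil(n / c) for n, c in zip(sizes, chunk_shape))
    return {
        "chunks_per_dim": chunks_per_dim,
        "total_chunks": math.prod(chunks_per_dim),
        "chunk_bytes": math.prod(chunk_shape) * itemsize,  # uncompressed
        "array_bytes": math.prod(sizes) * itemsize,  # uncompressed
    }


sizes = (300, 500, 1001)
coarse = chunk_grid_summary(sizes, (128, 128, 128))
fine = chunk_grid_summary(sizes, (64, 64, 64))

print(coarse["chunks_per_dim"], coarse["total_chunks"], coarse["chunk_bytes"])
# (3, 4, 8) 96 8388608
print(fine["chunks_per_dim"], fine["total_chunks"], fine["chunk_bytes"])
# (5, 8, 16) 640 1048576
```

For these sizes, halving the chunk edge turns 96 chunks of 8 MiB into 640 chunks of 1 MiB (before compression), which is the trade-off between parallelism and per-chunk overhead that the chunk shape controls.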
Making a Dummy Xarray Dataset
We can now take the MDIO Dataset model and convert it to Xarray with our configuration. If ingesting from SEG-Y, this step gets executed automatically by the converter before populating the data.
Note that the whole dataset will be populated with the fill values.
from mdio.builder.xarray_builder import to_xarray_dataset
to_xarray_dataset(mdio_ds)
<xarray.Dataset> Size: 2GB
Dimensions: (inline: 300, crossline: 500, depth: 1001)
Coordinates:
* inline (inline) int32 1kB 2147483647 2147483647 ... 2147483647
* crossline (crossline) int32 2kB 2147483647 2147483647 ... 2147483647
* depth (depth) int32 4kB 2147483647 2147483647 ... 2147483647
cdp_x (inline, crossline) float64 1MB dask.array<chunksize=(300, 500), meta=np.ndarray>
cdp_y (inline, crossline) float64 1MB dask.array<chunksize=(300, 500), meta=np.ndarray>
Data variables:
velocity (inline, crossline, depth) float32 601MB dask.array<chunksize=(300, 256, 256), meta=np.ndarray>
epsilon (inline, crossline, depth) float32 601MB dask.array<chunksize=(300, 256, 256), meta=np.ndarray>
delta (inline, crossline, depth) float32 601MB dask.array<chunksize=(300, 256, 256), meta=np.ndarray>
trace_mask (inline, crossline) bool 150kB dask.array<chunksize=(300, 500), meta=np.ndarray>
Attributes:
apiVersion: 1.0.8
createdOn: 2025-10-20 15:51:50.947182+00:00
name: demo-only
attributes: {'surveyType': '3D', 'gatherType': 'line', 'defaultVariableN...
Recap: Key APIs Used
Template registry helpers: get_template_registry, list_templates, register_template, get_template
Base template to subclass: AbstractDatasetTemplate
Make Xarray Dataset from MDIO Data Model: to_xarray_dataset
With these pieces, you can standardize how your seismic data is represented in MDIO and keep ingestion code concise and repeatable.