{ "cells": [ { "cell_type": "markdown", "id": "85114119ae7a4db0", "metadata": {}, "source": [ "# Create and Register a Custom Template\n", "\n", "```{article-info}\n", ":author: Altay Sansal\n", ":date: \"{sub-ref}`today`\"\n", ":read-time: \"{sub-ref}`wordcount-minutes` min read\"\n", ":class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light\n", "```\n", "\n", "```{warning}\n", "Most SEG-Y files correspond to standard seismic data types or field configurations. We recommend using\n", "the built-in templates from the registry whenever possible. Create a custom template only when your file\n", "is unusual and cannot be represented by existing templates. In many cases, you can simply customize the\n", "SEG-Y header byte mapping during ingestion without defining a new template.\n", "```\n", "\n", "In this tutorial we will walk through the Template Registry and show how to:\n", "\n", "- Discover available templates in the registry\n", "- Define and register your own template\n", "- Build a dataset model and convert it to an Xarray Dataset using your custom template\n", "\n", "If this is your first time with MDIO, you may want to skim the Quickstart first." ] }, { "cell_type": "markdown", "id": "a793f2cfb58f09cc", "metadata": {}, "source": [ "## What is a Template and a Template Registry?\n", "\n", "A template defines how an MDIO dataset is structured: names of dimensions and coordinates, the default variable name, chunking hints, and attributes to be stored. Since many seismic datasets share common structures (e.g., 3D post-stack, 2D post-stack, pre-stack CDP/shot, etc.), MDIO ships with a pre-populated template registry and APIs to fetch or register templates.\n", "\n", "Fetching a template from it returns a copied instance you can freely customize without affecting others." 
] }, { "cell_type": "code", "execution_count": null, "id": "c7a760a019930d4e", "metadata": {}, "outputs": [], "source": [ "from mdio.builder.template_registry import get_template\n", "from mdio.builder.template_registry import get_template_registry\n", "from mdio.builder.template_registry import list_templates\n", "\n", "registry = get_template_registry()\n", "registry  # pretty HTML in notebooks" ] }, { "cell_type": "markdown", "id": "810dbba2b6dba787", "metadata": {}, "source": [ "We can also list the names of all registered templates." ] }, { "cell_type": "code", "execution_count": null, "id": "38eb1da635c7be0f", "metadata": {}, "outputs": [], "source": [ "list_templates()" ] }, { "cell_type": "markdown", "id": "d87bd9ec781a8a8e", "metadata": {}, "source": [ "## Defining a Minimal Custom Template\n", "\n", "To define a custom template, subclass `AbstractDatasetTemplate` and set:\n", "\n", "- `_name`: a public name for the template\n", "- `_dim_names`: names for each axis of your data variable (the last axis is the trace/time or trace/depth axis)\n", "- `_physical_coord_names` and `_logical_coord_names`: optional additional coordinate variables to store along the spatial grid\n", "- `_load_dataset_attributes()`: optional attributes stored at the dataset level\n", "\n", "Below we create a special template that can hold an interval velocity field with multiple anisotropy parameters for a depth seismic volume.\n", "\n", "The dimensions, dimension coordinates, and non-dimension coordinates are created automatically by the base class.\n", "However, since we want more variables, we override `_add_variables` to add them." 
] }, { "cell_type": "code", "execution_count": null, "id": "cfc9d9b0e1b67a76", "metadata": {}, "outputs": [], "source": [ "from mdio.builder.schemas import compressors\n", "from mdio.builder.schemas.chunk_grid import RegularChunkGrid\n", "from mdio.builder.schemas.chunk_grid import RegularChunkShape\n", "from mdio.builder.schemas.dtype import ScalarType\n", "from mdio.builder.schemas.v1.variable import VariableMetadata\n", "from mdio.builder.templates.base import AbstractDatasetTemplate\n", "\n", "\n", "class AnisotropicVelocityTemplate(AbstractDatasetTemplate):\n", "    \"\"\"A custom template that has unusual dimensions and coordinates.\"\"\"\n", "\n", "    def __init__(self, data_domain: str = \"depth\") -> None:\n", "        super().__init__(data_domain)\n", "        # Dimension order matters; the last dimension is the depth axis\n", "        self._dim_names = (\"inline\", \"crossline\", self.trace_domain)\n", "        # Additional coordinates: these are added on top of dimension coordinates\n", "        self._physical_coord_names = (\"cdp_x\", \"cdp_y\")\n", "        self._var_chunk_shape = (128, 128, 128)\n", "        self._units = {}\n", "\n", "    @property\n", "    def _name(self) -> str:  # public name for the registry\n", "        return \"AnisotropicVelocity3DDepth\"\n", "\n", "    @property\n", "    def _default_variable_name(self) -> str:  # name of the default data variable\n", "        return \"velocity\"\n", "\n", "    def _load_dataset_attributes(self) -> dict:\n", "        return {\"surveyType\": \"3D\", \"gatherType\": \"line\"}\n", "\n", "    def _add_variables(self) -> None:\n", "        \"\"\"Add the variables including default and extra.\"\"\"\n", "        for name in [\"velocity\", \"epsilon\", \"delta\"]:\n", "            chunk_grid = RegularChunkGrid(configuration=RegularChunkShape(chunk_shape=self.full_chunk_shape))\n", "            unit = self.get_unit_by_key(name)\n", "            self._builder.add_variable(\n", "                name=name,\n", "                dimensions=self._dim_names,\n", "                data_type=ScalarType.FLOAT32,\n", "                compressor=compressors.Blosc(cname=compressors.BloscCname.zstd),\n", "                coordinates=self.physical_coordinate_names,\n", "                metadata=VariableMetadata(chunk_grid=chunk_grid, units_v1=unit),\n", "            )\n", "\n", "\n", "AnisotropicVelocityTemplate()" 
] }, { "cell_type": "markdown", "id": "15e61310ed0ffd97", "metadata": {}, "source": [ "## Registering the Custom Template\n", "\n", "The registry returns a deep copy of the template on every fetch. To make the template discoverable by name, register it first, then retrieve it with `get_template`." ] }, { "cell_type": "code", "execution_count": null, "id": "a4e1847b20da6768", "metadata": {}, "outputs": [], "source": [ "from mdio.builder.template_registry import register_template\n", "\n", "register_template(AnisotropicVelocityTemplate())\n", "print(\"Registered:\", \"AnisotropicVelocity3DDepth\" in list_templates())\n", "\n", "custom_template = get_template(\"AnisotropicVelocity3DDepth\")\n", "custom_template" ] }, { "cell_type": "markdown", "id": "83b0772f1913c652", "metadata": {}, "source": [ "You can also set units at any time. For this demo we’ll set metric units. The spatial units will be inferred from the SEG-Y binary header during ingestion, but we can override them here. Ingestion will honor what is in the template." ] }, { "cell_type": "code", "execution_count": null, "id": "d7dca50d72d2f93", "metadata": {}, "outputs": [], "source": [ "from mdio.builder.schemas.v1.units import LengthUnitModel\n", "from mdio.builder.schemas.v1.units import SpeedUnitModel\n", "\n", "custom_template.add_units(\n", "    {\n", "        \"depth\": LengthUnitModel(length=\"m\"),\n", "        \"cdp_x\": LengthUnitModel(length=\"m\"),\n", "        \"cdp_y\": LengthUnitModel(length=\"m\"),\n", "        \"velocity\": SpeedUnitModel(speed=\"m/s\"),\n", "    }\n", ")\n", "custom_template" ] }, { "cell_type": "markdown", "id": "367ade9824e72bc3", "metadata": {}, "source": [ "## Changing chunk size (chunks) on an existing template\n", "\n", "Often you will want to tweak the chunking strategy for performance. 
You can do this in two ways:\n", "\n", "- When defining a subclass, set a default in the constructor (e.g., `self._var_chunk_shape = (...)`).\n", "- On an existing template instance, assign to the `full_chunk_shape` property once you know your final\n", "  dataset sizes (the tuple length must match the number of data dimensions).\n", "\n", "Below is a tiny demo showing how to modify the chunk shape on a fetched template. We first build the\n", "template with known sizes to satisfy validation, then update `full_chunk_shape`.\n", "\n", "```{note}\n", "In the SEG-Y to MDIO conversion workflow, MDIO infers the final grid shape from the SEG-Y headers. It’s\n", "common to set or adjust `full_chunk_shape` right before calling `segy_to_mdio`, using the same sizes\n", "you expect for the final array.\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "75939231b58c204a", "metadata": {}, "outputs": [], "source": [ "mdio_ds = custom_template.build_dataset(name=\"demo-only\", sizes=(300, 500, 1001))\n", "# pick smaller chunks than the full array for better parallelism and IO\n", "custom_template.full_chunk_shape = (64, 64, 64)\n", "print(\"Chunk shape set to:\", custom_template.full_chunk_shape)\n", "\n", "custom_template" ] }, { "cell_type": "markdown", "id": "a76f17cdf235de13", "metadata": {}, "source": [ "## Making a Dummy Xarray Dataset\n", "\n", "We can now take the MDIO Dataset model and convert it to Xarray with our configuration. If ingesting from SEG-Y, the\n", "converter runs this step automatically before populating the data.\n", "\n", "Note that the whole dataset is populated with fill values." 
] }, { "cell_type": "code", "execution_count": null, "id": "ce3dcf9c7946ea07", "metadata": {}, "outputs": [], "source": [ "from mdio.builder.xarray_builder import to_xarray_dataset\n", "\n", "to_xarray_dataset(mdio_ds)" ] }, { "cell_type": "markdown", "id": "fc05aa3c81f8465c", "metadata": {}, "source": [ "## Recap: Key APIs Used\n", "\n", "- Template registry helpers: `get_template_registry`, `list_templates`, `register_template`, `get_template`\n", "- Base template to subclass: `AbstractDatasetTemplate`\n", "- Make Xarray Dataset from MDIO Data Model: `to_xarray_dataset`\n", "\n", "With these pieces, you can standardize how your seismic data is represented in MDIO and keep ingestion code concise and repeatable.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a15848ab5c0811d6", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "mystnb": { "execution_mode": "force" } }, "nbformat": 4, "nbformat_minor": 5 }