Get Started in 10 Minutes¶

Altay Sansal

Oct 20, 2025

On this page, we show the basic capabilities of MDIO.

For demonstration purposes, we will ingest the remote Teapot Dome open-source dataset. The dataset details and licensing can be found on the SEG Wiki.

We are using the 3D seismic stack dataset named filt_mig.sgy.

The full HTTP link for the dataset (hosted on AWS): http://s3.amazonaws.com/teapot/filt_mig.sgy

Warning

For plotting and remote ingestion, the notebook requires Matplotlib and aiohttp as dependencies. Please install them before executing, using pip install matplotlib aiohttp or conda install matplotlib aiohttp.

Defining the SEG-Y Dataset¶

Since MDIO 0.8, we can ingest remote SEG-Y files directly. The file is 386 MB in size. To speed up the header scan, we can also set an environment variable for MDIO; see Buffered Reads in Ingestion for when to use it.

The dataset is irregularly shaped; however, it is padded to a rectangle with zeros (dead traces). We will see this later when plotting the live mask.

The following environment variables are essential here:

MDIO__IMPORT__CLOUD_NATIVE tells MDIO to use buffered reads for headers because the file is remote.
MDIO__IMPORT__SAVE_SEGY_FILE_HEADER makes MDIO save the SEG-Y specific file headers (text, binary), which are not strictly necessary for consumption and are not saved by default.

import os

os.environ["MDIO__IMPORT__CLOUD_NATIVE"] = "true"
os.environ["MDIO__IMPORT__SAVE_SEGY_FILE_HEADER"] = "true"

input_url = "http://s3.amazonaws.com/teapot/filt_mig.sgy"
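
Environment variables are always strings, which is why the flags above are set to "true" rather than the Python value True. How MDIO interprets these values internally is not shown on this page; the snippet below is only a minimal sketch of one common way to read such a flag. The env_flag helper, the set of accepted spellings, and the HYPOTHETICAL__UNSET__FLAG name are illustrative assumptions, not part of MDIO.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Hypothetical helper: interpret an environment variable as a boolean."""
    raw = os.environ.get(name)
    if raw is None:
        # Variable is not set at all, so fall back to the default.
        return default
    # Accept a few common "truthy" spellings (an assumption for this sketch).
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Values must be strings; assigning the bool True would raise a TypeError.
os.environ["MDIO__IMPORT__CLOUD_NATIVE"] = "true"

cloud_native = env_flag("MDIO__IMPORT__CLOUD_NATIVE")
unset_flag = env_flag("HYPOTHETICAL__UNSET__FLAG")
print(cloud_native, unset_flag)
```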

Ingesting to MDIO¶

To do this, we can use the convenient SEG-Y to MDIO converter.

The inline and crossline values are located at bytes 181 and 185. Note that these locations do not match any SEG-Y standard.
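
To make the byte locations concrete, here is a minimal sketch of what reading a 4-byte integer at a given SEG-Y byte position means. It does not use MDIO or TGSAI/segy; it builds a synthetic 240-byte trace header with Python's struct module and decodes it again. The sketch assumes big-endian byte order (the SEG-Y default), and the read_int32 helper and all header values are made up for illustration.

```python
import struct

TRACE_HEADER_SIZE = 240  # size of a standard SEG-Y trace header in bytes


def read_int32(header: bytes, start_byte: int) -> int:
    """Decode a big-endian int32 at a 1-based SEG-Y byte position."""
    offset = start_byte - 1  # SEG-Y byte positions start at 1, Python offsets at 0
    return struct.unpack_from(">i", header, offset)[0]


# Synthetic header: write illustrative values at the Teapot Dome byte locations.
header = bytearray(TRACE_HEADER_SIZE)
struct.pack_into(">i", header, 180, 278)     # inline, bytes 181-184
struct.pack_into(">i", header, 184, 42)      # crossline, bytes 185-188
struct.pack_into(">i", header, 188, 788937)  # cdp_x, bytes 189-192
struct.pack_into(">i", header, 192, 938845)  # cdp_y, bytes 193-196

field_bytes = {"inline": 181, "crossline": 185, "cdp_x": 189, "cdp_y": 193}
decoded = {name: read_int32(bytes(header), byte) for name, byte in field_bytes.items()}
print(decoded)
# {'inline': 278, 'crossline': 42, 'cdp_x': 788937, 'cdp_y': 938845}
```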

MDIO uses TGSAI/segy to parse the SEG-Y; the field names conform to its canonical keys defined in SEGY Binary Header Keys and SEGY Trace Header Keys. MDIO v1 also introduced templates for common seismic data types. For instance, we will be using the PostStack3DTime template here, which expects the same canonical keys.

We will also specify the units for the time domain. The spatial units will be parsed automatically from the SEG-Y binary header. However, they may be corrupt in the file; for that case, see the Fixing X/Y Units Issues section.

In summary, we will ingest the file using the byte locations defined below.

import matplotlib.pyplot as plt
from segy.schema import HeaderField
from segy.standards import get_segy_standard

from mdio import segy_to_mdio
from mdio.builder.schemas.v1.units import TimeUnitModel
from mdio.builder.template_registry import get_template

# Trace header fields at the byte locations used by the Teapot Dome file.
teapot_trace_headers = [
    HeaderField(name="inline", byte=181, format="int32"),
    HeaderField(name="crossline", byte=185, format="int32"),
    HeaderField(name="cdp_x", byte=189, format="int32"),
    HeaderField(name="cdp_y", byte=193, format="int32"),
]

# Start from the SEG-Y revision 0 spec and customize it with the fields above.
rev0_segy_spec = get_segy_standard(0)
teapot_segy_spec = rev0_segy_spec.customize(trace_header_fields=teapot_trace_headers)

# Use the post-stack 3D time template and set the time axis units to milliseconds.
mdio_template = get_template("PostStack3DTime")
unit_ms = TimeUnitModel(time="ms")
mdio_template.add_units({"time": unit_ms})

segy_to_mdio(
    input_path=input_url,
    output_path="filt_mig.mdio",
    segy_spec=teapot_segy_spec,
    mdio_template=mdio_template,
    overwrite=True,
)

Ingestion took only a few seconds because this is a very small file.

However, MDIO scales up to terabyte-sized (~1,000 GB) volumes!

Opening the Ingested MDIO File¶

The snippets in the following sections assume that the ingested filt_mig.mdio store has been opened and bound to a variable named dataset.

Querying Metadata¶

Now let’s look at the file text header saved in the segy_file_header metadata variable.

print(dataset["segy_file_header"].attrs["textHeader"])

C 1 CLIENT: ROCKY MOUNTAIN OILFIELD TESTING CENTER                              
C 2 PROJECT: NAVAL PETROLEUM RESERVE #3 (TEAPOT DOME); NATRONA COUNTY, WYOMING  
C 3 LINE: 3D                                                                    
C 4                                                                             
C 5 THIS IS THE FILTERED POST STACK MIGRATION                                   
C 6                                                                             
C 7 INLINE 1, XLINE 1:   X COORDINATE: 788937  Y COORDINATE: 938845             
C 8 INLINE 1, XLINE 188: X COORDINATE: 809501  Y COORDINATE: 939333             
C 9 INLINE 188, XLINE 1: X COORDINATE: 788039  Y COORDINATE: 976674             
C10 INLINE NUMBER:    MIN: 1  MAX: 345  TOTAL: 345                              
C11 CROSSLINE NUMBER: MIN: 1  MAX: 188  TOTAL: 188                              
C12 TOTAL NUMBER OF CDPS: 64860   BIN DIMENSION: 110' X 110'                    
C13                                                                             
C14                                                                             
C15                                                                             
C16                                                                             
C17                                                                             
C18                                                                             
C19 GENERAL SEGY INFORMATION                                                    
C20 RECORD LENGHT (MS): 3000                                                    
C21 SAMPLE RATE (MS): 2.0                                                       
C22 DATA FORMAT: 4 BYTE IBM FLOATING POINT                                      
C23 BYTES  13- 16: CROSSLINE NUMBER (TRACE)                                     
C24 BYTES  17- 20: INLINE NUMBER (LINE)                                         
C25 BYTES  81- 84: CDP_X COORD                                                  
C26 BYTES  85- 88: CDP_Y COORD                                                  
C27 BYTES 181-184: INLINE NUMBER (LINE)                                         
C28 BYTES 185-188: CROSSLINE NUMBER (TRACE)                                     
C29 BYTES 189-192: CDP_X COORD                                                  
C30 BYTES 193-196: CDP_Y COORD                                                  
C31                                                                             
C32                                                                             
C33                                                                             
C34                                                                             
C35                                                                             
C36 Processed by: Excel Geophysical Services, Inc.                              
C37               8301 East Prentice Ave. Ste. 402                              
C38               Englewood, Colorado 80111                                     
C39               (voice) 303.694.9629 (fax) 303.771.1646                       
C40 END EBCDIC                                                                  

Since we saved the binary header, we can look at that as well.

dataset["segy_file_header"].attrs["binaryHeader"]

{'job_id': 9999,
 'line_num': 9999,
 'reel_num': 1,
 'data_traces_per_ensemble': 188,
 'aux_traces_per_ensemble': 0,
 'sample_interval': 2000,
 'orig_sample_interval': 0,
 'samples_per_trace': 1501,
 'orig_samples_per_trace': 1501,
 'data_sample_format': 1,
 'ensemble_fold': 57,
 'trace_sorting_code': 4,
 'vertical_sum_code': 1,
 'sweep_freq_start': 0,
 'sweep_freq_end': 0,
 'sweep_length': 0,
 'sweep_type_code': 0,
 'sweep_trace_num': 0,
 'sweep_taper_start': 0,
 'sweep_taper_end': 0,
 'taper_type_code': 0,
 'correlated_data_code': 2,
 'binary_gain_code': 1,
 'amp_recovery_code': 4,
 'measurement_system_code': 2,
 'impulse_polarity_code': 1,
 'vibratory_polarity_code': 0,
 'segy_revision_major': 0,
 'segy_revision_minor': 0}
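
The header values above are enough to sanity-check the file size quoted earlier. The sketch below assumes the standard SEG-Y layout (a 3200-byte text header, a 400-byte binary header, and a 240-byte header per trace) with no extended text headers, and takes the trace count from the 345 x 188 grid listed in the text header.

```python
TEXT_HEADER_BYTES = 3200
BINARY_HEADER_BYTES = 400
TRACE_HEADER_BYTES = 240
BYTES_PER_SAMPLE = 4  # data_sample_format 1 is 4-byte IBM floating point

# Record length of 3000 ms at a 2 ms sample rate, counting both end points.
samples_per_trace = 3000 // 2 + 1  # 1501, matches samples_per_trace above

num_traces = 345 * 188  # inlines x crosslines = 64860 CDPs

trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * BYTES_PER_SAMPLE
total_bytes = TEXT_HEADER_BYTES + BINARY_HEADER_BYTES + num_traces * trace_bytes
size_mib = total_bytes / 2**20

print(f"{total_bytes:,} bytes = {size_mib:.1f} MiB")
# 404,989,440 bytes = 386.2 MiB
```

Under these assumptions the estimate is consistent with the 386 MB figure, which suggests that the padded rectangle of 64,860 traces is stored in full.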

MDIO Grid, Dimensions, and Related Attributes¶

MDIO has an abstraction for an N-dimensional grid. MDIO also has named dimensions, so we can see which dimension (axis) corresponds to which name.

dataset.sizes

Frozen({'inline': 345, 'crossline': 188, 'time': 1501})

dataset.inline

<xarray.DataArray 'inline' (inline: 345)> Size: 1kB
array([  1,   2,   3, ..., 343, 344, 345], shape=(345,), dtype=int32)
Coordinates:
  * inline   (inline) int32 1kB 1 2 3 4 5 6 7 8 ... 339 340 341 342 343 344 345

dataset.crossline

<xarray.DataArray 'crossline' (crossline: 188)> Size: 752B
array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,  14,
        15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,
        29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,
        43,  44,  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,
        57,  58,  59,  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,
        71,  72,  73,  74,  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,
        85,  86,  87,  88,  89,  90,  91,  92,  93,  94,  95,  96,  97,  98,
        99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
       113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126,
       127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140,
       141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154,
       155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182,
       183, 184, 185, 186, 187, 188], dtype=int32)
Coordinates:
  * crossline  (crossline) int32 752B 1 2 3 4 5 6 7 ... 183 184 185 186 187 188

Fetching Data and Plotting¶

Now we will demonstrate getting an inline from MDIO.

Since v1, MDIO uses Xarray under the hood, so we can use its convenient indexing. Xarray also handles plotting with proper dimension coordinate labels.

MDIO stores summary statistics. We can use them to calculate the standard deviation (std) of the dataset and adjust the gain.

from mdio.builder.schemas.v1.stats import SummaryStatistics

stats = SummaryStatistics.model_validate_json(dataset["amplitude"].attrs["statsV1"])
std = ((stats.sum_squares / stats.count) - (stats.sum / stats.count) ** 2) ** 0.5

il_dataset = dataset.sel(inline=278)
il_amp = il_dataset["amplitude"].T
il_amp.plot(vmin=-2 * std, vmax=2 * std, cmap="gray_r", yincrease=False);

../_images/de066e07fb2ff94a0593629290e42fde4de91322beaa1812099a822a03f9af7f.png
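
The std expression used above is the population standard deviation written in terms of running sums: the square root of the mean of squares minus the squared mean. The following sketch checks that identity on a handful of made-up sample values using only the Python standard library; the variable names mirror the count, sum, and sum_squares fields used above, and nothing here reads from MDIO.

```python
import math
import statistics

# Made-up amplitudes standing in for the dataset samples.
samples = [0.5, -1.25, 2.0, 0.0, -0.75, 1.5]

# The three running sums that are sufficient to recover the std.
count = len(samples)
total = sum(samples)
sum_squares = sum(x * x for x in samples)

mean = total / count
std_from_sums = math.sqrt(sum_squares / count - mean**2)

# Reference value: population standard deviation computed directly.
std_reference = statistics.pstdev(samples)
print(math.isclose(std_from_sums, std_reference))
```

Keeping only sums makes the statistic cheap to accumulate chunk by chunk, although subtracting two large, nearly equal numbers can lose precision when the mean is large relative to the spread.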

Let’s do the same with a time slice.

We will display the time slice at a two-way time of 1,000 ms.

Note that since we parse the X/Y coordinates, we can plot the time slice in real-world coordinates.

twt_data = dataset["amplitude"].sel(time=1000)
twt_data.plot(vmin=-2 * std, vmax=2 * std, cmap="gray_r", x="cdp_x", y="cdp_y");

../_images/39cae028c7029c6f5336e0c89836b5a70b319960eb7dc530e44442ca743af867.png
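
Label-based selection such as sel(inline=278) or sel(time=1000) looks up a coordinate value, not an array position. The sketch below illustrates the difference with plain Python lists instead of Xarray. It assumes the time axis starts at 0 ms with a 2 ms step (consistent with 1501 samples over a 3000 ms record) and that inlines run from 1 to 345 as shown earlier.

```python
# Coordinate axes rebuilt from the dimensions reported above (assumed layout).
time_ms = list(range(0, 3001, 2))  # 0, 2, ..., 3000 -> 1501 samples
inlines = list(range(1, 346))      # 1, 2, ..., 345

# Map coordinate labels to positional indices.
time_index = time_ms.index(1000)
inline_index = inlines.index(278)

print(time_index, inline_index)
# 500 277
```

With these assumptions, the 1,000 ms slice is the sample at position 500 and inline 278 is at position 277, so label-based and positional indexing differ even for this simple grid.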

We can also overlay the live trace mask on the time slice. However, in this example, the dataset is zero-padded, so every grid position holds a trace.

As a result, the live trace mask will show True (yellow) everywhere.

trace_mask = dataset.trace_mask[:]

twt_data.plot(vmin=-2 * std, vmax=2 * std, cmap="gray_r", x="cdp_x", y="cdp_y", alpha=0.5, figsize=(8, 5))
trace_mask.plot(vmin=0, vmax=1, x="cdp_x", y="cdp_y", alpha=0.5);

../_images/fa60859b29e6801ebf1b7b7aba62bc0c0f56e956225e7493dfdc09c53e2584e1.png
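
Because the padding traces are physically present in the file, the live mask cannot tell them apart from real data. One possible way to recover the survey outline is to flag traces whose samples are all zero. The sketch below shows the idea on a tiny synthetic cube with plain Python lists; it is not an MDIO feature, the is_dead helper is hypothetical, and on the real dataset one would more likely apply an equivalent array reduction over the time axis.

```python
# Tiny synthetic cube: amplitudes[inline][crossline] is a list of time samples.
amplitudes = [
    [[0.0, 0.0, 0.0], [0.1, -0.2, 0.3]],
    [[0.4, 0.0, -0.1], [0.0, 0.0, 0.0]],
]


def is_dead(trace: list[float]) -> bool:
    """Hypothetical check: a trace is treated as padding if every sample is zero."""
    return all(sample == 0.0 for sample in trace)


# True where a trace carries signal, False where it is zero padding.
nonzero_mask = [[not is_dead(trace) for trace in line] for line in amplitudes]
print(nonzero_mask)
# [[False, True], [True, False]]
```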

MDIO to SEG-Y Conversion¶

Finally, let’s demonstrate going back to SEG-Y.

We will use the convenient mdio_to_segy function and write it out as a round-trip file.

The output spec can be modified if we want to write fields to different byte locations, for example, but we will use the same one as before.

from mdio import mdio_to_segy

mdio_to_segy(
    input_path="filt_mig.mdio",
    output_path="filt_mig_roundtrip.sgy",
    segy_spec=teapot_segy_spec,
)

Validate Round-Trip SEG-Y File¶

We can validate whether the round-trip SEG-Y file matches the original using TGSAI/segy.

Step by step:

Open original file
Open round-trip file
Compare text headers
Compare binary headers
Compare 100 random headers and traces

import numpy as np
from segy import SegyFile

original_segy = SegyFile(input_url)
roundtrip_segy = SegyFile("filt_mig_roundtrip.sgy")

# Compare text header
assert original_segy.text_header == roundtrip_segy.text_header

# Compare bin header
assert original_segy.binary_header == roundtrip_segy.binary_header

# Compare 100 random trace headers and traces
rng = np.random.default_rng()
rand_indices = rng.integers(low=0, high=original_segy.num_traces, size=100)
for idx in rand_indices:
    np.testing.assert_equal(original_segy.trace[idx], roundtrip_segy.trace[idx])

print("Files identical!")

Files identical!
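
Note that rng.integers draws indices with replacement, so the 100 indices above may contain repeats, and an unseeded generator picks different traces on every run. If distinct and reproducible indices are preferred, a minimal sketch with the standard library looks like this; the seed value is arbitrary and the trace count is the 64,860 CDPs listed in the text header.

```python
import random

num_traces = 64_860  # total number of CDPs from the text header

# A fixed seed makes the same traces get checked on every run (arbitrary value).
rng = random.Random(42)

# random.sample draws without replacement, so every index is unique.
sample_indices = sorted(rng.sample(range(num_traces), k=100))

print(len(sample_indices), len(set(sample_indices)))
# 100 100
```

With NumPy, rng.choice(num_traces, size=100, replace=False) on a seeded generator should achieve the same effect.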
