# Handling Corrupt SEG-Y Files

```{article-info}
:author: Altay Sansal
:date: "{sub-ref}`today`"
:read-time: "{sub-ref}`wordcount-minutes` min read"
:class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light
```

In this tutorial, we will demonstrate how to handle some of the most common SEG-Y file issues that can
occur during ingestion. To illustrate these problems and their solutions, we'll start by creating some
intentionally malformed files using the [`TGSAI/segy`][tgsai-segy] library. Let's begin by importing the
modules we'll be using throughout this tutorial.

[tgsai-segy]: https://github.com/TGSAI/segy

In [None]:
from pathlib import Path

import numpy as np
from segy import SegyFactory
from segy.config import SegyHeaderOverrides
from segy.schema import HeaderField
from segy.standards import get_segy_standard

from mdio import open_mdio
from mdio import segy_to_mdio
from mdio.builder.template_registry import get_template

## Fixing Coordinate Scalar Issues

One of the most common issues in SEG-Y files is an invalid or missing coordinate scalar value. Let's start by
creating a SEG-Y file with an intentionally incorrect coordinate scalar. We'll create a simple toy 2D stack dataset
that contains CDP (Common Depth Point) numbers and dummy CDP-X/Y coordinates in the trace headers.

To generate this example file, we will follow these steps:
1. Create an empty SEG-Y factory with the appropriate specification.
2. Populate the file headers (textual and binary headers).
3. Generate 10 traces with headers and fill them with dummy sample values.

In [None]:
n_traces = 10

trace_header_fields = [
    HeaderField(name="cdp", byte=21, format="int32"),
    HeaderField(name="cdp_x", byte=181, format="int32"),
    HeaderField(name="cdp_y", byte=185, format="int32"),
]
spec = get_segy_standard(1.0).customize(trace_header_fields=trace_header_fields)
factory = SegyFactory(spec=spec, sample_interval=4000, samples_per_trace=1201)

txt_header = factory.create_textual_header()  # default textual header
bin_header = factory.create_binary_header()  # default binary header

# header template: all zeros except the number of samples and the sample interval
headers = factory.create_trace_header_template(n_traces)
# sample template: all zeros
samples = factory.create_trace_sample_template(n_traces)

rng = np.random.default_rng(seed=42)
headers["cdp"] = np.arange(n_traces)
headers["coordinate_scalar"] = 0  # intentionally invalid
headers["cdp_x"] = np.arange(n_traces) * 1000
headers["cdp_y"] = np.arange(n_traces) * 10000
samples[:] = rng.normal(size=samples.shape).astype("float16")

# encode traces to SEG-Y buffer and write
with Path("tmp.sgy").open(mode="wb") as fp:
    fp.write(txt_header)
    fp.write(bin_header)
    fp.write(factory.create_traces(headers, samples))

print("Wrote temporary SEG-Y file successfully.")

As mentioned earlier, this file has a zero value in the coordinate scalar field. According to the SEG-Y standard
(both Revision 0 and Revision 1), a coordinate scalar of zero is invalid and should not be used.
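
One way to confirm a suspect scalar independently of any SEG-Y library is to read the raw bytes. The sketch below is a minimal, standard-library-only illustration under stated assumptions: the file is big-endian, has no extended textual headers, and stores the coordinate scalar at the standard location (bytes 71-72 of the 240-byte trace header). The helper names `read_first_coordinate_scalar` and `write_synthetic_file` are hypothetical and exist only for this example; the demonstration runs against a tiny synthetic file rather than `tmp.sgy`.

```python
import struct
import tempfile
from pathlib import Path

TEXT_HEADER_BYTES = 3200  # textual file header
BINARY_HEADER_BYTES = 400  # binary file header
TRACE_HEADER_BYTES = 240  # standard trace header
COORD_SCALAR_OFFSET = 70  # bytes 71-72 (1-based) inside the trace header


def read_first_coordinate_scalar(path):
    """Return the first trace's coordinate scalar as a signed 16-bit integer.

    Assumes big-endian encoding and no extended textual headers.
    """
    offset = TEXT_HEADER_BYTES + BINARY_HEADER_BYTES + COORD_SCALAR_OFFSET
    with Path(path).open("rb") as fp:
        fp.seek(offset)
        raw = fp.read(2)
    if len(raw) != 2:
        raise ValueError("File is too short to contain a trace header.")
    (scalar,) = struct.unpack(">h", raw)
    return scalar


def write_synthetic_file(path, scalar):
    """Write zero-filled file headers plus one trace header holding `scalar`."""
    trace_header = bytearray(TRACE_HEADER_BYTES)
    struct.pack_into(">h", trace_header, COORD_SCALAR_OFFSET, scalar)
    payload = bytes(TEXT_HEADER_BYTES + BINARY_HEADER_BYTES) + bytes(trace_header)
    Path(path).write_bytes(payload)


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "synthetic.sgy"

    write_synthetic_file(demo_path, scalar=0)
    zero_scalar = read_first_coordinate_scalar(demo_path)

    write_synthetic_file(demo_path, scalar=-100)
    negative_scalar = read_first_coordinate_scalar(demo_path)

print(zero_scalar, negative_scalar)
```

If the assumptions hold for a real file, calling `read_first_coordinate_scalar("tmp.sgy")` would be a quick pre-flight check before ingestion. Files that are little-endian or that use a non-standard header layout need a matching adjustment.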

Starting with MDIO v1, we extract X/Y coordinates (such as CDP-X/Y, Shot-X/Y, etc.) as dedicated MDIO variables
for easier access and manipulation. For these coordinates to be extracted correctly, the coordinate scalar must be
valid. If we attempt to ingest the file with an invalid coordinate scalar, MDIO will raise an error. Let's try to
ingest the file and catch the resulting error to demonstrate this issue.

In [None]:
mdio_template = get_template("PostStack2DTime")

ingestion_kwargs = {
    "segy_spec": spec,
    "mdio_template": mdio_template,
    "input_path": "tmp.sgy",
    "output_path": "tmp.mdio",
    "overwrite": True,
}
try:
    segy_to_mdio(**ingestion_kwargs)
    print("Ingestion successful.")
except ValueError as e:
    print(f"Ingestion failed with error: {e}")

### Fixing the Coordinate Scalar

To ingest this file, we can use `SegyHeaderOverrides` to replace the stored value at ingestion time. The
corrected value is also written to the final MDIO file. In SEG-Y, a negative coordinate scalar acts as a
divisor, so with the value `-100` we expect the coordinates to be divided by 100.

In [None]:
overrides = SegyHeaderOverrides(trace_header={"coordinate_scalar": -100})

segy_to_mdio(**ingestion_kwargs, segy_header_overrides=overrides)
print("Ingestion successful.")

Now that the ingestion has completed successfully, we can open the MDIO file and inspect its contents to verify
that everything was processed correctly.

In [None]:
ds = open_mdio("tmp.mdio")
ds

### Verifying the Coordinate Scaling

Let's verify that the CDP-X/Y coordinates have been correctly scaled according to the coordinate scalar value
we set. Since we used a coordinate scalar of `-100`, the coordinate values should be divided by 100. As expected,
the coordinates are properly scaled.

In [None]:
ds[["cdp_x", "cdp_y"]].compute()
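
The values above can be cross-checked against the SEG-Y scaling rule itself: a positive scalar multiplies the stored integer and a negative scalar divides it. The following is a minimal pure-Python sketch of that rule, not MDIO's implementation; `apply_coordinate_scalar` and `VALID_MAGNITUDES` are hypothetical names introduced for illustration, and the set of accepted magnitudes follows the values listed in the SEG-Y standard.

```python
# Magnitudes the SEG-Y standard lists for the coordinate scalar (assumption:
# both signs of each magnitude are accepted here, zero is rejected).
VALID_MAGNITUDES = {1, 10, 100, 1000, 10000}


def apply_coordinate_scalar(raw_value, scalar):
    """Scale a raw header coordinate: multiply if positive, divide if negative."""
    if abs(scalar) not in VALID_MAGNITUDES:
        raise ValueError(f"Invalid coordinate scalar: {scalar}")
    if scalar > 0:
        return raw_value * scalar
    return raw_value / abs(scalar)


# Raw header values written earlier in this tutorial.
raw_cdp_x = [i * 1000 for i in range(10)]
raw_cdp_y = [i * 10000 for i in range(10)]

expected_cdp_x = [apply_coordinate_scalar(v, -100) for v in raw_cdp_x]
expected_cdp_y = [apply_coordinate_scalar(v, -100) for v in raw_cdp_y]

print(expected_cdp_x[:3])
print(expected_cdp_y[:3])
```

With a scalar of `-100`, the raw `cdp_x` values step by 10 and the raw `cdp_y` values step by 100, which is what the ingested variables should show.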

We can also verify that the coordinate scalar was properly handled during ingestion by examining the first trace
header. This confirms that MDIO has correctly processed and stored the coordinate scalar information.

In [None]:
ds.headers[0].values["coordinate_scalar"]

## Fixing X/Y Units Issues

You may have noticed that the CDP-X/Y coordinate units were not properly ingested, and a warning was displayed
during the ingestion process. This occurs because the `measurement_system_code` field in the binary header is set
to `0`, which is invalid according to the SEG-Y standard. Valid values are `1` for meters and `2` for feet.
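
The mapping described above can be sketched as a small lookup. This is an illustration only: `MEASUREMENT_SYSTEM_UNITS` and `length_unit_from_code` are hypothetical names, and the unit strings are placeholders that may differ from what MDIO actually stores.

```python
# Measurement system codes defined by the SEG-Y binary header.
# The unit labels are illustrative, not MDIO's internal representation.
MEASUREMENT_SYSTEM_UNITS = {
    1: "meters",
    2: "feet",
}


def length_unit_from_code(code):
    """Return the length unit for a measurement system code, or None if invalid."""
    return MEASUREMENT_SYSTEM_UNITS.get(code)


for code in (0, 1, 2):
    unit = length_unit_from_code(code)
    status = unit if unit is not None else "unknown (invalid code)"
    print(f"measurement_system_code={code}: {status}")
```

A code of `0` has no entry, so the unit cannot be determined from the file alone.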

Fortunately, we can also override the binary header values during ingestion to ensure the units are correctly
interpreted and stored in the MDIO file. Let's fix both the coordinate scalar and the measurement system code
simultaneously.

In [None]:
overrides = SegyHeaderOverrides(
    binary_header={"measurement_system_code": 1},
    trace_header={"coordinate_scalar": -100},
)

segy_to_mdio(**ingestion_kwargs, segy_header_overrides=overrides)
print("Ingestion successful.")

### Verifying the Units

Now let's verify that the measurement units have been correctly applied by inspecting the units stored in the
MDIO file's variable attributes. Since we set the `measurement_system_code` to `1`, the coordinates should now
have their units properly identified as meters.

In [None]:
ds = open_mdio("tmp.mdio")
print(f"CDP-X/Y Units: {ds['cdp_x'].attrs['unitsV1']} / {ds['cdp_y'].attrs['unitsV1']}")

Perfect! The coordinate units are now correctly identified as meters. By using the `SegyHeaderOverrides` configuration,
we successfully corrected both the invalid coordinate scalar and the missing measurement system code, ensuring that
the MDIO file contains accurate coordinate information with proper units.

## Summary

In this tutorial, we demonstrated how to handle common SEG-Y file issues using MDIO's header override functionality:

1. **Invalid Coordinate Scalar**: We showed how to override incorrect or zero coordinate scalar values to ensure
   proper coordinate extraction and scaling.
2. **Missing Measurement Units**: We demonstrated how to set the measurement system code to ensure coordinate units
   are correctly identified in the output MDIO file.

The `SegyHeaderOverrides` feature provides a flexible way to work with imperfect SEG-Y files without needing to
modify the original files, making it easier to ingest real-world datasets that may not strictly follow the SEG-Y
standard.