{ "cells": [ { "cell_type": "markdown", "id": "ec9ded46-b872-4a33-a98f-5afa2e9d1499", "metadata": {}, "source": [ "# Handling Corrupt SEG-Y Files\n", "\n", "```{article-info}\n", ":author: Altay Sansal\n", ":date: \"{sub-ref}`today`\"\n", ":read-time: \"{sub-ref}`wordcount-minutes` min read\"\n", ":class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light\n", "```\n", "\n", "In this tutorial, we will demonstrate how to handle some of the most common SEG-Y file issues that can\n", "occur during ingestion. To illustrate these problems and their solutions, we'll start by creating some\n", "intentionally malformed files using the [`TGSAI/segy`][tgsai-segy] library. Let's begin by importing the\n", "modules we'll be using throughout this tutorial.\n", "\n", "[tgsai-segy]: https://github.com/TGSAI/segy" ] }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "from pathlib import Path\n", "\n", "import numpy as np\n", "from segy import SegyFactory\n", "from segy.config import SegyHeaderOverrides\n", "from segy.schema import HeaderField\n", "from segy.standards import get_segy_standard\n", "\n", "from mdio import open_mdio\n", "from mdio import segy_to_mdio\n", "from mdio.builder.template_registry import get_template" ], "id": "9903030cdb08ddea" }, { "cell_type": "markdown", "id": "7735a63c74432eb9", "metadata": {}, "source": [ "## Fixing Coordinate Scalar Issues\n", "\n", "One of the most common issues in SEG-Y files is an invalid or missing coordinate scalar value. Let's start by\n", "creating a SEG-Y file with an intentionally incorrect coordinate scalar. We'll create a simple toy 2D stack dataset\n", "that contains CDP (Common Depth Point) numbers and dummy CDP-X/Y coordinates in the trace headers.\n", "\n", "To generate this example file, we will follow these steps:\n", "1. Create an empty SEG-Y factory with the appropriate specification.\n", "2. Populate the file headers (textual and binary headers).\n", "3. 
Generate 10 traces with headers and fill them with dummy sample values." ] }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "n_traces = 10\n", "\n", "trace_header_fields = [\n", "    HeaderField(name=\"cdp\", byte=21, format=\"int32\"),\n", "    HeaderField(name=\"cdp_x\", byte=181, format=\"int32\"),\n", "    HeaderField(name=\"cdp_y\", byte=185, format=\"int32\"),\n", "]\n", "spec = get_segy_standard(1.0).customize(trace_header_fields=trace_header_fields)\n", "factory = SegyFactory(spec=spec, sample_interval=4000, samples_per_trace=1201)\n", "\n", "txt_header = factory.create_textual_header()  # default text header\n", "bin_header = factory.create_binary_header()  # default binary header\n", "\n", "headers = factory.create_trace_header_template(n_traces)  # default all zero except n_samp and interval\n", "samples = factory.create_trace_sample_template(n_traces)  # default all zero\n", "\n", "rng = np.random.default_rng(seed=42)\n", "headers[\"cdp\"] = np.arange(n_traces)  # sequential CDP numbers\n", "headers[\"coordinate_scalar\"] = 0  # intentionally invalid\n", "headers[\"cdp_x\"] = np.arange(n_traces) * 1000\n", "headers[\"cdp_y\"] = np.arange(n_traces) * 10000\n", "samples[:] = rng.normal(size=samples.shape).astype(\"float16\")\n", "\n", "# encode traces to SEG-Y buffer and write\n", "with Path(\"tmp.sgy\").open(mode=\"wb\") as fp:\n", "    fp.write(txt_header)\n", "    fp.write(bin_header)\n", "    fp.write(factory.create_traces(headers, samples))\n", "\n", "print(\"Wrote temporary SEG-Y file successfully.\")" ], "id": "14577fee3a776eba" }, { "cell_type": "markdown", "id": "efdf0c533c6b5589", "metadata": {}, "source": [ "As mentioned earlier, this file has a zero value in the coordinate scalar field. According to the SEG-Y standard\n", "(both Revision 0 and Revision 1), a coordinate scalar of zero is invalid and should not be used.\n", "\n", "Starting with MDIO v1, we extract X/Y coordinates (such as CDP-X/Y, Shot-X/Y, etc.) 
as dedicated MDIO variables\n", "for easier access and manipulation. For these coordinates to be extracted correctly, the coordinate scalar must be\n", "valid. If we attempt to ingest the file with an invalid coordinate scalar, MDIO will raise an error. Let's try to\n", "ingest the file and catch the resulting error to demonstrate this issue." ] }, { "cell_type": "code", "execution_count": null, "id": "5537acb5a0ef370d", "metadata": {}, "outputs": [], "source": [ "mdio_template = get_template(\"PostStack2DTime\")\n", "\n", "ingestion_kwargs = {\n", "    \"segy_spec\": spec,\n", "    \"mdio_template\": mdio_template,\n", "    \"input_path\": \"tmp.sgy\",\n", "    \"output_path\": \"tmp.mdio\",\n", "    \"overwrite\": True,\n", "}\n", "try:\n", "    segy_to_mdio(**ingestion_kwargs)\n", "    print(\"Ingestion successful.\")\n", "except ValueError as e:\n", "    print(f\"Ingestion failed with error: {e}\")" ] }, { "cell_type": "markdown", "id": "de52bcf4-9eb9-4f19-8aca-2ade664b1649", "metadata": {}, "source": [ "### Fixing the Coordinate Scalar\n", "\n", "To read this file without issues, we can use the `SegyHeaderOverrides` option to override the existing value at\n", "runtime, which also stores the corrected value in the final MDIO file. In the SEG-Y standard, a negative coordinate\n", "scalar acts as a divisor and a positive one as a multiplier, so with the value `-100` we expect the coordinates to\n", "be divided by 100." ] }, { "cell_type": "code", "execution_count": null, "id": "72ba0a22-6e14-400b-8227-3ec6e93fbc52", "metadata": {}, "outputs": [], "source": [ "overrides = SegyHeaderOverrides(trace_header={\"coordinate_scalar\": -100})\n", "\n", "segy_to_mdio(**ingestion_kwargs, segy_header_overrides=overrides)\n", "print(\"Ingestion successful.\")" ] }, { "cell_type": "markdown", "id": "ebccf97c-390e-4e12-8192-2db5a1a3612d", "metadata": {}, "source": [ "Now that the ingestion has completed successfully, we can open the MDIO file and inspect its contents to verify\n", "that everything was processed correctly." 
] }, { "cell_type": "code", "execution_count": null, "id": "7894292d-ac08-4c19-bdaa-3518bd112c78", "metadata": {}, "outputs": [], "source": [ "ds = open_mdio(\"tmp.mdio\")\n", "ds" ] }, { "cell_type": "markdown", "id": "db670951-a8f4-4e11-9421-b1e8b5384185", "metadata": {}, "source": [ "### Verifying the Coordinate Scaling\n", "\n", "Let's verify that the CDP-X/Y coordinates have been correctly scaled according to the coordinate scalar value\n", "we set. Since we used a coordinate scalar of `-100`, the raw header values should be divided by 100: `cdp_x`\n", "should run from 0 to 90 in steps of 10, and `cdp_y` from 0 to 900 in steps of 100." ] }, { "cell_type": "code", "execution_count": null, "id": "4ac3f9d2-cb57-4503-8ae0-813f2d27f984", "metadata": {}, "outputs": [], "source": [ "ds[[\"cdp_x\", \"cdp_y\"]].compute()" ] }, { "cell_type": "markdown", "id": "9c8c8adc-e1d6-41b6-bc35-e7cae614fb2a", "metadata": {}, "source": [ "We can also verify that the coordinate scalar was properly handled during ingestion by examining the first trace\n", "header. The stored value should be the override, `-100`, rather than the original `0`." ] }, { "cell_type": "code", "execution_count": null, "id": "56079e59-0793-4a75-b3f2-dff626014a96", "metadata": {}, "outputs": [], "source": [ "ds.headers[0].values[\"coordinate_scalar\"]" ] }, { "cell_type": "markdown", "id": "d712d081-0a53-446b-a5b4-d38b87084522", "metadata": {}, "source": [ "## Fixing X/Y Units Issues\n", "\n", "You may have noticed that the CDP-X/Y coordinate units were not properly ingested, and a warning was displayed\n", "during the ingestion process. This occurs because the `measurement_system_code` field in the binary header is set\n", "to `0`, which is invalid according to the SEG-Y standard. Valid values are `1` for meters and `2` for feet.\n", "\n", "Fortunately, we can also override the binary header values during ingestion to ensure the units are correctly\n", "interpreted and stored in the MDIO file. 
Let's fix both the coordinate scalar and the measurement system code\n", "simultaneously." ] }, { "cell_type": "code", "execution_count": null, "id": "fca495c0-83b6-4493-bdae-0aae13eb4fe0", "metadata": {}, "outputs": [], "source": [ "overrides = SegyHeaderOverrides(\n", "    binary_header={\"measurement_system_code\": 1},\n", "    trace_header={\"coordinate_scalar\": -100},\n", ")\n", "\n", "segy_to_mdio(**ingestion_kwargs, segy_header_overrides=overrides)\n", "print(\"Ingestion successful.\")" ] }, { "cell_type": "markdown", "id": "648145a2-0987-4b16-8270-86e28e34b486", "metadata": {}, "source": [ "### Verifying the Units\n", "\n", "Now let's verify that both the coordinate scaling and the measurement units have been correctly applied. We can\n", "inspect the units stored in the MDIO file's variable attributes. Since we set the `measurement_system_code` to `1`,\n", "the coordinates should now have their units properly identified as meters." ] }, { "cell_type": "code", "execution_count": null, "id": "f67e1845-dc0d-411b-ace3-264d55b91a24", "metadata": {}, "outputs": [], "source": [ "ds = open_mdio(\"tmp.mdio\")\n", "print(f\"CDP-X/Y Units: {ds['cdp_x'].attrs['unitsV1']} / {ds['cdp_y'].attrs['unitsV1']}\")" ] }, { "cell_type": "markdown", "id": "37418ccc021e130f", "metadata": {}, "source": [ "Perfect! The coordinate units are now correctly identified as meters. By using the `SegyHeaderOverrides` configuration,\n", "we successfully corrected both the invalid coordinate scalar and the missing measurement system code, ensuring that\n", "the MDIO file contains accurate coordinate information with proper units.\n", "\n", "## Summary\n", "\n", "In this tutorial, we demonstrated how to handle common SEG-Y file issues using MDIO's header override functionality:\n", "\n", "1. **Invalid Coordinate Scalar**: We showed how to override incorrect or zero coordinate scalar values to ensure\n", " proper coordinate extraction and scaling.\n", "2. 
**Missing Measurement Units**: We demonstrated how to set the measurement system code to ensure coordinate units\n", " are correctly identified in the output MDIO file.\n", "\n", "The `SegyHeaderOverrides` feature provides a flexible way to work with imperfect SEG-Y files without needing to\n", "modify the original files, making it easier to ingest real-world datasets that may not strictly follow the SEG-Y\n", "standard." ] } ], "metadata": { "mystnb": { "execution_mode": "force" } }, "nbformat": 4, "nbformat_minor": 5 }