## Working with CDF Files

CDF (Common Data Format) is a self-describing binary format developed by NASA, widely used in space physics and heliophysics. A CDF file contains one or more **variables**, each accompanied by metadata **attributes** (units, description, fill value, etc.). Variables are independent arrays; the relationship between them (e.g., which variable is the time axis for another) is expressed through attributes rather than file structure. This playbook shows how to inspect an unfamiliar CDF file, identify the time variable, and plot numeric data against time.

---

### Requirements

```
pip install cdflib astropy matplotlib numpy
```

---

### Step 1: Inspect the file

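Open the CDF and list its contents without loading data into memory. The examples assume cdflib 1.0 or later, where `cdf_info()` and `varinq()` return objects with attribute access; earlier releases returned dictionaries.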

```python
import cdflib

path = "/path/to/file.cdf"   # local path; see "Working with remote files" for URLs

cdf  = cdflib.CDF(path)
info = cdf.cdf_info()

print("Variables :", info.zVariables)
print("Global attributes:", list(cdf.globalattsget()))   # dict keyed by attribute name
```

Examine individual variables:

```python
for var in info.zVariables:
    vi = cdf.varinq(var)
    va = cdf.varattsget(var)
    dtype = vi.Data_Type_Description
    shape = vi.Dim_Sizes                      # per-record dimensions; [] for scalars
    units = va.get("UNITS", va.get("units", "n/a"))
    desc  = va.get("CATDESC", va.get("FIELDNAM", ""))
    print(f"  {var:30s} {dtype:20s} shape={shape!s:10s} units={units!r:15s} {desc}")
```

**Identifying the time variable:** The time axis is usually a variable with data type `CDF_EPOCH`, `CDF_EPOCH16`, or `CDF_TIME_TT2000` (commonly shortened to TT2000), and by convention it is often named `Epoch`. Check `vi.Data_Type_Description` for the substrings `EPOCH` or `TT2000`, or look for a `VAR_TYPE` attribute of `'support_data'` in combination with an epoch data type.
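
As a minimal sketch of that check, the hypothetical helper below filters a name-to-type mapping for epoch-like types. The function name and the sample mapping are illustrative and not part of cdflib; with a real file the mapping could be built from `cdf.varinq(v).Data_Type_Description` for each variable. Matching on substrings is meant to tolerate differently spelled TT2000 type names.

```python
def find_time_variables(type_by_var):
    """Return variable names whose data type description looks like an epoch type."""
    return [
        name
        for name, desc in type_by_var.items()
        if "EPOCH" in desc.upper() or "TT2000" in desc.upper()
    ]


# Made-up listing for illustration; with cdflib it could be built as
# {v: cdf.varinq(v).Data_Type_Description for v in info.zVariables}
example = {"Epoch": "CDF_TIME_TT2000", "RATE": "CDF_REAL4", "LABEL": "CDF_CHAR"}
print(find_time_variables(example))
```

If the list contains more than one name, the `DEPEND_0` attribute of the data variable you care about indicates which one applies.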

---

### Step 2: Load data and convert time

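`cdflib.cdfepoch.to_datetime` converts CDF epoch values to timestamps for all three epoch types (`CDF_EPOCH`, `CDF_EPOCH16`, and `CDF_TIME_TT2000`). It infers the epoch type from the array's dtype, so pass the values exactly as `varget` returns them. In cdflib 1.0 and later the result is a NumPy `datetime64[ns]` array; earlier releases returned a list of Python `datetime` objects. Matplotlib can plot either.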

```python
import numpy as np
import cdflib

cdf        = cdflib.CDF(path)
epoch_raw  = cdf.varget("Epoch")          # adjust variable name if needed
times      = np.array(cdflib.cdfepoch.to_datetime(epoch_raw))

# Load one or more numeric variables
rate = cdf.varget("RATE")                 # adjust to the variable you need
```
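
To illustrate what the conversion involves, here is a rough sketch for the `CDF_EPOCH` type only, which stores milliseconds since 0000-01-01T00:00:00 as `float64`. The function name is hypothetical, this is not how cdflib implements the conversion, and sub-millisecond fractions are truncated. `CDF_TIME_TT2000` additionally requires leap-second handling, so in practice `cdflib.cdfepoch` remains the better choice.

```python
import numpy as np

# Milliseconds between 0000-01-01 (CDF_EPOCH zero point) and 1970-01-01 (Unix epoch)
CDF_EPOCH_UNIX_OFFSET_MS = 62_167_219_200_000


def cdf_epoch_to_datetime64(epoch_ms):
    """Convert CDF_EPOCH values (float64 ms since year 0) to datetime64[ms]."""
    epoch_ms = np.asarray(epoch_ms, dtype=np.float64)
    unix_ms = (epoch_ms - CDF_EPOCH_UNIX_OFFSET_MS).astype(np.int64)
    return unix_ms.astype("datetime64[ms]")


# 63113904000000.0 is the CDF_EPOCH value for 2000-01-01T00:00:00
print(cdf_epoch_to_datetime64([63_113_904_000_000.0]))
```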

**Handling fill values:** Many CDF variables define a fill value (`FILLVAL` attribute) that marks missing data. Replace fill values with `NaN` before plotting:

```python
va   = cdf.varattsget("RATE")
fill = va.get("FILLVAL")
if fill is not None:
    mask = rate == fill            # compare in the native dtype, before casting
    rate = rate.astype(float)      # integer arrays cannot hold NaN
    rate[mask] = np.nan
```
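
Exact equality can miss fill values when the attribute and the data are stored at different precision, for example a `float32` variable compared against a double-precision `-1.0e31`. The sketch below is one tolerant alternative built on `np.isclose`; the function name `mask_fill` and the `rtol` default are assumptions, not part of cdflib.

```python
import numpy as np


def mask_fill(data, fill, rtol=1e-6):
    """Return a float copy of `data` with entries matching `fill` set to NaN."""
    out = np.asarray(data).astype(float)    # copy; integer input becomes NaN-capable
    if fill is None:
        return out
    # The attribute may arrive as a scalar or a 1-element array
    fill = np.asarray(fill, dtype=float).ravel()[0]
    out[np.isclose(out, fill, rtol=rtol, atol=0.0)] = np.nan
    return out


# float32 data compared against a double-precision fill value
raw = np.array([1.5, -1.0e31, 2.5], dtype=np.float32)
print(mask_fill(raw, -1.0e31))
```

A relative tolerance keeps ordinary samples untouched while still catching a fill value that was rounded when the file was written.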

---

### Step 3: Plot

```python
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(times, rate, lw=0.8)
ax.set_xlabel("Time (UTC)")
ax.set_ylabel(va.get("UNITS", ""))
ax.set_title(va.get("CATDESC", "RATE"))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()
plt.tight_layout()
plt.show()                       # or fig.savefig("rate.png") in a headless session
```

---

### Working with remote files

Use `astropy.utils.data.download_file` to fetch a CDF from a URL and cache it locally so repeated runs avoid re-downloading:

```python
from astropy.utils.data import download_file
import cdflib

url   = "https://..."
local = download_file(url, cache=True)
cdf   = cdflib.CDF(local)
```

---

### Generic helper

The function below wraps the steps above into a reusable loader:

```python
import numpy as np
import cdflib
from astropy.utils.data import download_file


def load_cdf(url, time_var="Epoch", value_vars=None):
    """
    Load a CDF file from a URL (or local path) and return a dict of arrays.
    time_var  : name of the CDF_EPOCH/EPOCH16/TT2000 variable to use as the time axis.
    value_vars: list of variable names to load; if None, loads every numeric
                zVariable other than time_var.
    """
    is_remote = url.startswith(("http://", "https://"))
    local = download_file(url, cache=True) if is_remote else url
    cdf   = cdflib.CDF(local)
    info  = cdf.cdf_info()

    if value_vars is None:
        value_vars = [v for v in info.zVariables if v != time_var]

    epoch_raw = cdf.varget(time_var)
    times     = np.array(cdflib.cdfepoch.to_datetime(epoch_raw))

    result = {"time": times}
    for var in value_vars:
        data = np.asarray(cdf.varget(var))
        if data.dtype.kind not in "iuf":
            continue                  # skip strings and other non-real-valued data
        fill = cdf.varattsget(var).get("FILLVAL")
        # Build the mask in the native dtype, then cast so NaN can be stored
        mask = (data == fill) if fill is not None else None
        data = data.astype(float)
        if mask is not None:
            data[mask] = np.nan
        result[var] = data

    return result
```

Usage:

```python
d = load_cdf("https://...", time_var="Epoch", value_vars=["FLUX", "ENERGY"])

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(d["time"], d["FLUX"])
plt.show()
```

---

### Note on ISTP/SPDF conventions

Many heliophysics CDF files follow the [ISTP metadata guidelines](https://spdf.gsfc.nasa.gov/istp_guide/istp_guide.html). Under these conventions:

- Each time-varying data variable has a `DEPEND_0` attribute naming its time variable.
- `UNITS` holds the physical unit string.
- `FILLVAL` marks bad/missing samples.
- `VALIDMIN` / `VALIDMAX` give the expected physical range.
- `VAR_TYPE` distinguishes science data (`'data'`), supporting variables such as time axes (`'support_data'`), and labels or other metadata (`'metadata'`).

You can use these attributes to automate axis labelling and fill-value masking without hardcoding variable names.
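
One way to act on these attributes is sketched below. Given per-variable attribute dictionaries (such as those returned by `cdf.varattsget`), it pairs each `'data'` variable with its `DEPEND_0` time variable and derives axis labels. The function name, the returned keys, and the sample attributes are hypothetical.

```python
def plan_time_series(attrs_by_var):
    """Pair each ISTP 'data' variable with its DEPEND_0 time variable and labels."""
    plans = []
    for name, va in attrs_by_var.items():
        if va.get("VAR_TYPE") != "data":
            continue                      # skip support_data and metadata
        time_var = va.get("DEPEND_0")
        if time_var is None or time_var not in attrs_by_var:
            continue                      # no usable time axis declared
        plans.append({
            "var": name,
            "time": time_var,
            "ylabel": va.get("UNITS", ""),
            "title": va.get("CATDESC", name),
        })
    return plans


# Made-up attributes for illustration; with cdflib they could be collected as
# {v: cdf.varattsget(v) for v in info.zVariables}
sample = {
    "Epoch": {"VAR_TYPE": "support_data", "UNITS": "ns"},
    "RATE":  {"VAR_TYPE": "data", "DEPEND_0": "Epoch", "UNITS": "counts/s",
              "CATDESC": "Count rate"},
    "LABEL": {"VAR_TYPE": "metadata"},
}
for plan in plan_time_series(sample):
    print(plan)
```

Each plan entry can then drive `cdf.varget(plan["var"])` and `cdf.varget(plan["time"])`, so the plotting code never needs a hardcoded variable name.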

---

### References

- [cdflib documentation](https://cdflib.readthedocs.io/)
- [CDF Format Description (NASA/GSFC)](https://cdf.gsfc.nasa.gov/html/CDF_docs.html)
- [ISTP/SPDF Metadata Guidelines](https://spdf.gsfc.nasa.gov/istp_guide/istp_guide.html)
- [astropy.utils.data.download_file](https://docs.astropy.org/en/stable/utils/data.html)
