Sharing a small experiment that might interest folks here. I computed the full Oceanic Niño Index (ONI) — every overlapping 3-month season from 1950 to today — as a single SQL query running in place against ARCO-ERA5 on GCS, using zarr-datafusion (GitHub - jayendra13/zarr-datafusion: Extending DataFusion to do SQL queries on Zarr data. · GitHub) (a DataFusion-based SQL engine for Zarr).
The query does the whole pipeline declaratively: sample the Niño-3.4 box, build monthly SST anomalies against NOAA’s centred rolling 30-year climatology, take the 3-month running mean, and classify the ENSO phase — streaming only the chunks it touches, with no local copy of the dataset.
To check it’s actually right, I compared all 916 seasons against NOAA CPC’s official (ERSSTv5-based) table: MAE 0.17 °C (0.12 °C post-1979), Pearson r = 0.966, and zero sign reversals in phase classification. The residuals behave as you’d expect — larger in the pre-satellite era, shrinking toward the present.
One caveat worth flagging: the monthly value isn’t a true monthly mean. To keep the remote reads small, the query samples a single timestep per month — 12:00 UTC on the 15th — and treats that one hourly field as representative of the whole month. It’s a deliberate cost/accuracy trade-off and a real source of scatter in the residuals (alongside the ERA5-vs-ERSSTv5 dataset difference).
Where I’d love feedback:
- The sampling shortcut — is one timestep/month defensible for ONI, or does it bias specific seasons? If you’ve quantified single-sample vs. full-monthly-mean error on ERA5 SST, I’d love to hear it.
- Climatology handling — I reproduced NOAA CPC’s centred, rolling 30-year base-period schedule; curious whether others here approximate it differently.
- The ERA5↔ERSSTv5 comparison — is ~0.1–0.2 °C the right expectation for that dataset gap, or am I attributing too much of the residual to it?
Full write-up and the runnable SQL/validation/plots below — happy to be told I’ve got something wrong.
References
- Blog post (narrative + validation): Computing the El Niño index from ERA5 with SQL · Stratoscale
- Cookbook (SQL, comparison script, plots): zarr-datafusion/cookbook/el-nino-oni at main · jayendra13/zarr-datafusion · GitHub