Blog post: Processing a 250 TB dataset with Xarray, Dask, and Coiled


I wanted to share a blog post building on @dcherian’s work to process the National Water Model retrospective dataset: Processing a 250 TB dataset with Coiled, Dask, and Xarray | by Sarah Johnson | Coiled | Sep, 2023 | Medium. I’m hoping it will be useful to folks here processing large datasets in the cloud.

We run a big-but-simple Xarray problem to experience some of the pain of operating at scale, and we mess around with optimizing cloud costs. This example does use Coiled to run in the cloud, but it fits well within the free tier.

In the future, I’m hoping to run more complex workflows (e.g., rechunking). Suggestions welcome!