Blog post: Processing a 250 TB dataset with Xarray, Dask, and Coiled

scharlottej13 · September 5, 2023, 4:57pm

Hi!

I wanted to share a blog post that builds on @dcherian’s work processing the National Water Model retrospective dataset: “Processing a 250 TB dataset with Coiled, Dask, and Xarray” (Sarah Johnson, Coiled, on Medium, Sep 2023). I’m hoping it will be useful to folks here who process large datasets in the cloud.

We run a big-but-simple Xarray problem to experience some of the pain of operating at scale. And we mess around with optimizing cloud costs. This example does use Coiled to run in the cloud, but it fits well within the free tier.

In the future, I’m hoping to run more complex workflows (e.g., rechunking). Suggestions welcome!
