Fundamentals: What is Cloud-Optimized Scientific Data?

I wrote the article about the cloud that I wish I could have read back when I first heard of Zarr and cloud-native science in 2018.

I hope that people here find this useful, as a reference or as clarification of certain subtleties.

It also includes a (slightly contrived) benchmark that compares reading HDF from object storage as if it were a local file against reading Zarr and Icechunk.

I feel that this benchmark, which disables caching entirely, is a more streamlined demonstration of the challenge explored by @betolink and others, e.g. in his Pangeo Showcase:

–

This post is actually the second in Earthmover’s “Fundamentals” series - the first was @rabernat’s post last week on “Tensors vs Tables”:
