I've written the article about the cloud that I wish I could have read when I first heard about Zarr and cloud-native science back in 2018.
I hope that people here find this useful, as a reference or as clarification of certain subtleties.
It also includes a (slightly contrived) benchmark comparing two approaches: reading HDF files from object storage as if they were local files, and reading Zarr and Icechunk.
I think this benchmark is a more streamlined demonstration (it disables caching entirely) of the challenge explored by @betolink and others, e.g. in his Pangeo Showcase:
This post is actually the second in Earthmover's "Fundamentals" series; the first was @rabernat's post last week on "Tensors vs Tables":