New Cloud Tensor I/O Benchmarks - Zarr is fast now!

Given all of the recent development on Zarr, we decided to revisit the question of Zarr performance in the cloud. We recently published a blog post with our results.

Here’s the key figure:

Bottom line: Zarr Python (using either Icechunk or Obstore as the storage engine) can now fully saturate the network between EC2 and S3, achieving the maximum physically possible throughput when reading and writing data. This holds true in the presence of compression and across a wide range of chunk sizes.
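For context, here’s roughly what that read path looks like from Python. This is a minimal sketch, not the benchmark code itself: it assumes zarr-python 3 with its `ObjectStore` adapter around obstore’s `S3Store`, and the bucket, prefix, and array names are hypothetical. Credentials and region handling will differ in your environment.

```python
# Minimal sketch: read a Zarr array from S3 via obstore (hypothetical bucket/paths).
import zarr
from obstore.store import S3Store
from zarr.storage import ObjectStore

# obstore issues the concurrent byte-range requests against S3.
s3 = S3Store("my-bucket", prefix="benchmarks/data.zarr", region="us-east-1")

# Wrap the obstore store so zarr-python can use it as a storage backend.
store = ObjectStore(s3, read_only=True)

group = zarr.open_group(store, mode="r")
data = group["temperature"][:]  # chunks are fetched concurrently
print(data.shape, data.dtype)
```

The Icechunk path is analogous: you open an Icechunk repository on S3 and hand its session store to `zarr.open_group` in the same way.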

In our benchmarks, Zarr (with Icechunk and Obstore) is now faster than TensorStore or TileDB. It is also faster than Polars, a dataframe library (that comparison is necessarily limited to 1D arrays).

The benchmarks are fully open source and available here; you can run them yourself to verify the results.

The post also goes into some detail about the long journey we’ve been on to improve Zarr performance. Many people from this community have contributed to Zarr in these areas. This progress is a huge win for the scientific data community. :heart:


That is so cool to see!


The optimal chunk size for Icechunk is 3-15 MB

Interesting that this size range basically agrees with Performance guidelines for Amazon S3 - Amazon Simple Storage Service, which says:

Typical sizes for byte-range requests are 8 MB or 16 MB
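As a back-of-envelope illustration (my numbers, not from the post or the S3 docs): for uncompressed float32 data, a chunk shape like (1, 1024, 2048) lands at exactly 8 MiB, right in that range. Compression will shrink what actually hits S3, so compressed chunks can start somewhat larger. A hypothetical sketch using zarr-python 3:

```python
# Back-of-envelope chunk sizing: target roughly 8-16 MB per chunk (illustrative shapes).
import numpy as np
import zarr

itemsize = np.dtype("float32").itemsize          # 4 bytes per element
chunk_shape = (1, 1024, 2048)                    # one 2D slice per chunk
chunk_bytes = itemsize * int(np.prod(chunk_shape))
print(f"chunk size: {chunk_bytes / 2**20:.1f} MiB")  # 8.0 MiB

# Create an array with those chunks (local store here, just for illustration).
z = zarr.create_array(
    store="example.zarr",
    shape=(365, 1024, 2048),
    chunks=chunk_shape,
    dtype="float32",
)
print(z.chunks)
```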

In the blog post you only use a single machine and saturate the single-machine I/O, right?

But previously in the Pangeo community we used to promote distributed Dask clusters on multiple machines, so that single-machine I/O wouldn’t be a limitation, right?

I’m a bit confused.

There is nothing wrong with scaling out to multiple machines, but it is much more expensive and complex than using a single machine. Scaling out should only be done once you’ve hit the limits of a single machine.

In the past, I believe we were guilty of scaling out to improve net throughput before we had reached the physical limits of a single machine. Ultimately this is a waste of money: it’s never going to be cheaper or faster to use a distributed cluster where a single-node, multithreaded architecture will suffice. For example, in our 2020 Earthcube Poster we got throughputs of 3 GB/s (24 Gbps) on a cluster of 40 nodes. That implies each individual node was only getting ~600 Mbps, well below the network limit. Now we are getting 12-20 Gbps on a single modest-sized VM!

Put the other way around: with a cluster of 40 such nodes, you should be able to get a throughput of 800 Gbps (100 GB/s), which means processing 1 TB of data in about 10 seconds. We have not done that experiment, but there is no reason to think it would not work.
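For the curious, here is the arithmetic behind those numbers spelled out (nothing beyond what’s stated above; the per-node figures are the post’s own):

```python
# Back-of-envelope throughput math (illustrative only).

# 2020 Earthcube result: ~3 GB/s aggregate on a 40-node cluster.
cluster_gbps_2020 = 3 * 8                      # 3 GB/s ~= 24 Gbps aggregate
per_node_mbps_2020 = cluster_gbps_2020 / 40 * 1000
print(f"2020: ~{per_node_mbps_2020:.0f} Mbps per node")          # ~600 Mbps

# Today: a single VM reaches 12-20 Gbps, so 40 such nodes could in principle give:
single_node_gbps = 20
cluster_gbps_now = single_node_gbps * 40       # 800 Gbps ~= 100 GB/s
seconds_per_tb = 1e12 / (cluster_gbps_now / 8 * 1e9)
print(f"now: {cluster_gbps_now} Gbps aggregate, ~{seconds_per_tb:.0f} s per TB")
```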
