What are the best practices to store data in zarr and how scalable is zarr?

Kaboom_Official · April 28, 2026, 4:16pm

I know Zarr performs better, especially for cloud workflows, but I have a few practical concerns.

I already have large archives of NetCDF/HDF files, and converting everything to Zarr will itself take a significant amount of time and resources. (I have used VirtualiZarr on the same data; it works great and takes much less time than converting to an actual Zarr store.)

So my main questions are:

Is it better to convert each individual NetCDF/HDF file into separate Zarr stores, or should I stack everything into a single Zarr dataset?
I tested a single consolidated Zarr store with around 2–3 TB of gridded geostationary satellite data, and it worked very fast.
But what happens at much larger scales, like petabytes?
Will a single Zarr store still be scalable and efficient, or does it become a bottleneck?
And what if the data is not on a single grid, that is, it comes from LEO satellites such as CloudSat, GPM DPR, etc.? We would not be able to use standard xarray functions on it, such as slicing with .sel.

I’m trying to understand what’s the more practical and scalable approach in the long run.

norlandrhagen · April 28, 2026, 6:48pm

Hey there @Kaboom_Official, I can’t answer all your Q’s, but a few thoughts.

Is it better to convert each individual NetCDF/HDF file into separate Zarr stores, or should I stack everything into a single Zarr dataset?

IMO a big benefit of Zarr is the ability to have a single access point for a large data cube. This way a user doesn’t have to figure out how to concat/merge all of the NetCDFs into a dataset.

(I have used VirtualiZarr for the same data and it works great and takes much less time when compared to converting to an actual Zarr store)

If you’re happy with your NetCDF chunking and data pipeline, VirtualiZarr should be a great fit. If you store your virtual Zarr stores in Icechunk, appending is an easy and safe operation. It gives you Zarr-like performance and convenience without a total rewrite of your data. As for scaling, @TomNicholas has done some tests on this, and it seems you would need an absurd number of references before you ran into issues.
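For a sense of scale, a virtual store holds one reference per chunk in the original files, so the reference count is roughly the archive size divided by the chunk size. The back-of-the-envelope sketch below uses assumed archive sizes and chunk sizes purely for illustration; they are not measurements, and real chunks are usually compressed to varying sizes.

```python
def chunk_count(archive_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks needed to hold archive_bytes, rounding up."""
    return -(-archive_bytes // chunk_bytes)


MB = 10**6
TB = 10**12
PB = 10**15

# Assumed archive sizes and stored chunk sizes, for illustration only.
for label, size in [("3 TB", 3 * TB), ("1 PB", PB)]:
    for chunk_mb in (1, 10, 100):
        n = chunk_count(size, chunk_mb * MB)
        print(f"{label} at {chunk_mb} MB per chunk: {n:,} chunks")
```

The takeaway is that the chunk size in the existing files drives the number of references far more than whether everything lives in one store or many.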

I’m sure others can chime in with more thoughts!
