Guidance on using Zarr on HPC (chunking, multi-file layouts, and node scaling for optimal CPU/memory use)

Hi all,
I’m working on an HPC workflow where we store large gridded datasets (SST-like 3D arrays, dimensions roughly time × 3600 × 7200) in Zarr format on a parallel file system. I’m looking for pointers to documentation, examples, or “rules of thumb” for using Zarr efficiently on HPC systems (POSIX/Lustre/GPFS, not cloud object storage).
Concretely, I’m trying to understand:
• How to choose chunk sizes (e.g. time-major vs. space-major, target chunk byte size such as 100–500 MB) so that Zarr + Dask run efficiently on multi-node clusters without overloading worker memory or the metadata servers.
• How to decide between a single large Zarr store and multiple Zarr stores (e.g. one per year or month), for both performance and manageability on HPC.
• How to reason about how many nodes/workers to use for a given Zarr layout (chunk size, number of files, total dataset size) so that CPU utilization stays high while per-worker memory stays within limits, especially when using dask-jobqueue or similar.
• Any known pitfalls or best practices for Zarr on HPC file systems (e.g. inode limits from many small chunks, when to use ZipStore or consolidate_metadata, when to move to larger 100–500 MB chunks).
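For context on the chunk-size question, this is the back-of-envelope arithmetic I've been using so far (the 100–500 MB target is just the rule of thumb I've seen quoted, not something I've verified; the helper names and the 40-year example are mine):

```python
# Sketch of the arithmetic behind picking a chunk shape for a
# time x 3600 x 7200 float32 array (itemsize 4 bytes).

DTYPE_BYTES = 4  # float32

def chunk_nbytes(shape, itemsize=DTYPE_BYTES):
    """Bytes occupied by one uncompressed chunk of the given shape."""
    n = itemsize
    for s in shape:
        n *= s
    return n

def n_chunks(array_shape, chunk_shape):
    """Total number of chunk objects a (non-sharded) Zarr array will create."""
    total = 1
    for a, c in zip(array_shape, chunk_shape):
        total *= -(-a // c)  # ceiling division
    return total

# One full spatial field per time step: 3600 * 7200 * 4 B
print(chunk_nbytes((1, 3600, 7200)) / 1e6)   # ~103.7 MB
# Four time steps per chunk lands inside the 100-500 MB window:
print(chunk_nbytes((4, 3600, 7200)) / 1e6)   # ~414.7 MB
# Hypothetical 40 years of daily data -> chunk objects on disk:
print(n_chunks((14600, 3600, 7200), (4, 3600, 7200)))  # 3650
```

So with spatially-complete chunks I end up with a few thousand objects per variable, which seems filesystem-friendly, but I don't know how this trades off against time-series access patterns.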
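On the node/worker-sizing question, here is the rough heuristic I currently apply before filling in a dask-jobqueue cluster config. The "several chunks of memory headroom per worker" factor is my own assumption, not an official guideline, and `plan_workers` is a name I made up:

```python
def plan_workers(node_mem_gb, node_cores, chunk_mb, headroom=6):
    """
    Split one node into Dask workers so that each worker's memory is at
    least `headroom` x the chunk size (my rule of thumb, not official).
    Returns (processes_per_node, threads_per_worker, mem_per_worker_gb).
    """
    min_worker_gb = headroom * chunk_mb / 1024
    procs = max(1, min(node_cores, int(node_mem_gb // min_worker_gb)))
    threads = max(1, node_cores // procs)
    return procs, threads, node_mem_gb / procs

# Example: a 128-core, 256 GB node with ~415 MB chunks
procs, threads, mem_gb = plan_workers(256, 128, 415)
print(procs, threads, round(mem_gb, 2))  # 105 workers, 1 thread, ~2.44 GB each

# I then feed these into the job script, roughly:
#   dask_jobqueue.SLURMCluster(cores=128, processes=procs, memory="256GB")
# but I have no idea whether this reasoning matches what others do on Lustre/GPFS.
```

In particular I'd like to know whether people deliberately leave far more headroom than this when rechunking or doing reductions.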
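And to make the inode concern concrete, a stdlib-only sketch of why I'm worried about directory stores: one filesystem object per chunk versus a single zip container (which, as I understand it, is what zarr's ZipStore wraps). The file names and sizes here are fake stand-ins, not a real Zarr store:

```python
import os
import tempfile
import zipfile

tmp = tempfile.mkdtemp()

# Simulate a tiny directory store: 100 "chunk" files named 0.0.0, 0.0.1, ...
store_dir = os.path.join(tmp, "demo.zarr")
os.makedirs(store_dir)
for i in range(100):
    with open(os.path.join(store_dir, f"0.0.{i}"), "wb") as f:
        f.write(b"\x00" * 16)

# 100 separate objects for the metadata server to track:
print(len(os.listdir(store_dir)))  # 100

# Pack the same chunks into one zip: a single inode, same keys inside.
zip_path = os.path.join(tmp, "demo.zarr.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    for name in sorted(os.listdir(store_dir)):
        zf.write(os.path.join(store_dir, name), arcname=name)

with zipfile.ZipFile(zip_path) as zf:
    print(len(zf.namelist()))  # still 100 chunk keys, but one file on disk
```

At our real scale this is millions of chunk files, so I'd love to hear when people switch to ZipStore (read-mostly archives?) versus just using bigger chunks.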
If there are existing Pangeo docs, tutorials, or discussion threads that cover “Zarr on HPC best practices” or present benchmark results (e.g. recommended chunk sizes / auto-chunk settings for POSIX), links would be very helpful.