Read multiple TIFF images using Zarr

I’m guessing the poor performance when reading a time series is because the data was stored in Zarr using the same chunking scheme as the original data (time=1, x=4400, y=4400)?
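A rough back-of-the-envelope sketch of why that chunking hurts: to pull one x,y point's time series, every chunk along time that contains the point has to be read and decoded in full. This assumes 4-byte values, 2 years of hourly steps, and no compression; the function name is just for illustration.

```python
import math

def bytes_read_for_point_series(n_time, chunks, itemsize=4):
    """Uncompressed bytes read to extract one x,y point's full time series.

    chunks is the (time, x, y) chunk shape. Each chunk along time that holds
    the point is read in full, even though only one value per step is kept.
    itemsize=4 assumes float32 or int32 data.
    """
    ct, cx, cy = chunks
    n_chunks = math.ceil(n_time / ct)
    return n_chunks * ct * cx * cy * itemsize

# Assumed: 2 years of hourly time steps.
n_time = round(2 * 365.25 * 24)  # 17532
total = bytes_read_for_point_series(n_time, (1, 4400, 4400))
print(f"{total / 1e12:.2f} TB")  # 1.36 TB
```

So a single-pixel time series would touch on the order of a terabyte of raw data under (time=1, x=4400, y=4400) chunking.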

To allow time series extraction in a reasonable length of time, you would want chunks more like (time=144, x=400, y=400), which for floats or 32-bit integers would be about 100MB per chunk (144*400*400*4 bytes ≈ 92 MB).
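A quick way to check chunk sizes before committing to a layout is to multiply out the chunk shape. This sketch assumes 4-byte values (float32 or int32) and reports uncompressed size; compressed chunks on disk will usually be smaller.

```python
import math

def chunk_nbytes(chunk_shape, itemsize=4):
    """Uncompressed size of one chunk; itemsize=4 assumes float32 or int32."""
    return math.prod(chunk_shape) * itemsize

for shape in [(1, 4400, 4400), (144, 400, 400)]:
    print(shape, f"{chunk_nbytes(shape) / 1e6:.1f} MB")
# (1, 4400, 4400) 77.4 MB
# (144, 400, 400) 92.2 MB
```

Both layouts give chunks of similar size; what changes is how the data is spread across the time and space axes.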

If you used this chunking scheme for 2 years of hourly data, a user who wants to read a time series at a given x,y location would read about the same number of chunks as a user who wants to read the entire x,y field at a given time:

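(4400*4400)/(400*400) = 121 chunks for the entire x,y field at one time
2*(365.25*24)/144 = 121.75, i.e. about 122 chunks for the full time series at one x,y location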
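The chunk counts above can be reproduced with a small helper that rounds up, since a partial chunk at the end of an axis still has to be read. This is a sketch assuming 2 years = 17532 hourly steps; the function names are illustrative only.

```python
import math

def chunks_for_map(nx, ny, cx, cy):
    """Chunks read to load the whole x,y field at a single time step."""
    return math.ceil(nx / cx) * math.ceil(ny / cy)

def chunks_for_series(nt, ct):
    """Chunks read to load the whole time series at a single x,y point."""
    return math.ceil(nt / ct)

nt = round(2 * 365.25 * 24)  # assumed: 2 years of hourly steps = 17532
print(chunks_for_map(4400, 4400, 400, 400))  # 121
print(chunks_for_series(nt, 144))            # 122
```

Plugging in other chunk shapes is a cheap way to balance the two access patterns before rechunking the data.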

With a cluster of 30 workers, the read times would be a few seconds for either kind of read. Does this make sense?
