I’m guessing the poor time-series read performance is because the data was stored in Zarr using the same chunking scheme as the original data (time=1, x=4400, y=4400)?
To allow time-series extraction in a reasonable length of time, you would want something like (time=144, x=400, y=400), which for 32-bit floats or integers works out to about 92 MB per chunk (144 × 400 × 400 × 4 bytes).
If you used this chunking scheme for 2 years of hourly data, users who want to read a time series at a specified x,y location would read about the same number of chunks as a user who wants to read the entire x,y field at a specified time:
(4400*4400)/(400*400) = 121
2*(365.25*24)/144 = 121.75
With a cluster of 30 workers, the read times should be a few seconds in either case. Does this make sense?
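The arithmetic above can be checked with a few lines of Python (4-byte values and the chunk shape suggested earlier are assumed):

```python
# Verify the chunk arithmetic: chunk size in MB, and the number of
# chunks touched by each of the two access patterns.
bytes_per_value = 4  # float32 or int32

# Size of one (time=144, x=400, y=400) chunk.
chunk_mb = 144 * 400 * 400 * bytes_per_value / 1e6

# Chunks read for one full x,y field at a single time.
spatial_chunks = (4400 * 4400) // (400 * 400)

# Chunks read for a full time series (2 years hourly) at one x,y point.
time_chunks = 2 * (365.25 * 24) / 144

print(f"{chunk_mb:.2f} MB, {spatial_chunks} chunks, {time_chunks:.2f} chunks")
# 92.16 MB, 121 chunks, 121.75 chunks
```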