Hi friends,
I have multiple TIFF images, one every 5 minutes, each with dimensions 4400 x 4400, with GHI values stored in a variable.
I need to create a Python API that returns historical GHI data for a given latitude and longitude, covering 1 day, 1 week, or 1 month, whichever the user selects.
What would be the best approach for this?
Currently I am using Zarr, but reading the data takes a long time: fetching 1 month of data takes around 2 minutes. The data is stored in S3 and the script runs on an EC2 instance.
I'm guessing the poor time-series read performance is because the data was stored in Zarr using the same chunking scheme as the original data (time=1, x=4400, y=4400)?
To allow time-series extraction in a reasonable length of time, you would want something like (time=144, x=400, y=400), which for 32-bit floats or integers works out to roughly 100 MB chunks.
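For reference, the chunk-size arithmetic works out like this (a quick sketch; the 144/400/400 figures are just the suggested chunk shape from above, not anything fixed by the data):

```python
# Suggested chunk shape for time-series-friendly access.
time_chunk, x_chunk, y_chunk = 144, 400, 400
bytes_per_value = 4  # float32 or int32

# Total bytes in one chunk.
chunk_bytes = time_chunk * x_chunk * y_chunk * bytes_per_value
chunk_mb = chunk_bytes / 1e6
print(f"{chunk_mb:.1f} MB per chunk")  # ~92 MB, i.e. roughly the 100 MB target
```

Chunks in the 50–200 MB range are a common rule of thumb for Zarr on object storage: large enough to amortize S3 request latency, small enough to parallelize across workers.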
If you used this chunking scheme for 2 years of hourly data, users who want to read a time series at a specified x,y location would read about the same number of chunks as a user who wants to read the entire x,y field at a specified time:
(4400*4400)/(400*400) = 121
2*(365.25*24)/144 = 121.75
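Spelled out in code (same numbers as above, assuming the 4400 x 4400 grid and the suggested 144 x 400 x 400 chunks):

```python
# Chunks touched by reading the full x,y field at one time step:
nx, ny = 4400, 4400        # full grid
cx, cy = 400, 400          # spatial chunk shape
spatial_chunks = (nx * ny) / (cx * cy)

# Chunks touched by reading a 2-year hourly time series at one (x, y):
n_times = 2 * 365.25 * 24  # hourly steps in two years
time_chunk = 144           # temporal chunk length
time_chunks = n_times / time_chunk

print(spatial_chunks, time_chunks)  # 121.0 121.75
```

So both access patterns hit roughly the same number of chunks, which is the point of the balanced chunking scheme.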
With a cluster of 30 workers, the read times would be a few seconds for each. Does this make sense?