Downloading HRRR for a specific geographic region in zarr format from S3

Brennanwx · August 8, 2022, 3:44pm

Hello!

I have been downloading downward shortwave-radiation flux data from the High Resolution Rapid Refresh (HRRR) from AWS S3, based on an example from Project Pythia (Plotting HRRR 2-meter temperatures — HRRR-AWS-Cookbook). Here’s the URL for the bucket: AWS S3 Explorer

I’ve had no problem downloading data for the whole domain. My question has to do with downloading just the data from a specific geographic region instead of the entire domain.

I’m fairly new to the zarr format, but I understand that the data is broken into chunks that are separately compressed and stored. Is it possible to download a specific chunk off of S3?

If I were able to download a specific chunk, is there any guarantee that it would contain data for the same geographic region in each model run?

Thanks in advance for any comments or suggestions!

darothen · August 8, 2022, 9:03pm

Hello @brennanwx,

When you construct and open the dataset as you’ve shared in your code snippet, you’re already doing the majority of the work here. An important point: you’re not actually downloading the data for the whole domain when you use this method. Instead, behind the scenes, xarray opens the dataset lazily. It builds a giant mapping or list of all the data that you might need to download, and the values are only fetched when you compute on them or write them out.

If you want to download only a specific geographic region, you have to prune that list: further process ds to select a lat/lon box that corresponds to your region, and then write that subset dataset to disk. That will complete the regional extraction.
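As a rough sketch of the "select a lat/lon box" step: the HRRR grid is curvilinear, so latitude and longitude are typically 2-D arrays rather than 1-D axes. One approach is to mask the points inside the box and take the smallest rectangular index window that contains them. The function name `bounding_slices`, the synthetic grid, and the box limits below are all made up for illustration; they are not real HRRR coordinates or part of any library.

```python
import numpy as np


def bounding_slices(lat2d, lon2d, lat_min, lat_max, lon_min, lon_max):
    """Return (row_slice, col_slice) covering every grid point inside the box."""
    inside = (
        (lat2d >= lat_min) & (lat2d <= lat_max)
        & (lon2d >= lon_min) & (lon2d <= lon_max)
    )
    if not inside.any():
        raise ValueError("no grid points fall inside the requested box")
    # Rows/columns that contain at least one point inside the box.
    rows = np.flatnonzero(inside.any(axis=1))
    cols = np.flatnonzero(inside.any(axis=0))
    return (
        slice(int(rows[0]), int(rows[-1]) + 1),
        slice(int(cols[0]), int(cols[-1]) + 1),
    )


# Synthetic stand-in for a curvilinear grid (hypothetical values, not HRRR).
jj, ii = np.meshgrid(np.arange(50), np.arange(80), indexing="ij")
lat2d = 30.0 + 0.2 * jj + 0.01 * ii
lon2d = -110.0 + 0.25 * ii - 0.01 * jj

row_slice, col_slice = bounding_slices(lat2d, lon2d, 33.0, 36.0, -105.0, -100.0)
regional_lat = lat2d[row_slice, col_slice]
```

With an xarray dataset, the same slices could be passed to `ds.isel(...)` using whatever the dataset's horizontal dimension names are, and the resulting subset written to disk. Because the window is rectangular in index space, it will usually include a few points just outside the lat/lon box.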

You shouldn’t need to know anything about the chunks. In practice, the folks populating this S3 bucket are almost certainly running the same chunking scheme for each HRRR run so there wouldn’t be any difference in which lat/lons belong to which chunk. But you as the end user should just treat the xarray.Dataset that you pull up as something of a “black box.” The only reason to worry about the chunks on S3 is if for some reason it’s very slow to run your job on the data.
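If the chunk layout ever does matter (for example, when diagnosing a slow job), the chunks that a region touches follow from integer division of the array indices by the chunk size. The sketch below assumes a 2-D array stored with Zarr's default "." separator in chunk keys; the helper `chunk_keys`, the index window, and the 150 x 150 chunk shape are illustrative assumptions, so check the store's metadata for the actual values.

```python
def chunk_keys(row_slice, col_slice, chunk_shape, sep="."):
    """List the chunk keys (e.g. '2.1') that overlap a 2-D index window."""
    c_rows, c_cols = chunk_shape
    # Element index i lives in chunk i // chunk_size along each dimension.
    first_r, last_r = row_slice.start // c_rows, (row_slice.stop - 1) // c_rows
    first_c, last_c = col_slice.start // c_cols, (col_slice.stop - 1) // c_cols
    return [
        f"{r}{sep}{c}"
        for r in range(first_r, last_r + 1)
        for c in range(first_c, last_c + 1)
    ]


# Hypothetical index window and chunk shape, for illustration only.
keys = chunk_keys(slice(400, 520), slice(140, 310), chunk_shape=(150, 150))
print(keys)  # ['2.0', '2.1', '2.2', '3.0', '3.1', '3.2']
```

As long as the grid and chunk shape stay the same from run to run, the same keys cover the same region in every model run.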
