Downloading HRRR for a specific geographic region in zarr format from S3

Hello!

I have been downloading downward shortwave-radiation flux data from the High Resolution Rapid Refresh (HRRR) from AWS S3, based on an example from Project Pythia (Plotting HRRR 2-meter temperatures — HRRR-AWS-Cookbook). Here’s the URL for the bucket: AWS S3 Explorer


I’ve had no problem downloading data for the whole domain. My question has to do with downloading just the data from a specific geographic region instead of the entire domain.

I’m fairly new to the zarr format, but I understand that the data is broken into chunks that are separately compressed and stored. Is it possible to download a specific chunk off of S3?

If I were able to download a specific chunk, is there any guarantee that it would contain data for the same geographic region in each model run?

Thanks in advance for any comments or suggestions!

Hello @brennanwx,

When you construct and open the dataset as you've shared in your code snippet, you're already doing most of the work here. An important point: you're not actually downloading the data for the whole domain when you use this method. xarray opens the zarr store lazily, so behind the scenes it only reads the metadata and builds a giant mapping of all the chunks you might eventually need to download. The data values themselves are fetched only when you compute on them, plot them, or write them to disk.

If you want to download just a specific geographic region, you only have to prune that mapping. Further process ds to select a lat/lon box that corresponds to your region, then write that subset dataset to disk. Only the chunks overlapping your box get downloaded, and that completes the regional extraction.
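One wrinkle with the lat/lon box: HRRR's native grid is a Lambert conformal projection, so latitude and longitude are 2-D arrays rather than 1-D coordinates, and a plain `ds.sel(lat=slice(...))` won't work. A minimal sketch, assuming you have 2-D `lat`/`lon` arrays for the grid (the arrays below are synthetic, not real HRRR values), is to mask the cells inside the box and take the smallest index rectangle that contains them:

```python
import numpy as np


def bbox_index_slices(lat, lon, lat_min, lat_max, lon_min, lon_max):
    """Return (y_slice, x_slice) covering every grid cell inside a lat/lon box.

    lat and lon are 2-D arrays of shape (ny, nx). The result is the smallest
    index rectangle containing all in-box cells, so on a projected grid it can
    include a few cells just outside the box near its corners.
    """
    inside = (lat >= lat_min) & (lat <= lat_max) & (lon >= lon_min) & (lon <= lon_max)
    if not inside.any():
        raise ValueError("no grid cells fall inside the requested box")
    rows = np.flatnonzero(inside.any(axis=1))  # y indices with at least one in-box cell
    cols = np.flatnonzero(inside.any(axis=0))  # x indices with at least one in-box cell
    return slice(int(rows[0]), int(rows[-1]) + 1), slice(int(cols[0]), int(cols[-1]) + 1)


# Synthetic, slightly skewed 2-D grid standing in for HRRR's lat/lon arrays.
ny, nx = 100, 120
j, i = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
lat = 25.0 + 0.25 * j + 0.02 * i
lon = -120.0 + 0.3 * i - 0.02 * j

ys, xs = bbox_index_slices(lat, lon, 35.0, 40.0, -100.0, -90.0)
subset = lat[ys, xs]  # with xarray this would be an isel over the y/x dimensions
print(ys, xs, subset.shape)
```

The resulting slices could then be passed to `ds.isel(...)` along the dataset's y/x dimensions (in the Pythia cookbook's zarr layout these appear to be `projection_y_coordinate` and `projection_x_coordinate`, but check `ds.dims` for your store) before writing the subset with `to_zarr` or `to_netcdf`. An alternative is to project the box corners into the grid's x/y coordinates and use `.sel` on those instead.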

You shouldn't need to know anything about the chunks. In practice, the folks populating this S3 bucket are almost certainly using the same chunking scheme for every HRRR run, so the lat/lons belonging to each chunk wouldn't change from run to run. But you as the end user should treat the xarray.Dataset that you pull up as something of a “black box.” The only reason to worry about the chunks on S3 is if your job turns out to be very slow on the data.
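To make the chunk picture concrete, here is a toy model (not zarr's or xarray's actual code) of a lazily opened array. Assumptions: the grid is the 1059 x 1799 HRRR CONUS grid, the 150 x 150 chunk shape is hypothetical, chunk keys mimic zarr v2's default "row.col" naming, and `fake_fetch` stands in for an S3 GET. It shows that opening fetches nothing, that reading a region fetches only the overlapping chunks, and that which keys cover a region depends only on the chunk shape:

```python
from itertools import product

NY, NX = 1059, 1799    # HRRR CONUS grid size (rows, columns)
CHUNKS = (150, 150)    # hypothetical chunk shape, for illustration only


def chunk_keys_for_region(y_slice, x_slice, chunk_shape):
    """Zarr-v2-style "row.col" keys of every chunk overlapping an index region.

    The result depends only on the region and the chunk shape, so if every run
    is written with the same chunk shape, a region maps to the same keys each time.
    """
    cy, cx = chunk_shape
    rows = range(y_slice.start // cy, (y_slice.stop - 1) // cy + 1)
    cols = range(x_slice.start // cx, (x_slice.stop - 1) // cx + 1)
    return [f"{r}.{c}" for r, c in product(rows, cols)]


class LazyChunkedArray:
    """Toy lazily opened array: creating it fetches nothing; read() fetches only needed chunks."""

    def __init__(self, shape, chunk_shape, fetch):
        self.shape, self.chunk_shape, self._fetch = shape, chunk_shape, fetch
        self.fetched = []  # chunk keys actually "downloaded", in order

    def read(self, y_slice, x_slice):
        cy, cx = self.chunk_shape
        out = [[None] * (x_slice.stop - x_slice.start)
               for _ in range(y_slice.stop - y_slice.start)]
        for key in chunk_keys_for_region(y_slice, x_slice, self.chunk_shape):
            r, c = map(int, key.split("."))
            block = self._fetch(key)  # one request per overlapping chunk
            self.fetched.append(key)
            # Copy the part of this chunk that falls inside the requested region.
            for y in range(max(y_slice.start, r * cy), min(y_slice.stop, (r + 1) * cy)):
                for x in range(max(x_slice.start, c * cx), min(x_slice.stop, (c + 1) * cx)):
                    out[y - y_slice.start][x - x_slice.start] = block[y - r * cy][x - c * cx]
        return out


def fake_fetch(key):
    """Stand-in for an S3 GET: returns a chunk whose values encode the global (y, x) index."""
    r, c = map(int, key.split("."))
    cy, cx = CHUNKS
    return [[(r * cy + y) * NX + (c * cx + x) for x in range(cx)] for y in range(cy)]


arr = LazyChunkedArray((NY, NX), CHUNKS, fake_fetch)
region = arr.read(slice(400, 520), slice(700, 820))
print(arr.fetched)  # → ['2.4', '2.5', '3.4', '3.5']
print(len(arr.fetched), "of", len(chunk_keys_for_region(slice(0, NY), slice(0, NX), CHUNKS)))  # → 4 of 96
```

Exactly which chunks a real read touches also depends on how xarray/dask chunk the data when you open the store, but the idea carries over: you select from ds and let the library work out the keys.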
