Pangeo Forge - convert S3-hosted NetCDFs without local caching?

I’ve successfully followed the NetCDF to Zarr Sequential Pangeo Forge tutorial on an AWS-hosted JupyterHub with the source NetCDFs hosted in a private S3 bucket. This workflow involves caching each file locally before performing the conversion. Is there a way to read the NetCDFs directly from S3 in a recipe and bypass the caching step (without deploying a Bakery)?


Reminder to self: RTM. For anyone else who ends up here, set cache_inputs=False when creating the recipe:

from pangeo_forge_recipes.recipes import XarrayZarrRecipe

recipe = XarrayZarrRecipe(
    pattern,                # FilePattern returning s3:// (or https://) URLs
    inputs_per_chunk=100,   # combine 100 input files per output chunk
    cache_inputs=False,     # read inputs directly; skip the local cache step
)
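With caching disabled, you can also run the recipe in-process rather than on a Bakery. A minimal sketch, assuming a pangeo-forge-recipes 0.x release where recipes expose to_function():

# Convert the recipe to a plain Python function and execute it locally.
# Every input is streamed from S3 because cache_inputs=False above.
recipe_func = recipe.to_function()
recipe_func()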

Yes, it is definitely possible. You should be able to just return s3:// or https:// URLs in your FilePattern. Did cache_inputs=False solve your problem?
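For anyone landing here later, here is a minimal sketch of a pattern built from direct S3 URLs (assuming pangeo-forge-recipes 0.x; the bucket and file names are hypothetical, and s3fs picks up AWS credentials from the usual environment variables or an instance profile, so a private bucket works as long as the compute environment can read it):

from pangeo_forge_recipes.patterns import pattern_from_file_sequence
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

# Hypothetical private bucket; one NetCDF file per step along "time".
urls = [f"s3://my-private-bucket/data/file_{day:03d}.nc" for day in range(365)]
pattern = pattern_from_file_sequence(urls, concat_dim="time", nitems_per_file=1)

# cache_inputs=False streams each input straight from S3, no local copy.
recipe = XarrayZarrRecipe(pattern, cache_inputs=False)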


It sure did! Thanks for checking @rabernat.
