Pangeo Forge - convert S3-hosted NetCDFs without local caching?

I’ve successfully followed the NetCDF to Zarr Sequential Pangeo Forge tutorial on an AWS-hosted JupyterHub with the source NetCDFs hosted in a private S3 bucket. This workflow involves caching each file locally before performing the conversion. Is there a way to read the NetCDFs directly from S3 in a recipe and bypass the caching step (without deploying a Bakery)?


Reminder to self: RTM. For anyone else who ends up here, set cache_inputs=False when creating the recipe:

from pangeo_forge_recipes.recipes import XarrayZarrRecipe

recipe = XarrayZarrRecipe(
    pattern,                # FilePattern returning s3:// (or https://) URLs
    inputs_per_chunk=100,   # combine 100 input files per output chunk
    cache_inputs=False,     # read inputs directly; skip the local cache step
)
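With caching disabled, you can also run the recipe in-process rather than on a Bakery. A minimal sketch, assuming a pangeo-forge-recipes 0.x release where recipes expose to_function():

# Convert the recipe to a plain Python function and execute it locally.
# Every input is streamed from S3 because cache_inputs=False above.
recipe_func = recipe.to_function()
recipe_func()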

Yes, it is definitely possible. You should be able to just return s3:// or https:// URLs in your FilePattern. Did cache_inputs=False solve your problem?
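For anyone landing here later, here is a minimal sketch of a pattern built from direct S3 URLs (assuming pangeo-forge-recipes 0.x; the bucket and file names are hypothetical, and s3fs picks up AWS credentials from the usual environment variables or an instance profile, so a private bucket works as long as the compute environment can read it):

from pangeo_forge_recipes.patterns import pattern_from_file_sequence
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

# Hypothetical private bucket; one NetCDF file per step along "time".
urls = [f"s3://my-private-bucket/data/file_{day:03d}.nc" for day in range(365)]
pattern = pattern_from_file_sequence(urls, concat_dim="time", nitems_per_file=1)

# cache_inputs=False streams each input straight from S3, no local copy.
recipe = XarrayZarrRecipe(pattern, cache_inputs=False)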


It sure did! Thanks for checking @rabernat.
