This is my first post to this community. I’m working on a pilot project with JASMIN, a data analysis facility for the NERC community in the UK. It’s a hybrid of a traditional batch system and an OpenStack cloud. We’ve recently set up a system that lets groups deploy their own instances of Pangeo inside OpenStack projects on our system. We are also experimenting with storing CMIP6 data on an S3-compatible object store that is likewise part of JASMIN.
I’ve looked at https://pangeo.io/data.html to see how you’ve gone about uploading data to Google Cloud object storage with xarray and zarr. We would like a workflow that transfers CMIP6 data from our POSIX file system to our object store. Looking at the guide, it seems that at step 2) we could skip a stage and write directly to our object store. I think we can do that with the to_zarr call described, but would appreciate some tips: I’m not sure how to give to_zarr a handle pointing at the object store rather than a POSIX directory path.
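Here’s a rough sketch of what I imagine the direct write could look like, using s3fs to wrap our S3-compatible endpoint. The endpoint URL, bucket name, credentials, and chunk sizes are all placeholders for illustration:

```python
import s3fs
import xarray as xr

# Connect to the S3-compatible endpoint (URL and credentials are placeholders)
fs = s3fs.S3FileSystem(
    key="ACCESS_KEY",
    secret="SECRET_KEY",
    client_kwargs={"endpoint_url": "https://objectstore.example.ac.uk"},
)

# Map a bucket/prefix to a mutable mapping that zarr can use as a store
store = s3fs.S3Map(root="my-bucket/cmip6/tas.zarr", s3=fs)

# Open the source data lazily from the POSIX file system
ds = xr.open_mfdataset("/path/to/cmip6/*.nc", combine="by_coords")

# Write straight to the object store -- no intermediate copy on local disk
ds.to_zarr(store=store, mode="w", consolidated=True)
```

If there’s a more idiomatic way of handing to_zarr that store mapping, I’d be glad to hear it.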
We’d also appreciate any pointers on using Dask to scale this out into a larger workflow.
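For context, this is roughly how I picture the Dask side, assuming a distributed cluster (local to start with, though on JASMIN we could launch workers on the batch system with something like dask-jobqueue). The worker count, dimension names, and chunk sizes are guesses on my part:

```python
import s3fs
import xarray as xr
from dask.distributed import Client

# A local cluster is the simplest starting point; the worker count is arbitrary
client = Client(n_workers=8)

# Same placeholder endpoint and bucket as in the sketch above
fs = s3fs.S3FileSystem(client_kwargs={"endpoint_url": "https://objectstore.example.ac.uk"})
store = s3fs.S3Map(root="my-bucket/cmip6/tas.zarr", s3=fs)

# Rechunk to a sensible zarr chunk size before writing; "time": 120 is
# purely illustrative
ds = xr.open_mfdataset("/path/to/cmip6/*.nc", combine="by_coords")
ds = ds.chunk({"time": 120})

# The write is now a parallel task graph: each worker streams its chunks
# straight to the object store
ds.to_zarr(store=store, mode="w", consolidated=True)
```

Does picking chunk sizes up front like this match what you did, or did you rechunk as a separate pass?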
Perhaps you also have some general experience you could share from the work you’ve done uploading CMIP6 data into the public cloud?