I have a large dataset that has already been written and is publicly available here: https://hackathon-o.s3-ext.jc.rl.ac.uk/sim-data/dev/v5/glm.n2560_RAL3p3/um.PT1H.hp_z10.zarr/.
I would like to append two new fields to this with:
```python
import xarray as xr

# hporog and hpland are pre-computed static fields (orography and land fraction)
ds_static = xr.Dataset()
ds_static['orog'] = hporog.copy()
ds_static['sftlf'] = hpland.copy()
ds_static.to_zarr(zarr_store, mode='a')
```
`zarr_store` is an S3-compatible object store, and I'm writing to it by setting up an s3fs.S3FileSystem object - see wcrp_hackathon/scripts/process_um_data/um_process_tasks.py at 4838df8a93ba8dde17ecd6bacd8ef394bd7ddb50 in markmuetz/wcrp_hackathon on GitHub.
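For context, the store is set up roughly like this (a simplified sketch; the endpoint URL, bucket/prefix split and credential handling shown here are illustrative, not the exact code in the linked script):

```python
import s3fs

# Illustrative setup only: the endpoint URL, credential handling and bucket/prefix
# split are placeholders, not the exact code in um_process_tasks.py.
fs = s3fs.S3FileSystem(
    client_kwargs={'endpoint_url': 'https://hackathon-o.s3-ext.jc.rl.ac.uk'},
)

# Map the zarr prefix in the bucket to a mutable key-value store that
# xarray/zarr can read from and write to.
zarr_store = s3fs.S3Map(
    root='sim-data/dev/v5/glm.n2560_RAL3p3/um.PT1H.hp_z10.zarr',
    s3=fs,
    check=False,
)
```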
However, this is absurdly slow (30+ minutes), and it times out in my testing before any writes happen, whereas writing the same two fields to a new store takes ~30 s. I suspect it has to scan the existing large dataset first, which takes a very long time. Any help speeding this up would be much appreciated.
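If it helps to reproduce, this is roughly how I'd check whether just reading the existing store's metadata accounts for the delay (a sketch, not code from my actual script):

```python
import time
import xarray as xr

# Rough check of the "it has to scan the existing dataset first" suspicion:
# time how long it takes just to open the existing store, which an append
# (mode='a') also has to do before it can write anything.
t0 = time.perf_counter()
ds_existing = xr.open_zarr(zarr_store)  # tries consolidated metadata first by default
print(f'open_zarr took {time.perf_counter() - t0:.1f} s')
print(ds_existing)
```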