Specifically on async: s3 file system instances are always async internally. The asynchronous argument only specifies whether you will be calling the instance from async def code, so you do not need it here.
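A minimal sketch of the difference, using a placeholder bucket name: in ordinary synchronous code the default instance already does its I/O concurrently behind the scenes, and asynchronous=True is only for driving it from async def code yourself.

```python
import asyncio
import s3fs

# Ordinary usage: the instance runs its own event loop internally,
# so concurrent transfers happen without any extra argument.
fs = s3fs.S3FileSystem(anon=True)
print(fs.ls("some-bucket"))  # "some-bucket" is a placeholder

# Only inside async def code do you want asynchronous=True, and then
# you await the coroutine (underscore-prefixed) methods yourself.
async def main():
    afs = s3fs.S3FileSystem(anon=True, asynchronous=True)
    session = await afs.set_session()   # start the aiobotocore session
    print(await afs._ls("some-bucket"))
    await session.close()

# asyncio.run(main())
```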
When combined with zarr, multiple chunks of a given variable can be requested concurrently, so you do not pay the latency cost many times over; it does not actually improve your maximum bandwidth, though. Furthermore, zarr does not currently support concurrent reads across different variables. It would be nice!
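For example (a sketch with a hypothetical store path and variable names): a single slice that crosses many chunks of one array is fetched with concurrent requests, while a second variable is still a separate read.

```python
import zarr

# Hypothetical zarr store on S3; s3fs handles the URL behind the scenes.
group = zarr.open_group(
    "s3://some-bucket/data.zarr", mode="r", storage_options={"anon": True}
)
temperature = group["temperature"]   # hypothetical variable name

# One __getitem__ spanning many chunks: the chunk fetches are issued
# concurrently, so request latency is paid roughly once rather than
# once per chunk (maximum bandwidth is unchanged).
block = temperature[0:10, :, :]

# Reading a second variable is a separate, sequential operation - zarr
# does not currently interleave requests across different variables.
salinity = group["salinity"]         # hypothetical variable name
block2 = salinity[0:10, :, :]
```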
Using xarray, as opposed to zarr directly, gives you indexing by coordinate, and it eagerly loads the coordinate arrays to make that possible. You are not indexing by coordinate, so perhaps xarray is not buying you much here.
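To illustrate (hypothetical path and coordinate names): the eagerly loaded coordinates are what make label-based selection work; purely positional selection does not need them.

```python
import xarray as xr

# Hypothetical dataset; open_zarr reads the coordinate arrays eagerly.
ds = xr.open_zarr("s3://some-bucket/data.zarr", storage_options={"anon": True})

# Label-based selection - this is what the eager coordinates are for.
by_label = ds.sel(time="2020-01-01", lat=slice(10.0, 20.0))

# Purely positional selection - plain zarr can do this without xarray.
by_position = ds.isel(time=0, lat=slice(100, 200))
```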
You are already using dask: open_zarr assumes it by default, which is not the same default as open_dataset(engine="zarr"), where chunks=None means no dask at all.
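Concretely (hypothetical path), the two openers differ in their default chunking, so it is the first form below that gives you dask arrays:

```python
import xarray as xr

url = "s3://some-bucket/data.zarr"   # hypothetical path

# open_zarr wraps variables in dask arrays by default (chunks="auto"):
ds_dask = xr.open_zarr(url, storage_options={"anon": True})

# open_dataset with the zarr engine defaults to chunks=None, i.e.
# lazily-loaded arrays with no dask graph at all:
ds_plain = xr.open_dataset(
    url, engine="zarr", backend_kwargs={"storage_options": {"anon": True}}
)
```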
However, it is strange that the isel() call is taking so long, and we would be interested in knowing why. No data is being loaded at that point, but you are constructing a dask graph. Maybe passing chunks=None helps you; if so, that points at graph construction, and dask has some work to do there.
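A quick check (hypothetical path and variable name) is to time the same selection with and without dask; if the chunks=None version is fast, the time is going into building the dask graph rather than into I/O:

```python
import time
import xarray as xr

url = "s3://some-bucket/data.zarr"            # hypothetical path

for chunks in ("auto", None):                 # with dask, then without
    ds = xr.open_dataset(url, engine="zarr", chunks=chunks)
    t0 = time.perf_counter()
    piece = ds["temperature"].isel(time=0)    # hypothetical variable
    print(chunks, "isel took", time.perf_counter() - t0, "s")
    # the data itself is only read when you ask for values, e.g. piece.load()
```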