Slow xarray.open_zarr with "default" s3fs 0.4.2, fixed in latest 0.5.1

I’ve been struggling with high latency when opening a particular zarr store on S3 that has consolidated metadata. After much digging (with a lot of help from [1]), I’ve found that the latest s3fs release, 0.5.1 (available on conda-forge), largely fixes my problem: the delay in xarray.open_zarr went from 50 sec to 4 sec.
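For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch. The helper names and the "bucket/prefix" store path are placeholders for illustration, not the store discussed in this post, and it assumes AWS credentials are picked up from the usual configuration.

```python
import time


def time_call(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def open_zarr_timed(store_path):
    """Open a consolidated zarr store on S3 and print how long it took.

    store_path is a hypothetical "bucket/prefix" string.
    """
    # Imported here so time_call stays usable without these packages.
    import s3fs
    import xarray as xr

    fs = s3fs.S3FileSystem()  # assumes credentials come from the AWS config
    store = s3fs.S3Map(root=store_path, s3=fs)
    ds, seconds = time_call(xr.open_zarr, store, consolidated=True)
    print(f"s3fs {s3fs.__version__}: open_zarr took {seconds:.1f} s")
    return ds
```

Running open_zarr_timed in one environment per s3fs version gives a like-for-like comparison of the open latency.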

The problem was that s3fs 0.4.2 gets installed when s3fs is not pinned to a specific version. I had assumed that the most recent version that doesn’t cause conflicts would be installed. The bare minimum conda env I need looks like this:

conda create -n myenv -c conda-forge python ipykernel matplotlib xarray zarr botocore boto3 s3fs fsspec
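A quick way to see which s3fs version the solver actually picked is to ask the environment itself. This is a small sketch using only the standard library (Python 3.8+); the helper names are made up for illustration, and the 0.5.1 threshold is taken from this post.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a string like '0.5.1' into a comparable tuple (0, 5, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("s3fs")
    if found is None:
        print("s3fs is not installed in this environment")
    elif parse_version(found) < (0, 5, 1):
        print(f"s3fs {found} is older than 0.5.1; consider pinning s3fs=0.5.1")
    else:
        print(f"s3fs {found} is 0.5.1 or newer")
```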

The breakthrough was adding version pinning: s3fs=0.5.1. Digging further, it looks like boto3 was the culprit, pulling the solver toward the older s3fs version; if I create the env without boto3, s3fs 0.5.1 is installed. However, I’ve been using boto3.Session to start a session with an .aws/credentials file. That still worked fine with the pinned s3fs=0.5.1, but for now I’ve switched to botocore.session.Session and skipped the boto3 installation.
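A minimal sketch of the botocore-only route, assuming the credentials are read from a named profile in .aws/credentials and passed to s3fs explicitly. The post does not say how the session was wired into s3fs, so that part is an assumption, and the profile name and helper names are placeholders.

```python
def s3fs_kwargs(credentials):
    """Map botocore-style credentials to S3FileSystem keyword arguments."""
    return {
        "key": credentials.access_key,
        "secret": credentials.secret_key,
        "token": credentials.token,  # None unless temporary credentials are used
    }


def filesystem_from_profile(profile="default"):
    """Build an S3FileSystem from a profile, without importing boto3."""
    # Imported here so s3fs_kwargs can be used without these packages.
    import botocore.session
    import s3fs

    session = botocore.session.Session(profile=profile)
    credentials = session.get_credentials()
    if credentials is None:
        raise RuntimeError(f"no credentials found for profile {profile!r}")
    return s3fs.S3FileSystem(**s3fs_kwargs(credentials))
```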

Hopefully this information will spare someone some pain. But it’d be great if the stack defaults didn’t lead to the older, less performant s3fs. Also, while this testing was largely done on my Ubuntu laptop, 0.4.2 is also what’s installed on the AWS Pangeo base notebook image; so, I’m still stuck with the high latency there. I’ll follow up on that odd latency in a separate post.

[1] https://github.com/dask/s3fs/issues/285, https://github.com/dask/s3fs/issues/279


Thanks for documenting this, @emiliom. I’m perplexed about the dependency piece, and I don’t know the best way of figuring these things out. One approach I’ve taken in the past is to list the dependencies of the package version you want to install and compare them with those of the old version that is being installed, using conda search s3fs==0.5.1 --info (and likewise for 0.4.2):

s3fs 0.5.1 py_0
dependencies: 
  - aiobotocore
  - botocore
  - fsspec >=0.8.0
  - python >=3.6

Loading channels: done
s3fs 0.4.2 py_0
dependencies: 
  - boto3
  - fsspec >=0.6.0
  - python >=3.5
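The two listings can be compared mechanically. The sketch below hard-codes the dependency lists shown above (an empty string means no version constraint was listed) and reports what changed between the two builds; the function name is made up for illustration.

```python
# Dependency lists copied from the conda search output above.
DEPS_0_4_2 = {"boto3": "", "fsspec": ">=0.6.0", "python": ">=3.5"}
DEPS_0_5_1 = {
    "aiobotocore": "",
    "botocore": "",
    "fsspec": ">=0.8.0",
    "python": ">=3.6",
}


def diff_dependencies(old, new):
    """Return (removed, added, changed) between two dependency mappings."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = {
        name: (old[name], new[name])
        for name in sorted(set(old) & set(new))
        if old[name] != new[name]
    }
    return removed, added, changed


if __name__ == "__main__":
    removed, added, changed = diff_dependencies(DEPS_0_4_2, DEPS_0_5_1)
    print("removed:", removed)
    print("added:", added)
    for name, (before, after) in changed.items():
        print(f"changed: {name} {before} -> {after}")
```

The diff shows that 0.4.2 depends on boto3 while 0.5.1 depends on aiobotocore and botocore instead, which fits the observation that leaving boto3 out of the environment lets 0.5.1 install.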

I’ve opened an issue to get the version upgraded in the default Docker image: https://github.com/pangeo-data/pangeo-docker-images/issues/165

I didn’t know about this functionality. Nice!
