Welcome @rybchuk! Indeed it is a bit confusing.
First, it’s important to understand the layers. From low to high, we have:
Zarr: this is the format that lots of cloud data are stored in on object stores (S3, GCS, etc.). As you know, Zarr is actually a directory, not a file. Cloud data often use consolidated metadata to remove the need to “list” the directories in the object store, which can be slow.
Fsspec implementation: fsspec is a library that lets us read from different storage services through a common API. It’s a key piece to make all of this work. There are many different implementations of fsspec for different storage services (e.g. local files, http, s3, gcs, dropbox, ftp, etc.). s3fs is the fsspec implementation for S3. When you do
fsspec.get_mapper('s3://bucket/path'), it actually dispatches to s3fs. It’s equivalent to:
fs = s3fs.S3FileSystem()
mapper = fs.get_mapper('s3://bucket/path')
Mapper objects are the things we pass to Xarray or Zarr to actually open the dataset.
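To make the dispatch concrete, here is a minimal sketch that uses fsspec's built-in in-memory filesystem instead of S3, so it runs without credentials or network access. The `memory://demo-store` URL and the `greeting` key are stand-ins chosen for illustration; with s3fs installed, the same pattern should apply to `s3://` URLs.

```python
import fsspec

# Route 1: let fsspec pick the implementation from the URL protocol.
via_url = fsspec.get_mapper("memory://demo-store")

# Route 2: build the filesystem explicitly, then ask it for a mapper.
fs = fsspec.filesystem("memory")
direct = fs.get_mapper("memory://demo-store")

# A mapper behaves like a dict of byte strings keyed by relative path.
via_url["greeting"] = b"hello"

# Both mappers point at the same underlying store.
print(type(via_url.fs).__name__)
print(direct["greeting"])
```

Both routes end up with a mapper backed by the same filesystem class, which is the sense in which the two spellings are equivalent.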
Xarray is the analysis library that can interpret the zarr stores as netcdf-like datasets. For Zarr, Xarray takes an fsspec mapping as its input to
open_zarr, opens this store with Zarr, and then decodes it according to xarray conventions.
Intake is a catalog for data. It is 100% optional! You do not need intake to open any cloud data, but it can make it more convenient. For LENS and CMIP6, we provide intake-esm compatible catalogs. (That’s what https://ncar-cesm-lens.s3-us-west-2.amazonaws.com/catalogs/aws-cesm1-le.json is.) But you can always bypass that and go directly to the data, if you know where to look. For example, that json file points at a CSV file (https://ncar-cesm-lens.s3-us-west-2.amazonaws.com/catalogs/aws-cesm1-le.csv) that has entries like this:
FLNS,net longwave flux at surface,atm,20C,daily,1.0,global,W/m2,1920-01-01 12:00:00,2005-12-31 12:00:00,s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS.zarr
FLNSC,clearsky net longwave flux at surface,atm,20C,daily,1.0,global,W/m2,1920-01-01 12:00:00,2005-12-31 12:00:00,s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNSC.zarr
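Rows like these can be picked apart with Python's standard `csv` module. The sketch below works on the two rows quoted above, copied into a string. The column names are not shown in this excerpt, so it indexes fields by position (variable name first, store URL last), which is an assumption about the layout based only on these two rows.

```python
import csv
import io

# The two catalog rows quoted above, copied verbatim.
ROWS = """\
FLNS,net longwave flux at surface,atm,20C,daily,1.0,global,W/m2,1920-01-01 12:00:00,2005-12-31 12:00:00,s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS.zarr
FLNSC,clearsky net longwave flux at surface,atm,20C,daily,1.0,global,W/m2,1920-01-01 12:00:00,2005-12-31 12:00:00,s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNSC.zarr
"""

# Map each variable name (first field) to its Zarr store URL (last field).
stores = {}
for row in csv.reader(io.StringIO(ROWS)):
    variable, store_url = row[0], row[-1]
    stores[variable] = store_url

for variable, store_url in stores.items():
    print(f"{variable}: {store_url}")
```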
From that CSV, you can see the links to the actual zarr stores, e.g.
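s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNSC.zarr. If you’re using recent versions of xarray, zarr, and fsspec, you could just pass that URL straight to xr.open_zarr, with storage_options={'anon': True} for anonymous access,
or more verbosely build the mapper yourself: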
fs = s3fs.S3FileSystem(anon=True)
mapper = fs.get_mapper('s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNSC.zarr')
Hope this helps. Disclaimer: all this code is untested.