Status code: 404 on NOAA OISST data on Pangeo-Forge

Hi,

I am trying to access NOAA’s OISST AVHRR data stored in the cloud. However, instead of the usual code block that I can copy, paste, and run to load the dataset in my notebook, I see this error message:

An error occurred while fetching data from URL: https://api.pangeo-forge.org/repr/xarray/?url=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/aws-noaa-oisst-feedstock/aws-noaa-oisst-avhrr-only.zarr
{"detail":"An error occurred while fetching the data from URL: https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/aws-noaa-oisst-feedstock/aws-noaa-oisst-avhrr-only.zarr. Dataset not found."}

The dataset doesn’t seem to live where it’s supposed to; here is the link to the dataset on pangeo-forge’s catalog. I am having trouble tracking down the maintainer for this dataset, so I would appreciate anyone’s help resolving this. Thanks!

Hi @stb2145, the dataset itself is fine. It is a kerchunk-ed dataset, so it is opened through its reference file, "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/aws-noaa-oisst-feedstock/aws-noaa-oisst-avhrr-only.zarr/reference.json", rather than through the .zarr URL itself. It can be accessed via xarray with the following code snippet:

In [1]: import xarray as xr

In [2]: url = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/aws-noaa-oisst-feedstock/aws-noaa-oisst-avhrr-only.zarr/reference.json"

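In [3]: ds = xr.open_dataset("reference://", engine='zarr',
   ...:                      backend_kwargs={'consolidated': False,
   ...:                                      'storage_options': {
   ...:                                          'fo': url,  # kerchunk reference file
   ...:                                          'remote_protocol': 's3',  # the referenced chunks live on S3
   ...:                                          'remote_options': {'anon': True}}},  # anonymous S3 access
   ...:                      chunks={})  # lazy dask arrays using the on-disk chunking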


In [4]: ds
Out[4]: 
<xarray.Dataset>
Dimensions:  (time: 15044, zlev: 1, lat: 720, lon: 1440)
Coordinates:
  * lat      (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
  * lon      (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9
  * time     (time) datetime64[ns] 1981-09-01T12:00:00 ... 2022-11-08T12:00:00
  * zlev     (zlev) float32 0.0
Data variables:
    anom     (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
    err      (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
    ice      (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
    sst      (time, zlev, lat, lon) float32 dask.array<chunksize=(1, 1, 720, 1440), meta=np.ndarray>
Attributes: (12/37)
    Conventions:                CF-1.6, ACDD-1.3
    cdm_data_type:              Grid
    comment:                    Data was converted from NetCDF-3 to NetCDF-4 ...
    creator_email:              oisst-help@noaa.gov
    creator_url:                https://www.ncei.noaa.gov/
    date_created:               2020-05-08T19:05:13Z
    ...                         ...
    source:                     ICOADS, NCEP_GTS, GSFC_ICE, NCEP_ICE, Pathfin...
    standard_name_vocabulary:   CF Standard Name Table (v40, 25 January 2017)
    summary:                    NOAAs 1/4-degree Daily Optimum Interpolation ...
    time_coverage_end:          1981-09-01T23:59:59Z
    time_coverage_start:        1981-09-01T00:00:00Z
    title:                      NOAA/NCEI 1/4 Degree Daily Optimum Interpolat...
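Since the working call differs from the catalog URL only by the /reference.json suffix plus a few storage options, it can be wrapped in two small helpers. This is a minimal sketch: `reference_url` and `reference_open_kwargs` are hypothetical names (not part of xarray, fsspec, or pangeo-forge), and the assumption that a kerchunk-ed feedstock keeps its references as `reference.json` directly inside the `.zarr` prefix is only confirmed in this thread for the OISST dataset.

```python
# Hypothetical helpers (not part of xarray, fsspec, or pangeo-forge). They only
# rebuild the URL and keyword arguments used in the In [3] call above.


def reference_url(zarr_url: str) -> str:
    """Return the kerchunk reference file URL for a catalog '.zarr' URL.

    Assumes the references are stored as 'reference.json' directly inside
    the '.zarr' prefix, as they are for the OISST feedstock.
    """
    return zarr_url.rstrip("/") + "/reference.json"


def reference_open_kwargs(ref_url: str, remote_protocol: str = "s3",
                          anon: bool = True) -> dict:
    """Build keyword arguments for xr.open_dataset("reference://", ...)."""
    return {
        "engine": "zarr",
        "backend_kwargs": {
            # don't look for consolidated metadata (.zmetadata)
            "consolidated": False,
            "storage_options": {
                "fo": ref_url,                       # the reference JSON
                "remote_protocol": remote_protocol,  # where the chunks live
                "remote_options": {"anon": anon},    # anonymous access
            },
        },
        "chunks": {},  # lazy dask arrays using the on-disk chunking
    }


catalog_url = ("https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/"
               "aws-noaa-oisst-feedstock/aws-noaa-oisst-avhrr-only.zarr")
ref = reference_url(catalog_url)
kwargs = reference_open_kwargs(ref)
print(ref)
```

With these, `ds = xr.open_dataset("reference://", **kwargs)` should be equivalent to the In [3] call above.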

The error on pangeo-forge.org comes from hardcoded assumptions in the code that generates the dataset preview: the `repr` route has no kerchunk opener, so it cannot open a kerchunk-ed dataset from its .zarr URL. This is being tracked in "Add kerchunk opener to `repr` route" (Issue #200, pangeo-forge/pangeo-forge-orchestrator on GitHub).
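For illustration only, here is one way a preview route could choose between a plain Zarr opener and a kerchunk opener. This is not the pangeo-forge-orchestrator code; `choose_opener` and the rule it applies (use `reference.json` inside the `.zarr` prefix if it exists, otherwise open the URL as a plain Zarr store) are assumptions. The existence check is passed in as a function, for example the `exists` method of an fsspec filesystem, so the logic can be exercised without network access.

```python
from typing import Callable, Tuple


def choose_opener(url: str, exists: Callable[[str], bool]) -> Tuple[str, str]:
    """Return (opener, target) for a catalog URL.

    Hypothetical rule: if '<url>/reference.json' exists, treat the dataset
    as kerchunk references; otherwise open the URL as a plain Zarr store.
    """
    base = url.rstrip("/")
    ref = base + "/reference.json"
    if exists(ref):
        return "kerchunk", ref
    return "zarr", base


# Usage with an in-memory stand-in for a remote store.
fake_store = {"https://example.org/a.zarr/reference.json"}
print(choose_opener("https://example.org/a.zarr", fake_store.__contains__))
# ('kerchunk', 'https://example.org/a.zarr/reference.json')
```

Injecting `exists` keeps the decision separate from I/O; in a real service it would be backed by an actual filesystem call.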

Great, thank you so much for the quick help, Anderson! I guess I was just missing the /reference.json part…
