Access GES DISC NASA dataset using xarray and dask on a cluster


I am trying to access the MERRA-2 dataset through its OPeNDAP links with xarray.
The code below is based on a tutorial that @betolink sent me as an example.

The code runs well if parallel=False, but returns OSError: [Errno -70] NetCDF: DAP server error if I set parallel=True, whether or not I create the cluster.

@betolink suspected that the workers don't have the authentication credentials and suggested doing something like what is mentioned in @rsignell's issue.

That would involve adding client.register_worker_plugin(UploadFile('~/.netrc')) after creating the client. I tested that as well, but it returned the same error. In the code below I had to replace ~/.netrc with the full path because it was returning a file-not-found error.
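(As an aside, the file-not-found error with ~/.netrc is expected: Python's open() does not expand "~", so the plugin needs an absolute path. A tiny helper, hypothetical name resolve, avoids hard-coding it:)

```python
import os

def resolve(path):
    """Expand "~" and environment variables so open() can find the file."""
    return os.path.abspath(os.path.expanduser(os.path.expandvars(path)))

# "~/.netrc" now points at the real home-directory file
print(resolve("~/.netrc"))  # e.g. /home/username/.netrc
```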

It is important to say that parallel=True works fine on my local computer running Ubuntu via WSL.

Has anyone faced this problem before, or have any guesses on how to solve this issue?

# ----------------------------------
# Import Python modules
# ----------------------------------

import warnings


import xarray as xr
import matplotlib.pyplot as plt
from calendar import monthrange

create_cluster = True
parallel = True
upload_file = True

if create_cluster:
    # --------------------------------------
    # Creating 50 workers with 1 core and 40 GB each
    # --------------------------------------
    import os
    from dask_jobqueue import SLURMCluster
    from dask.distributed import Client
    from dask.distributed import WorkerPlugin

    class UploadFile(WorkerPlugin):
        """A WorkerPlugin to upload a local file to workers.

        Parameters
        ----------
        filepath: str
            A path to the file to upload

        Examples
        --------
        >>> client.register_worker_plugin(UploadFile(".env"))
        """

        def __init__(self, filepath):
            """Initialize the plugin by reading in the data from the given file."""
            self.filename = os.path.basename(filepath)
            self.dirname = os.path.dirname(filepath)
            with open(filepath, "rb") as f:
                self.data = f.read()

        async def setup(self, worker):
            # Recreate the directory and write the file on each worker
            if not os.path.exists(self.dirname):
                os.makedirs(self.dirname)
            with open(self.filename, "wb+") as f:
                f.write(self.data)
            return os.listdir()
    cluster = SLURMCluster(cores=1, memory="40GB")
    cluster.scale(50)  # request the 50 workers

    client = Client(cluster)  # Connect this local process to remote workers
    if upload_file:
        # NOTE: the full path is needed here; open() does not expand "~"
        client.register_worker_plugin(UploadFile('/path/to/.netrc'))  # full path to ~/.netrc

# ---------------------------------
# Read data
# ---------------------------------
# MERRA-2 collection (hourly)
collection_shortname = 'M2T1NXAER'
collection_longname  = 'tavg1_2d_aer_Nx'
collection_number = 'MERRA2_400'  
MERRA2_version = '5.12.4'
year = 2020
# Open dataset
# Read selected days in the same month and year
month = 1  # January
day_beg = 1
day_end = 31
# Note that collection_number is MERRA2_401 in a few cases, refer to "Records of MERRA-2 Data Reprocessing and Service Changes"
if year == 2020 and month == 9:
    collection_number = 'MERRA2_401'

url = '{}.{}/{}/{:0>2d}'.format(collection_shortname, MERRA2_version, year, month)
files_month = ['{}/{}.{}.{}{:0>2d}{:0>2d}.nc4'.format(url, collection_number, collection_longname, year, month, day)
               for day in range(day_beg, day_end + 1)]
# Get the number of files
len_files_month = len(files_month)

# Print
print("{} files to be opened:".format(len_files_month))
print("files_month", files_month)

# Read dataset URLs
ds = xr.open_mfdataset(files_month, parallel=parallel)
# View metadata (function like ncdump -c)
print(ds)

It might be related to the issue, but the code below is for a different GES DISC dataset served from a different server (goldsmr5), and it also breaks with parallel=True. The same code works fine with parallel=False.

import xarray as xr
import pandas as pd

# list daily timestamps from a starting date to an end date
dts = pd.date_range(start = "2013-01-01", end = "2022-12-01", freq = "D")

# base url for the files
base = ""
# prefix of the files
prefix = "MERRA2_400.inst3_3d_asm_Np"

# function to construct the url of the files
url = lambda d: f"{base}{d.year}/{d.month:02d}/{prefix}.{d.year}{d.month:02d}{d.day:02d}.nc4"

# list of urls for the files
fnames = [url(dti) for dti in dts]

# lets try to load the first 5 files
ds = xr.open_mfdataset(fnames[:5], parallel=True)
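For reference, assuming the last field of the filename is the zero-padded day of month (the timestamps are daily, so that seems to be what the empty placeholder in the f-string was meant to hold), a standalone version of the URL builder looks like this:

```python
from datetime import date

base = ""  # server base URL omitted, as above
prefix = "MERRA2_400.inst3_3d_asm_Np"

def make_url(d):
    # YYYY/MM directory, then PREFIX.YYYYMMDD.nc4 (day assumed for the last field)
    return f"{base}{d.year}/{d.month:02d}/{prefix}.{d.year}{d.month:02d}{d.day:02d}.nc4"

print(make_url(date(2013, 1, 1)))
# 2013/01/MERRA2_400.inst3_3d_asm_Np.20130101.nc4
```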

The error:

OSError: [Errno -51] NetCDF: Unknown file format: b''
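The b'' in that message suggests the library received an empty body (or something other than NetCDF, such as an HTML login page) rather than a data file, which again points at authentication on the workers. One hedged way to diagnose this is to fetch the first bytes of a file URL (e.g. with requests, which by default picks up ~/.netrc credentials when trust_env is enabled) and check them against the NetCDF magic numbers, with a hypothetical helper like this:

```python
def looks_like_netcdf(first_bytes: bytes) -> bool:
    """Check for the classic NetCDF ("CDF") or NetCDF-4/HDF5 magic numbers.

    An empty body or an HTML page (as an unauthenticated Earthdata
    redirect would produce) fails this check.
    """
    return first_bytes.startswith(b"CDF") or first_bytes.startswith(b"\x89HDF\r\n\x1a\n")

# examples:
print(looks_like_netcdf(b"CDF\x01..."))       # classic NetCDF-3 header -> True
print(looks_like_netcdf(b""))                 # empty body, like the error above -> False
print(looks_like_netcdf(b"<!DOCTYPE html>"))  # login/redirect page -> False
```

Running this check on a worker (via client.run, for instance) would show whether the DAP server is returning data or a login page to the workers specifically.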