MemoryError when trying to save a dataset to a NetCDF file

Hello,

I am working with MOD16A2GF (evapotranspiration) data as an .nc file with an 8-day temporal resolution and a 500 m spatial resolution, as shown in the picture below.

Then I converted the time from object to datetime64[ns] and replaced _FillValue in the dataset using the below code:

import numpy as np
import pandas as pd
import xarray as xr

# Convert the 'time' coordinate from cftime.DatetimeJulian to datetime64[ns]
time_values = mod16netcdf['time'].values
converted_time = pd.to_datetime([t.strftime('%Y-%m-%d') for t in time_values])

# Replace the 'time' coordinate in the dataset
mod16netcdf['time'] = converted_time

# Verify the conversion
mod16netcdf

# Replace values between 3276.1 and 3276.7 (inclusive) in 'ET_500m' and 'PET_500m' with NaN
mod16netcdf['ET_500m'] = mod16netcdf['ET_500m'].where(~((mod16netcdf['ET_500m'] >= 3276.1) & (mod16netcdf['ET_500m'] <= 3276.7)), np.nan)
mod16netcdf['PET_500m'] = mod16netcdf['PET_500m'].where(~((mod16netcdf['PET_500m'] >= 3276.1) & (mod16netcdf['PET_500m'] <= 3276.7)), np.nan)
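For reference, here is a minimal sketch of the same masking step on synthetic values rather than the real MOD16A2GF file. The array `et` and its numbers are invented for illustration. `DataArray.where` fills with NaN by default wherever the condition is false, so the explicit `np.nan` argument is optional.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one variable; the values are invented for illustration.
et = xr.DataArray(
    np.array([[12.5, 3276.1], [3276.7, 40.0]]),
    dims=("lat", "lon"),
)

# Flag the fill range, then keep only the values outside it.
is_fill = (et >= 3276.1) & (et <= 3276.7)
et_masked = et.where(~is_fill)  # flagged cells become NaN

n_masked = int(et_masked.isnull().sum())
print(n_masked)
```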

After that, I calculated the real values by multiplying by the scale factor and then converted the result to monthly data using the code below:

# Calculate the real values by multiplying by 0.1 (scale factor)
ET_scaled= mod16netcdf['ET_500m']*0.1
PET_scaled= mod16netcdf['PET_500m']*0.1

# Convert 8-day ET_scaled and PET_scaled data to monthly data
# For the monthly mean the unit is mm/8day (the average 8-day rate of evapotranspiration over all 8-day intervals in a month)
ET_monthly_mean_scaled = ET_scaled.resample(time='ME').mean()
PET_monthly_mean_scaled = PET_scaled.resample(time='ME').mean()
ET_Q_monthly_mean_scaled = mod16netcdf['ET_QC_500m'].resample(time='ME').mean()

# For the monthly sum the unit is mm/month (the sum of the 8-day evapotranspiration totals over a month)
ET_monthly_sum_scaled = ET_scaled.resample(time='ME').sum()
PET_monthly_sum_scaled = PET_scaled.resample(time='ME').sum()
ET_Q_monthly_sum_scaled = mod16netcdf['ET_QC_500m'].resample(time='ME').sum()
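A small sketch of how the monthly mean and sum behave on a synthetic single-pixel series (all values invented for illustration). It uses the `"MS"` (month start) frequency instead of `"ME"` only so that it also runs on older pandas versions; the grouping by month is the same. Note that `.sum()` skips NaN values, so a month with no valid data sums to 0 unless `min_count=1` is passed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Eight synthetic 8-day composites: four fall in January, four in February.
time = pd.date_range("2001-01-01", periods=8, freq="8D")
values = np.array([2.0, 2.0, 2.0, np.nan, np.nan, np.nan, np.nan, np.nan])
et = xr.DataArray(values, dims="time", coords={"time": time})

monthly_mean = et.resample(time="MS").mean()
monthly_sum = et.resample(time="MS").sum()                    # all-NaN month -> 0.0
monthly_sum_strict = et.resample(time="MS").sum(min_count=1)  # all-NaN month -> NaN

print(monthly_sum.values, monthly_sum_strict.values)
```

Whether 0 or NaN is the right answer for a month without valid data depends on the analysis; for masked fill values, NaN is usually the safer signal.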

# Create a new Dataset to hold all the variables
ds = xr.Dataset({
    'ET_monthly_mean_scaled_mmper8day': ET_monthly_mean_scaled,
    'PET_monthly_mean_scaled_mmper8day': PET_monthly_mean_scaled,
    'ET_monthly_sum_scaled_mmpermonth': ET_monthly_sum_scaled,
    'PET_monthly_sum_scaled_mmpermonth': PET_monthly_sum_scaled,
    'ET_Q_monthly_mean_scaled': ET_Q_monthly_mean_scaled,
    'ET_Q_monthly_sum_scaled': ET_Q_monthly_sum_scaled,
})

ds

As a final step, I wrote a function to upscale the modified data to coarser resolutions (0.05°, 0.25°, and 0.05°) and then save each resolution to a new .nc file, using the code below:

def upscale_dataset(dataset, target_res):
    """
    Upscale all variables in the dataset to a coarser resolution.

    Parameters:
    dataset (xarray.Dataset): The input dataset with variables to upscale.
    target_res (float): The target spatial resolution in degrees (e.g., 0.25 or 1.0).

    Returns:
    xarray.Dataset: The upscaled dataset with all variables at the new resolution.
    """
    # Current grid spacing in degrees, taken from the first two coordinate values
    lat_res = float(np.abs(dataset.lat[1] - dataset.lat[0]))
    lon_res = float(np.abs(dataset.lon[1] - dataset.lon[0]))

    # Number of original grid cells per target cell; round() avoids
    # floating-point truncation (e.g. 11.999 becoming 11 instead of 12)
    lat_factor = max(1, round(target_res / lat_res))
    lon_factor = max(1, round(target_res / lon_res))

    # Coarsen every data variable at once; the coarsened lat/lon coordinates
    # and the unchanged time coordinate are carried over automatically
    return dataset.coarsen(lat=lat_factor, lon=lon_factor, boundary="trim").mean()

# Upscale to 0.05-degree resolution
evtp_ds_005 = upscale_dataset(ds, 0.05)

evtp_ds_005
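To make the coarsening step concrete, here is a tiny sketch on a synthetic 4 x 6 grid (names and values invented for illustration). With a factor of 2 in each direction, every output cell is the mean of a 2 x 2 block of input cells, and the coarsened coordinates are the block means of the original ones.

```python
import numpy as np
import xarray as xr

# Synthetic grid with 0.5-degree spacing; the values 0..23 are invented.
ds_demo = xr.Dataset(
    {"et": (("lat", "lon"), np.arange(24, dtype="float64").reshape(4, 6))},
    coords={"lat": np.arange(4) * 0.5, "lon": np.arange(6) * 0.5},
)

# Average 2 x 2 blocks; boundary="trim" drops leftover cells that do not
# fill a complete block (there are none for this grid).
coarse = ds_demo.coarsen(lat=2, lon=2, boundary="trim").mean()

print(coarse["et"].shape)           # 2 x 3 blocks
print(float(coarse["et"][0, 0]))    # mean of 0, 1, 6, 7
```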

But when I try to save evtp_ds_005 into an .nc file I get an error saying:
MemoryError: Unable to allocate 40.1 GiB for an array with shape (1059, 3912, 2596) and data type float32
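The figure in the error message can be checked with a few lines of arithmetic, using only the shape and dtype it reports. The shape looks like one full-resolution variable rather than the coarsened output, which would be consistent with the whole input being loaded during the write; that reading is an assumption based on the message alone.

```python
# Shape and dtype copied from the error message
shape = (1059, 3912, 2596)
bytes_per_value = 4  # float32

n_values = 1
for n in shape:
    n_values *= n

size_gib = n_values * bytes_per_value / 2**30
print(f"{size_gib:.1f} GiB")  # prints "40.1 GiB"
```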

I tried to subset the data but also got the same error even when using one subset:

subset_1 = evtp_ds_005.isel(time=slice(0, 100))
subset_1.to_netcdf('EVTP_upscaled_0.05deg_subset_1.nc')

I also tried chunks but the error remained.

The below pic shows the dataset I want to save as .nc

Could you provide a link to the file?


Sure, the file is around 13 GB:

You’ll need to specify chunks at read time: e.g. {"time": 1}.
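A minimal sketch of what that looks like end to end, assuming dask and a NetCDF backend are installed. The file names and the tiny synthetic input are made up so the example can run on its own; with the real data, the `open_dataset` call with `chunks` is the relevant part.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "demo_input.nc")   # hypothetical file names
dst = os.path.join(tmpdir, "demo_coarse.nc")

# Write a tiny synthetic file so the sketch is runnable.
xr.Dataset(
    {"ET_500m": (("time", "lat", "lon"), np.ones((3, 4, 4), dtype="float32"))},
    coords={
        "time": pd.date_range("2001-01-01", periods=3, freq="8D"),
        "lat": np.arange(4) * 0.5,
        "lon": np.arange(4) * 0.5,
    },
).to_netcdf(src)

# chunks= turns every variable into a lazy dask array, one time step per chunk.
with xr.open_dataset(src, chunks={"time": 1}) as ds_lazy:
    coarse = ds_lazy.coarsen(lat=2, lon=2, boundary="trim").mean()
    is_lazy = coarse["ET_500m"].chunks is not None  # still not computed
    # The computation runs here, so the data can be processed in pieces
    # instead of being loaded into memory all at once.
    coarse.to_netcdf(dst)

with xr.open_dataset(dst) as check:
    out_shape = check["ET_500m"].shape
print(is_lazy, out_shape)
```

Without `chunks`, the variables are backed by NumPy arrays once an operation touches them, so each full-resolution variable has to fit in memory.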

Also see xarray-regrid (Regridding utilities for xarray, xarray-regrid 0.4.0 documentation) for an easy way to do the coarsening.


Hey @dcherian
Using chunks at read time worked perfectly! Thank you so much.
