Hello,
I am working with MOD16A2GF (evapotranspiration) data stored as an .nc file, with an 8-day temporal resolution and a 500 m spatial resolution, as shown in the picture below.
I then converted the time coordinate from object dtype to datetime64[ns] and replaced the _FillValue entries in the dataset, using the code below:
# Convert the 'time' coordinate from cftime.DatetimeJulian to datetime64[ns]
time_values = mod16netcdf['time'].values
converted_time = pd.to_datetime([t.strftime('%Y-%m-%d') for t in time_values])
# Replace the 'time' coordinate in the dataset
mod16netcdf['time'] = converted_time
# Verify the conversion
mod16netcdf
# Replace values between 3276.1 and 3276.7 (inclusive) in 'ET_500m' and 'PET_500m' with NaN
mod16netcdf['ET_500m'] = mod16netcdf['ET_500m'].where(~((mod16netcdf['ET_500m'] >= 3276.1) & (mod16netcdf['ET_500m'] <= 3276.7)), np.nan)
mod16netcdf['PET_500m'] = mod16netcdf['PET_500m'].where(~((mod16netcdf['PET_500m'] >= 3276.1) & (mod16netcdf['PET_500m'] <= 3276.7)), np.nan)
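The masking step above can be checked on a tiny synthetic array before running it on the full file. This is only a sketch: `mask_fill_range` and the `demo` values are hypothetical, and the thresholds assume the fill codes appear as 3276.1 to 3276.7 in the loaded data, as in the code above.

```python
import numpy as np
import xarray as xr


def mask_fill_range(da, low=3276.1, high=3276.7):
    """Return a copy of da with values inside [low, high] replaced by NaN."""
    # where() keeps values where the condition is True and fills NaN elsewhere
    return da.where(~((da >= low) & (da <= high)))


# Synthetic values: two fill codes (3276.1, 3276.7) and three valid numbers
demo = xr.DataArray(np.array([12.5, 3276.1, 40.0, 3276.7, 3276.0]), dims="x")
masked = mask_fill_range(demo)
print(masked.values)
```

`where` already fills with NaN by default, so passing `np.nan` explicitly is optional.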
After that, I calculated the real values by multiplying by the scale factor and then aggregated them to monthly data, using the code below:
# Calculate the real values by multiplying by 0.1 (scale factor)
ET_scaled= mod16netcdf['ET_500m']*0.1
PET_scaled= mod16netcdf['PET_500m']*0.1
# Convert 8-day ET_scaled and PET_scaled data to monthly data
# For the monthly mean the unit is mm/8day (the average 8-day evapotranspiration rate over all 8-day intervals in a month)
ET_monthly_mean_scaled = ET_scaled.resample(time='ME').mean()
PET_monthly_mean_scaled = PET_scaled.resample(time='ME').mean()
ET_Q_monthly_mean_scaled = mod16netcdf['ET_QC_500m'].resample(time='ME').mean()
# For the monthly sum the unit is mm/month (the sum of the 8-day evapotranspiration rates over a month)
ET_monthly_sum_scaled = ET_scaled.resample(time='ME').sum()
PET_monthly_sum_scaled = PET_scaled.resample(time='ME').sum()
ET_Q_monthly_sum_scaled = mod16netcdf['ET_QC_500m'].resample(time='ME').sum()
# Create a new Dataset to hold all the variables
ds = xr.Dataset({
    'ET_monthly_mean_scaled_mmper8day': ET_monthly_mean_scaled,
    'PET_monthly_mean_scaled_mmper8day': PET_monthly_mean_scaled,
    'ET_monthly_sum_scaled_mmpermonth': ET_monthly_sum_scaled,
    'PET_monthly_sum_scaled_mmpermonth': PET_monthly_sum_scaled,
    'ET_Q_monthly_mean_scaled': ET_Q_monthly_mean_scaled,
    'ET_Q_monthly_sum_scaled': ET_Q_monthly_sum_scaled
})
ds
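One detail of the monthly sums above that may be worth checking: with the default NaN handling, a month in which every 8-day value is NaN sums to 0 rather than NaN. The sketch below shows this on a synthetic series and uses `min_count=1` to keep such months as NaN. The data are made up, and `freq="MS"` (month start) is used here only so the example also runs on older pandas versions; the code above uses 'ME'.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Eight synthetic 8-day steps: four in January (all NaN), four in February
times = pd.date_range("2020-01-01", periods=8, freq="8D")
values = np.array([np.nan] * 4 + [1.0, 2.0, 3.0, 4.0])
et = xr.DataArray(values, coords={"time": times}, dims="time")

# Default: NaNs are skipped, so the all-NaN month sums to 0.0
default_sum = et.resample(time="MS").sum()

# min_count=1 requires at least one valid value, otherwise the result is NaN
strict_sum = et.resample(time="MS").sum(min_count=1)

print(default_sum.values)
print(strict_sum.values)
```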
As a final step, I created a function to upscale the modified data to resolutions of 0.05°, 0.25°, and 0.05°, so that I can then save each resolution to a new .nc file, using the code below:
def upscale_dataset(dataset, target_res):
    """
    Upscale all variables in the dataset to a coarser resolution.
    Parameters:
    dataset (xarray.Dataset): The input dataset with variables to upscale.
    target_res (float): The target spatial resolution (e.g., 0.25 or 1.0 degrees).
    Returns:
    xarray.Dataset: The upscaled dataset with all variables at the new resolution.
    """
    # Get the current resolution and coordinate steps
    lat_res = np.abs(dataset.lat[1] - dataset.lat[0])
    lon_res = np.abs(dataset.lon[1] - dataset.lon[0])
    # Calculate the number of original grid cells per target resolution
    lat_factor = int(target_res / lat_res)
    lon_factor = int(target_res / lon_res)
    # Dictionary to hold the upscaled variables
    upscaled_vars = {}
    # Loop over all variables in the dataset
    for var_name in dataset.data_vars:
        # Apply coarsen and mean to each variable
        upscaled_vars[var_name] = (
            dataset[var_name]
            .coarsen(lat=lat_factor, lon=lon_factor, boundary="trim")
            .mean()
        )
    # Create a new dataset with upscaled variables
    dataset_upscaled = xr.Dataset(
        upscaled_vars,
        coords={
            'lat': upscaled_vars[var_name].lat,
            'lon': upscaled_vars[var_name].lon,
            'time': dataset.time
        }
    )
    return dataset_upscaled
# Upscale to 0.05-degree resolution
evtp_ds_005 = upscale_dataset(ds, 0.05)
evtp_ds_005
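For comparison, the same block averaging can be written more compactly, since `coarsen` also works on a whole Dataset. This is a sketch on synthetic data, not a drop-in replacement: `coarsen_dataset` and `demo` are hypothetical names, and it uses `round()` instead of `int()` for the factors, which can give a different factor when the target resolution is not an exact multiple of the grid step.

```python
import numpy as np
import xarray as xr


def coarsen_dataset(dataset, target_res):
    """Block-average every variable to roughly target_res (same units as lat/lon)."""
    lat_step = abs(float(dataset.lat[1] - dataset.lat[0]))
    lon_step = abs(float(dataset.lon[1] - dataset.lon[0]))
    # Number of original cells per coarse cell; round() avoids truncating 1.999 to 1
    lat_factor = max(1, round(target_res / lat_step))
    lon_factor = max(1, round(target_res / lon_step))
    # Dataset.coarsen applies to all data variables and averages the coordinates too
    return dataset.coarsen(lat=lat_factor, lon=lon_factor, boundary="trim").mean()


# Synthetic 6 x 8 grid with a 0.01-degree step
demo = xr.Dataset(
    {"et": (("lat", "lon"), np.arange(48, dtype="float32").reshape(6, 8))},
    coords={"lat": np.arange(6) * 0.01, "lon": np.arange(8) * 0.01},
)
coarse = coarsen_dataset(demo, 0.02)
print(coarse.sizes)
```

Here each coarse cell is the mean of a 2 x 2 block, so the 6 x 8 grid becomes 3 x 4.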
But when I try to save evtp_ds_005 to an .nc file, I get the following error:
MemoryError: Unable to allocate 40.1 GiB for an array with shape (1059, 3912, 2596) and data type float32
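The size in the message can be reproduced from the reported shape and data type: float32 uses 4 bytes per element. A short check of that arithmetic:

```python
import math

shape = (1059, 3912, 2596)   # shape reported in the MemoryError
bytes_per_value = 4          # float32 is 4 bytes per element

n_bytes = math.prod(shape) * bytes_per_value
gib = n_bytes / 2**30        # 1 GiB = 2**30 bytes

print(f"{n_bytes} bytes = {gib:.1f} GiB")
```

So the error refers to a single array of that full shape being allocated at once.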
I tried subsetting the data, but I got the same error even when saving a single subset:
subset_1 = evtp_ds_005.isel(time=slice(0, 100))
subset_1.to_netcdf('EVTP_upscaled_0.05deg_subset_1.nc')
I also tried chunking the data, but the error remained.
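One approach that may be worth trying is to coarsen a few time steps at a time, so that only one batch has to be held in memory. The sketch below illustrates the idea on synthetic data. `coarsen_in_time_batches`, the batch size, and the file name in the comment are hypothetical, and the approach only reduces memory use if the source data are loaded lazily (for example, opened with chunks) rather than already computed in full.

```python
import numpy as np
import xarray as xr


def coarsen_in_time_batches(dataset, lat_factor, lon_factor, batch_size):
    """Yield block-averaged pieces of dataset, batch_size time steps at a time."""
    n_time = dataset.sizes["time"]
    for start in range(0, n_time, batch_size):
        batch = dataset.isel(time=slice(start, start + batch_size))
        yield batch.coarsen(lat=lat_factor, lon=lon_factor, boundary="trim").mean()


# Synthetic dataset: 5 time steps on a 4 x 6 grid
demo = xr.Dataset(
    {"et": (("time", "lat", "lon"),
            np.arange(5 * 4 * 6, dtype="float32").reshape(5, 4, 6))},
    coords={"time": np.arange(5),
            "lat": np.arange(4) * 0.01,
            "lon": np.arange(6) * 0.01},
)

pieces = list(coarsen_in_time_batches(demo, 2, 2, batch_size=2))
# With real data, each piece could be written to its own file instead, e.g.
# piece.to_netcdf(f"upscaled_part_{i}.nc")  (hypothetical file name)
combined = xr.concat(pieces, dim="time")

# The batched result matches coarsening everything in one go
direct = demo.coarsen(lat=2, lon=2, boundary="trim").mean()
print(bool(np.allclose(combined["et"].values, direct["et"].values)))
```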