Data Array getting too big

Hi All,
I want to compute the weighted-average cloud cover from the pressure-level data of cloud percentages (CMIP6 variable cl). My code is below:

import geocat.comp

### convert hybrid sigma levels to pressure levels
cl_pr = geocat.comp.interpolation.interp_hybrid_to_pressure(cl_ds.cl, cl_ds.ps, cl_ds.ap, cl_ds.b)

### get weights from the pressure thickness of each level (surface to 700 hPa)
cl_low = cl_pr.sel(plev=slice(100000.0, 70000.0))
pressure_top = 70000.0
pressure_levels = cl_low.plev
pressure_levels = pressure_levels.assign_attrs({'units': 'Pa'})
cl_weights = geocat.comp.dpres_plevel(pressure_levels, cl_ds.ps, pressure_top)

### thickness-weighted vertical mean
cl_wt_avg_low = cl_low.weighted(cl_weights.fillna(0.0)).mean(dim='plev')

When I try to access it, I get this error:
MemoryError: Unable to allocate 93.5 TiB for an array with shape (43, 96, 192, 220, 4, 96, 192) and data type float64


Welcome @diptiSH

The chunk sizes are really big: 93 TiB! The usual recommendation is 100-200 MB per chunk. Set this at the point where you read in your dataset.
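For example (a minimal sketch, assuming you open a netCDF file with xarray; the path and chunk sizes are placeholders to tune for your data):

import xarray as xr

### open lazily with dask chunks of roughly 100-200 MB each;
### chunk along time but keep the vertical dimension whole so the
### hybrid-to-pressure interpolation sees full columns
cl_ds = xr.open_dataset('cl_input.nc', chunks={'time': 12, 'lev': -1})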

I think something is going wrong here that is causing the dimensionality of the data to explode. The total array size is 37 PB! That's not possible; all of CMIP6 is “only” 20 PB.

Can you post the xarray reprs of the dataset cl_ds and cl_wt_avg_low?
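In the meantime, a quick way to see where the size exploded (assuming both are ordinary xarray objects):

### look for unexpected dimensions on the weights
print(cl_weights.dims, cl_weights.shape)
### nbytes is computed from metadata, so this is safe even for a huge lazy array
print(cl_wt_avg_low.nbytes / 1e12, 'TB')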

Thanks, @dcherian and @rabernat, for your replies. I was able to fix this issue. When I was calculating the weights based on vertical thickness with

cl_weights = geocat.comp.dpres_plevel(pressure_levels, cl_ds.ps, pressure_top)

it was adding new generic dimensions (dim0-dim2) instead of the standard dimension names, so when I applied these weights to calculate the mean,

cl_wt_avg_low = cl_low.weighted(cl_weights).mean(dim='plev')

the result became a 7-dimensional array and blew up. I fixed the dimensions of cl_weights and it is working fine now.
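In case it helps anyone else, here is a sketch of the kind of fix (the mapping of the generic dims to time/lat/lon is an assumption; check cl_weights.dims and cl_weights.shape to confirm the order for your data):

### give the generic dimensions their proper names so xarray can align them
cl_weights = cl_weights.rename({'dim0': 'time', 'dim1': 'lat', 'dim2': 'lon'})
### with matching dimension names the weighted mean broadcasts correctly
cl_wt_avg_low = cl_low.weighted(cl_weights.fillna(0.0)).mean(dim='plev')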