Hello All,
I’m trying to plot a 60 GB tiled dataset (EO-derived MODIS tiles) on a 64 GB machine and am consistently running out of memory when plotting. Is there a standard way to plot such a large dataset while using Dask, so that everything isn’t loaded into memory at once (a single tile is 23 MB)? I’m asking because that would later enable resampling and computing statistics before plotting.
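For context, the kind of lazy reduction I’d eventually like to run before plotting would be something like this (a rough sketch using ds as opened below; the factor of 40 and the mean() reduction are just placeholders):

# Downsample lazily with Dask before plotting; nothing is computed
# until the plot (or an explicit .compute()) asks for it.
small = ds['band_data'].isel(band=0).coarsen(x=40, y=40, boundary='trim').mean()
small.plot.imshow(x='x', y='y')  # only the coarsened array is materialised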
So far I have tried:
import xarray as xr
import hvplot
import hvplot.xarray  # noqa: F401 (registers the .hvplot accessor)

ds = xr.open_mfdataset(
    files,
    combine='by_coords',
    chunks={'band': 1, 'x': 512, 'y': 512},
    parallel=True,
)
plot = ds['band_data'].sel(band=1).hvplot.image(
    x='x',
    y='y',
    rasterize=True,
    cmap='viridis',
    title='SEN4GPP - 2020-01-01',
    frame_width=800,
    frame_height=800,
)
hvplot.save(plot, 'foo.png', fmt='png')
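A variation on this first attempt that I considered but haven’t tested: as I understand it, rasterize=True returns a DynamicMap by default (and PNG export of Bokeh plots goes through Selenium), so static export may want dynamic=False to force a one-shot render at the frame size:

# Untested: dynamic=False should (I think) compute the rasterized image
# once at the requested frame size instead of returning a DynamicMap,
# which seems better suited to static PNG export.
plot = ds['band_data'].sel(band=1).hvplot.image(
    x='x', y='y',
    rasterize=True, dynamic=False,
    cmap='viridis',
    frame_width=800, frame_height=800,
)
hvplot.save(plot, 'foo.png', fmt='png')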
I also tried the plain matplotlib route:
import xarray as xr
import matplotlib.pyplot as plt

ds = xr.open_mfdataset(
    files_dir,
    combine='by_coords',
    chunks={'band': 1, 'x': 512, 'y': 512},
    parallel=True,
)
# imshow expects a DataArray, not a Dataset, so select the variable first
ds['band_data'].isel(band=0).plot.imshow(x='x', y='y')
plt.savefig('foo.png')
as they’re the common examples found online. Alternatively, it seems like I should be able to convert the data into a point cloud and then plot that, though that feels like too much effort… (a rough sketch of what I mean is below).
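In case it clarifies what I mean, here is my untested sketch of the "use datashader directly" route, staying on the gridded data via Canvas.raster instead of going through a point cloud (band_data is the variable name from my dataset):

import datashader as dsh  # aliased so it doesn't clash with my `ds` dataset
import datashader.transfer_functions as tf
from datashader.utils import export_image
from matplotlib import colormaps

# Aggregate the dask-backed DataArray down to the output resolution.
cvs = dsh.Canvas(plot_width=800, plot_height=800)
agg = cvs.raster(ds['band_data'].sel(band=1))

# Colorize and write the PNG without materialising the full 60 GB array.
img = tf.shade(agg, cmap=colormaps['viridis'])
export_image(img, 'foo', export_path='.')  # writes ./foo.png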
I can make a sample set available if it helps.
Thanks very much for your help!
Kind regards
PS: xarray dataset info below, using up-to-date conda-forge xarray packages.
Single tile info:
<xarray.Dataset> Size: 23MB
Dimensions:      (band: 1, x: 2400, y: 2400)
Coordinates:
  * band         (band) int64 8B 1
  * x            (x) float64 19kB -2.001e+07 -2.001e+07 ... -1.89e+07 -1.89e+07
  * y            (y) float64 19kB 1.112e+06 1.111e+06 1.111e+06 ... 695.0 231.7
    spatial_ref  int64 8B ...
Data variables:
    band_data    (band, y, x) float32 23MB ...
Whole dataset info:
<xarray.Dataset> Size: 60GB
Dimensions:      (band: 2, y: 43200, x: 86400)
Coordinates:
  * band         (band) int64 16B 1 2
  * x            (x) float64 691kB -2.001e+07 -2.001e+07 ... 2.001e+07 2.001e+07
  * y            (y) float64 346kB 1.001e+07 1.001e+07 ... -1.001e+07 -1.001e+07
    spatial_ref  int64 8B 0
Data variables:
    band_data    (band, y, x) float64 60GB dask.array<chunksize=(1, 512, 480), meta=np.ndarray>