Hello,
I am trying to sample hourly model output to match the time and location of observations collected during a flight.
There are only 1000 points and it seems the “lazy” selection works fine. However, I have never managed to actually complete the calculation.
the dataset is created using open_mfdataset (6 files ~6.8Gb per file)
xarray.Dataset
Dimensions:
lat: 180lon: 288time: 52608pfull_sub03: 21phalf: 50
Coordinates:
lat
(lat)
float64
-89.5 -88.5 -87.5 ... 88.5 89.5
lon
(lon)
float64
0.625 1.875 3.125 ... 358.1 359.4
time
(time)
datetime64[ns]
2007-01-01T00:30:00 ... 2012-12-...
pfull_sub03
(pfull_sub03)
float64
229.3 277.4 333.2 ... 992.8 997.9
phalf
(phalf)
float64
0.01 0.02697 ... 995.9 1e+03
Data variables:
H2
(time, pfull_sub03, lat, lon)
float32
dask.array<chunksize=(8760, 21, 1, 2), meta=np.ndarray>
CO
(time, pfull_sub03, lat, lon)
float32
dask.array<chunksize=(8760, 21, 1, 2), meta=np.ndarray>
ps
(time, lat, lon)
float32
dask.array<chunksize=(8760, 3, 23), meta=np.ndarray>
temp
(time, pfull_sub03, lat, lon)
float32
dask.array<chunksize=(8760, 21, 1, 2), meta=np.ndarray>
sphum
(time, pfull_sub03, lat, lon)
float32
dask.array<chunksize=(8760, 21, 1, 2), meta=np.ndarray>
ucomp
(time, pfull_sub03, lat, lon)
float32
dask.array<chunksize=(8760, 21, 1, 2), meta=np.ndarray>
vcomp
(time, pfull_sub03, lat, lon)
float32
dask.array<chunksize=(8760, 21, 1, 2), meta=np.ndarray>
u_ref
(time, lat, lon)
float32
dask.array<chunksize=(8760, 3, 23), meta=np.ndarray>
v_ref
(time, lat, lon)
float32
dask.array<chunksize=(8760, 3, 23), meta=np.ndarray>
zsurf
(lat, lon)
float32
...
pk
(phalf)
float32
...
bk
(phalf)
float32
...
I then select the observations using
obs_select = f1.sel(lat=target_lat,lon=target_lon,time=target_time,method='nearest')
where target_lat looks like
xarray.DataArray
'lat'
points: 1003
array([61.854 , 63.436 , 65.8994, ..., 62.26 , 60.96 , 59.35 ])
Coordinates:
points
(points)
int64
0 1 2 3 4 ... 999 1000 1001 1002
Attributes: (0)
This gives me
obs_select
xarray.Dataset
Dimensions:
points: 1003pfull_sub03: 21phalf: 50
Coordinates:
lat
(points)
float64
61.5 63.5 65.5 ... 62.5 60.5 59.5
lon
(points)
float64
205.6 204.4 203.1 ... 203.1 205.6
time
(points)
datetime64[ns]
2009-10-27T17:30:00 ... 2012-10-...
pfull_sub03
(pfull_sub03)
float64
229.3 277.4 333.2 ... 992.8 997.9
phalf
(phalf)
float64
0.01 0.02697 ... 995.9 1e+03
points
(points)
int64
0 1 2 3 4 ... 999 1000 1001 1002
Data variables:
H2
(points, pfull_sub03)
float32
dask.array<chunksize=(1003, 21), meta=np.ndarray>
CO
(points, pfull_sub03)
float32
dask.array<chunksize=(1003, 21), meta=np.ndarray>
ps
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
temp
(points, pfull_sub03)
float32
dask.array<chunksize=(1003, 21), meta=np.ndarray>
sphum
(points, pfull_sub03)
float32
dask.array<chunksize=(1003, 21), meta=np.ndarray>
ucomp
(points, pfull_sub03)
float32
dask.array<chunksize=(1003, 21), meta=np.ndarray>
vcomp
(points, pfull_sub03)
float32
dask.array<chunksize=(1003, 21), meta=np.ndarray>
u_ref
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
v_ref
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
zsurf
(points)
float32
746.8 263.4 194.8 ... 240.2 262.1
pk
(phalf)
float32
1.0 2.697 5.171 ... 225.0 10.0 0.0
bk
(phalf)
float32
0.0 0.0 0.0 ... 0.9874 0.9958 1.0
I further select the altitude range using
OF = obsf.where(np.logical_and(obsf[‘alt_bottom’]<=target_alt,obsf[‘alt_top’]>target_alt)).mean(dim=‘pfull’)
This yields
OF =
xarray.Dataset
Dimensions:
points: 1003
phalf: 22
Coordinates:
lat
(points)
float64
61.5 63.5 65.5 ... 62.5 60.5 59.5
lon
(points)
float64
205.6 204.4 203.1 ... 203.1 205.6
time
(points)
datetime64[ns]
2009-10-27T17:30:00 ... 2012-10-...
phalf
(phalf)
float64
207.8 252.2 304.1 ... 995.9 1e+03
points
(points)
int64
0 1 2 3 4 ... 999 1000 1001 1002
Data variables:
H2
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
CO
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
ps
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
temp
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
sphum
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
ucomp
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
vcomp
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
u_ref
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
v_ref
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
zsurf
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
pk
(phalf, points)
float32
dask.array<chunksize=(22, 1003), meta=np.ndarray>
bk
(phalf, points)
float32
dask.array<chunksize=(22, 1003), meta=np.ndarray>
alt_center
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
alt_edge
(points, phalf)
float32
dask.array<chunksize=(1003, 21), meta=np.ndarray>
alt_top
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
alt_bottom
(points)
float32
dask.array<chunksize=(1003,), meta=np.ndarray>
Attributes: (0)
But OF.compute() never completes.
I tried to first sample alongside the time dimension (200 unique timesteps) but it’s also extremely slow.
I think I am probably doing something wrong with dask. I am running a local cluster
client = Client(n_workers=4, threads_per_worker=1, memory_limit='50GB',local_directory=os.environ['TMPDIR'])
Thanks a lot for your help