Hello all,
I would like to ask the Dask community whether it is worth trying to optimize the computation time of a function applied to a single file. What I mean is this: I have the following temperature file, which I tried to open lazily with dask chunks (just a first trial):
import xarray as xr

chunks = {'time': 1, 'j': 30, 'i': 20}
MLTF = xr.open_dataset('/home/REGIONS/TEMP_2001_2015_2D.nc', chunks=chunks)
MLT = MLTF.MLT
lon, lat, time = MLTF.i, MLTF.j, MLTF.time
where MLT (the temperature variable) has the following characteristics:
MLT
Out[3]:
<xarray.DataArray 'MLT' (time: 5413, j: 131, i: 101)>
dask.array<open_dataset-aa8def32f2e58a7766e4a76da077651fMLT, shape=(5413, 131, 101), dtype=float32, chunksize=(1, 30, 20), chunktype=numpy.ndarray>
Coordinates:
* i (i) int64 1179 1180 1181 1182 1183 ... 1275 1276 1277 1278 1279
* j (j) int64 569 570 571 572 573 574 575 ... 694 695 696 697 698 699
* time (time) datetime64[ns] 2001-02-01 2001-02-02 ... 2015-11-30
Attributes:
standard_name: MLT
long_name: User-Defined Mixed Layer Temperature
units: degC
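Out of curiosity, I also checked how many pieces this chunking produces (the numbers below are just my own back-of-the-envelope check, and the alternative chunking is a sketch that assumes the climatology is computed per grid point along the full time axis):

# With chunks of (1, 30, 20) the array is split into 5413 * 5 * 6 blocks,
# so the task graph is already huge before any real computation starts.
print(MLT.data.numblocks)    # (5413, 5, 6)
print(MLT.data.npartitions)  # 162390

# Alternative sketch: keep 'time' in a single chunk and split over space
# instead, so each task sees a complete time series for its grid points.
MLT_rechunked = MLT.chunk({'time': -1, 'j': 30, 'i': 20})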
I need to apply a function to it from the following GitHub source, written for xarray: xmhw_demo.ipynb in the coecms/xmhw repository on GitHub.
Therefore I started by trying to use dask delayed (wrapping the function itself rather than its result, so that nothing runs until compute() is called):

from dask import delayed
from xmhw.xmhw import threshold  # import path as in the demo notebook

x = delayed(threshold)(MLT, climatologyPeriod=[2001, 2015], tdim='time')
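Since delayed() only builds the graph and returns immediately, the actual work happens when the result is computed. This is how I triggered it, with dask's built-in progress bar on the default local scheduler:

from dask.diagnostics import ProgressBar

# compute() is where the graph actually runs; the bar shows activity.
with ProgressBar():
    clim = x.compute()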
However, this computation took a very long time (it never finished, in fact, and I had to stop it in the end). I would therefore like to ask whether applying any of the functions from the coecms/xmhw repository to a 3D array like this MLT (time: 5413, j: 131, i: 101) can be optimized in any way, or whether the file is simply too big and the function too complex to run fast.
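In case it is relevant, I was also planning to run this on a local distributed cluster so I can watch the dashboard and see which tasks dominate (just a sketch; the worker counts are an arbitrary assumption on my side):

from dask.distributed import Client

# Start a small local cluster; dashboard_link points to the diagnostics page.
client = Client(n_workers=4, threads_per_worker=1)
print(client.dashboard_link)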
Thank you in advance for your time and help,
kind regards,
Sofi