Dear @jhamman,
thanks for your reply.
This is how my dataset looks:
<bound method Dataset.__repr__ of <xarray.Dataset>
Dimensions:    (dt_calc: 1, dt_fore: 112, latitude: 1441, longitude: 2879)
Coordinates:
  * dt_calc    (dt_calc) datetime64[ns] 2022-07-25
  * dt_fore    (dt_fore) float64 0.0 1.0 2.0 ... 174.0 177.0
    lat        (latitude, longitude) float64 dask.array<chunksize=(181, 720), meta=np.ndarray>
  * latitude   (latitude) float64 -90.0 -89.88 ... 89.88 90.0
    lon        (latitude, longitude) float64 dask.array<chunksize=(181, 720), meta=np.ndarray>
  * longitude  (longitude) float64 -180.0 -179.9 ... 179.8
Data variables:
    air_temperature_2m            (dt_calc, dt_fore, latitude, longitude) float16 dask.array<chunksize=(1, 14, 181, 360), meta=np.ndarray>
    dewpoint_2m                   (dt_calc, dt_fore, latitude, longitude) float16 dask.array<chunksize=(1, 14, 181, 360), meta=np.ndarray>
    global_horizontal_irradiance  (dt_calc, dt_fore, latitude, longitude) int16 dask.array<chunksize=(1, 14, 181, 360), meta=np.ndarray>
    max_wind_gust_10m             (dt_calc, dt_fore, latitude, longitude) int16 dask.array<chunksize=(1, 14, 181, 360), meta=np.ndarray>
    relative_humidity_2m          (dt_calc, dt_fore, latitude, longitude) int16 dask.array<chunksize=(1, 14, 181, 360), meta=np.ndarray>
    total_cloud_cover             (dt_calc, dt_fore, latitude, longitude) int16 dask.array<chunksize=(1, 14, 181, 360), meta=np.ndarray>
    total_precipitation           (dt_calc, dt_fore, latitude, longitude) float16 dask.array<chunksize=(1, 14, 181, 360), meta=np.ndarray>
    weather_synop_code            (dt_calc, dt_fore, latitude, longitude) int16 dask.array<chunksize=(1, 14, 181, 360), meta=np.ndarray>
    wind_direction_10             (dt_calc, dt_fore, latitude, longitude) int16 dask.array<chunksize=(1, 14, 181, 360), meta=np.ndarray>
    wind_speed_10                 (dt_calc, dt_fore, latitude, longitude) int16 dask.array<chunksize=(1, 14, 181, 360), meta=np.ndarray>>
The ds.isel() step is the most time-consuming part of the code.
Reading the metadata and attributes takes ~300 ms; the rest takes approx. 2.5 seconds.
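To give a rough idea of why the selection could dominate the runtime, here is a back-of-the-envelope sketch in plain Python. The dimension sizes, chunk shape, and dtypes are taken from the repr above. The selection itself is an assumption made only for illustration (a single latitude/longitude point with all dt_fore steps kept), and the sizes are uncompressed, so the bytes actually transferred from S3 may well be smaller.

```python
import math

# Dimension sizes and chunk shape copied from the dataset repr.
dims = {"dt_calc": 1, "dt_fore": 112, "latitude": 1441, "longitude": 2879}
chunks = {"dt_calc": 1, "dt_fore": 14, "latitude": 181, "longitude": 360}
n_variables = 10       # number of data variables in the repr
bytes_per_value = 2    # float16 and int16 are both 2 bytes wide

# Uncompressed size of one chunk of one variable.
chunk_bytes = bytes_per_value * math.prod(chunks.values())

# ASSUMPTION (not from the post): a single grid point is selected and every
# dt_fore step is kept. Extent of the selection along each dimension:
selected = {"dt_calc": 1, "dt_fore": dims["dt_fore"], "latitude": 1, "longitude": 1}

# Number of chunks the selection spans, per variable and in total.
chunks_per_variable = math.prod(
    math.ceil(selected[d] / chunks[d]) for d in dims
)
total_chunks = chunks_per_variable * n_variables

# Bytes that must be read and decoded versus bytes actually wanted.
bytes_read = total_chunks * chunk_bytes
bytes_wanted = n_variables * bytes_per_value * math.prod(selected.values())

print(f"chunk size: {chunk_bytes / 1e6:.2f} MB uncompressed")
print(f"chunks touched: {total_chunks}")
print(f"read {bytes_read / 1e6:.1f} MB to obtain {bytes_wanted} bytes")
```

Under that assumption, every requested value forces a whole chunk to be fetched and decoded, so the cost is driven by the number of chunks touched and not by the size of the result.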
I have just looked here for async support: https://s3fs.readthedocs.io/en/latest/. But as I said, I get an error that the loop is missing in MainThread.
This dataset (AWS S3 Explorer) comes very close to the one I am using. Unfortunately, the Dataset is empty when opened at the parent directory:
bucket_name = 'noaa-hrrr-bdp-pds'
dataset_name = 'sfc/20200801/20200801_00z_anl'
I hope this helps you understand my issue. Thanks a lot,
Daniel