I am currently downloading a dataset from Zenodo (tens of GB) and it is taking a long time (an hour or so). Is this common? Any ideas on how to speed it up?
Hi, I have been facing the same problem for the last couple of weeks… Sometimes it even happens that 25 GB out of a 26 GB dataset has downloaded and then the Zenodo server goes down; I once waited more than 30 hours for a download to complete.
Pretty pathetic, but it’s cool that at least we are getting the data from there…
Just jumping in to say I’ve been having the same trouble. I don’t know that I’ve seen it this bad before.
It would be cool if they had a status page, in case this kind of thing becomes more frequent.
I note that they do sometimes share status on the Zenodo blog, e.g. Zenodo upgrade issues, 2023-10-19
I’m seeing slow speeds myself now, e.g. 200 kB/s, despite speedtest.net results of 80 Mbps and more…
If anyone else struggles with slow download speeds on Zenodo, try using this command-line tool:
GitHub - dvolgyes/zenodo_get: Downloader for Zenodo records. I was able to download a ~2 GB dataset within seconds.
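In case it helps, a minimal sketch of driving it from Python (it’s really a command-line tool, so this just shells out); the record ID is a placeholder, and I’m assuming it accepts a record ID or DOI as the positional argument:

import subprocess

# Placeholder record ID; a DOI like 10.5281/zenodo.XXXXXXX should also work (assumption).
subprocess.run(["zenodo_get", "XXXXXXX"], check=True)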
Hi,
The xcube plugin xcube-zenodo may help.
- It offers lazy access to chunked datasets published as tif or netcdf on Zenodo; see the example notebook, and the rough sketch after this list.
- If the dataset is published in a compressed format (zip, tar, tar.gz), then the full dataset needs to be downloaded first. This can be done via xcube’s preload API; see the other example notebooks in the same repository.
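For orientation, a rough sketch of what the lazy-access route could look like. I’m assuming the plugin registers a "zenodo" data store and takes the record ID as "root"; the data ID below is made up, so check the example notebook for the real parameter names:

from xcube.core.store import new_data_store

# Assumptions: the plugin registers a "zenodo" store and takes the record ID as "root";
# the data ID below is a placeholder.
store = new_data_store("zenodo", root="XXXXXXX")
print(list(store.get_data_ids()))          # list the datasets available in the record
ds = store.open_data("some_dataset.tif")   # lazy, chunked access via xarray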
I’ve also noticed that on small machines in AWS it’s common to get <1 MB/s transfer speeds for a big 10 GB+ Zenodo .zip.
In Germany on home Wi-Fi it’s a bit better, ~6 MB/s, which I assume is because the Zenodo data centers are physically in Switzerland (Infrastructure | Zenodo).
Oh, this was interesting for me: everything is now a store, including a website with a directory containing a tif?? Isn’t that seen as pretty fragile, since a simple string is obscured by required code in only one language?
import rioxarray
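# /vsicurl/ makes GDAL read the remote file through HTTP range requests, so only the blocks actually accessed get downloaded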
rioxarray.open_rasterio("/vsicurl/https://zenodo.org/records/8154445/files/planet_canopy_cover_30m_v0.1.tif?download=1")
<xarray.DataArray (band: 1, y: 149363, x: 170397)> Size: 25GB
[25451007111 values with dtype=uint8]
Coordinates:
* band (band) int64 8B 1
* x (x) float64 1MB 2.555e+06 2.555e+06 ... 7.667e+06 7.667e+06
* y (y) float64 1MB 5.82e+06 5.82e+06 ... 1.339e+06 1.339e+06
spatial_ref int64 8B 0
Attributes:
AREA_OR_POINT: Area
_FillValue: 255
scale_factor: 1.0
add_offset: 0.0
I’m not a particularly heavy user of rioxarray, but this is lazy access to a chunked dataset, and it would work just as well inside a zip, tar, or tar.gz (prepend the /vsizip/, /vsitar/, or /vsigzip/ prefixes as appropriate). With osgeo.gdal.OpenEx and OF_MULTIDIM_RASTER, NetCDF and Zarr can also be accessed in multidimensional mode. You really don’t have to download compressed files.
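For example, a minimal sketch of both patterns; the record number, archive name, and file names are placeholders, not a real record:

import rioxarray
from osgeo import gdal

# Hypothetical archive: chain /vsizip/ in front of /vsicurl/ to read a GeoTIFF inside
# a remote zip without downloading the whole archive (query string left off so GDAL
# can split the archive path from the inner file).
zip_url = "https://zenodo.org/records/XXXXXXX/files/data.zip"
da = rioxarray.open_rasterio(f"/vsizip//vsicurl/{zip_url}/example.tif")

# Hypothetical NetCDF: multidimensional access via GDAL's multidim raster API.
nc_url = "https://zenodo.org/records/XXXXXXX/files/example.nc"
ds = gdal.OpenEx(f"/vsicurl/{nc_url}", gdal.OF_MULTIDIM_RASTER)
print(ds.GetRootGroup().GetMDArrayNames())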
And without using another library (no offence intended, but I try to keep my stack as minimal as possible).
what library are you using to download?
I used zenodo_get, but the call you presented seems way more straightforward.
The only thing that can be difficult (perhaps) is downloading from Zenodo with authentication for closed repositories.
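For what it’s worth, a rough sketch of an authenticated download with a personal access token; the record ID, file name, and exact API path are assumptions and may need adjusting for a given record:

import requests

TOKEN = "YOUR_ZENODO_TOKEN"  # personal access token (placeholder)
url = "https://zenodo.org/api/records/XXXXXXX/files/data.zip/content"  # hypothetical record/file

with requests.get(url, params={"access_token": TOKEN}, stream=True, timeout=60) as r:
    r.raise_for_status()
    with open("data.zip", "wb") as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)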