Favorite way to go from netCDF (&xarray) to torch/TF/Jax et al

I love how this discussion is steering away from metadata and towards speed :rofl: Just to clarify, NetCDF can indeed be fast enough if you're going from File → CPU-RAM → GPU-RAM (assuming you've got enough I/O, RAM, etc.), as @rabernat pointed out.
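For reference, a minimal sketch of that File → CPU-RAM → GPU-RAM path with xarray + PyTorch (the filename and variable name are made up for illustration):

```python
import xarray as xr
import torch

ds = xr.open_dataset("data.nc")   # lazy open on disk (hypothetical file)
arr = ds["precip"].values         # load into CPU RAM as a NumPy array
tensor = torch.from_numpy(arr)    # zero-copy wrap on the CPU side
tensor = tensor.to("cuda")        # explicit CPU-RAM -> GPU-RAM transfer
```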

Now, if you want end-to-end pure speed (and have enough GPU RAM), then the CPU-RAM → GPU-RAM data transfer will be the main bottleneck. You'll then need to look at things like GPU Direct Storage:

[image: GPU Direct Storage data path diagram]
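To give a very rough feel for what a GPU-direct read looks like from Python today, here's a hedged sketch using RAPIDS kvikIO. It assumes a plain binary dump of float32 values (kvikIO won't parse NetCDF), and the filename/shape are made up; if GDS isn't available it falls back to a regular bounce-buffer copy:

```python
import cupy as cp
import kvikio

gpu_buf = cp.empty((720, 1440), dtype=cp.float32)  # allocate on the GPU first
with kvikio.CuFile("raw_grid.bin", "r") as f:      # hypothetical raw binary file
    f.read(gpu_buf)                                # bytes go (ideally) disk -> GPU-RAM
```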

This would involve using libraries that handle loading/pre-processing directly on the GPU like:

The caveat is that you can't read NetCDFs or most 'geo' formats directly into GPU memory yet (as far as I'm aware). Relevant issues include:

That said, there is a way to map CPU/NumPy arrays to GPU/CuPy arrays in xarray using cupy-xarray, and then use zero-copy methods on the GPU to convert the CuPy arrays to PyTorch/TensorFlow tensors. See:
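A hedged sketch of what that chain might look like, assuming cupy-xarray's `.cupy.as_cupy()` accessor and the same made-up file/variable names as above:

```python
import xarray as xr
import cupy_xarray  # noqa: F401  (registers the .cupy accessor on xarray objects)
import torch

ds = xr.open_dataset("data.nc")       # File -> CPU-RAM (NumPy-backed)
da = ds["precip"].cupy.as_cupy()      # CPU-RAM -> GPU-RAM (now CuPy-backed)
cp_arr = da.data                      # the underlying cupy.ndarray
t = torch.from_dlpack(cp_arr)         # zero-copy CuPy -> PyTorch on the GPU via DLPack
# TensorFlow equivalent (also zero-copy):
# tf.experimental.dlpack.from_dlpack(cp_arr.toDlpack())
```

The key point is that only the first two hops touch CPU RAM; once the data is CuPy-backed, the handoff to PyTorch/TensorFlow is a zero-copy exchange of GPU pointers via DLPack.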

But again, you will still need to load the NetCDF via File → CPU-RAM → GPU-RAM until someone figures out a more direct NetCDF file → GPU-RAM path. This has been on my wishlist for quite a while, and most of the interoperability standards are in place; we just need to get some smart people to do it :slightly_smiling_face:
