Favorite way to go from netCDF (&xarray) to torch/TF/Jax et al

I love how this discussion is steering away from metadata and towards speed :rofl: Just to clarify, NetCDF can indeed be fast enough if you're going from File → CPU-RAM → GPU-RAM (assuming you've got enough I/O, RAM, etc.), as @rabernat pointed out.
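For reference, a minimal sketch of that File → CPU-RAM → GPU-RAM path with xarray + PyTorch (the filename and variable name are made up for illustration):

```python
import xarray as xr
import torch

ds = xr.open_dataset("data.nc")   # lazy open on disk (hypothetical file)
arr = ds["precip"].values         # load into CPU RAM as a NumPy array
tensor = torch.from_numpy(arr)    # zero-copy wrap on the CPU side
tensor = tensor.to("cuda")        # explicit CPU-RAM -> GPU-RAM transfer
```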

Now, if you want end-to-end pure speed (and have enough GPU RAM), then the CPU-RAM → GPU-RAM data transfer will be the main bottleneck. You'll then need to look at things like GPU Direct Storage:

[image: GPU Direct Storage data path diagram]
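To give a very rough feel for what a GPU-direct read looks like from Python today, here's a hedged sketch using RAPIDS kvikIO. It assumes a plain binary dump of float32 values (kvikIO won't parse NetCDF), and the filename/shape are made up; if GDS isn't available it falls back to a regular bounce-buffer copy:

```python
import cupy as cp
import kvikio

gpu_buf = cp.empty((720, 1440), dtype=cp.float32)  # allocate on the GPU first
with kvikio.CuFile("raw_grid.bin", "r") as f:      # hypothetical raw binary file
    f.read(gpu_buf)                                # bytes go (ideally) disk -> GPU-RAM
```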

This would involve using libraries that handle loading/pre-processing directly on the GPU like:

The caveat is that you can't read NetCDFs or most 'geo' formats directly into GPU memory yet (as far as I'm aware). Relevant issues include:

That said, there is a way to map CPU/NumPy arrays to GPU/CuPy arrays in xarray using cupy-xarray, and then use zero-copy methods on the GPU to convert the CuPy arrays to PyTorch/TensorFlow tensors. See:
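A hedged sketch of what that chain might look like, assuming cupy-xarray's `.cupy.as_cupy()` accessor and the same made-up file/variable names as above:

```python
import xarray as xr
import cupy_xarray  # noqa: F401  (registers the .cupy accessor on xarray objects)
import torch

ds = xr.open_dataset("data.nc")       # File -> CPU-RAM (NumPy-backed)
da = ds["precip"].cupy.as_cupy()      # CPU-RAM -> GPU-RAM (now CuPy-backed)
cp_arr = da.data                      # the underlying cupy.ndarray
t = torch.from_dlpack(cp_arr)         # zero-copy CuPy -> PyTorch on the GPU via DLPack
# TensorFlow equivalent (also zero-copy):
# tf.experimental.dlpack.from_dlpack(cp_arr.toDlpack())
```

The key point is that only the first two hops touch CPU RAM; once the data is CuPy-backed, the handoff to PyTorch/TensorFlow is a zero-copy exchange of GPU pointers via DLPack.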

But again, you will still need to load the NetCDF via File → CPU-RAM → GPU-RAM until someone figures out a more direct NetCDF file → GPU-RAM path. This has been on my wishlist for quite a while, and most of the interoperability standards are in place; we just need to get some smart people to do it :slightly_smiling_face:
