I love how this discussion is steering from metadata toward speed! Just to clarify: NetCDF can indeed be fast enough if you’re going from File → CPU-RAM → GPU-RAM (assuming you’ve got enough I/O bandwidth, RAM, etc.), as @rabernat pointed out.
Now, if you want end-to-end pure speed (and have enough GPU RAM), then the CPU-RAM → GPU-RAM transfer becomes the main bottleneck. You’ll then want to look at things like GPU direct storage:
- GPUDirect Storage: A Direct Path Between Storage and GPU Memory | NVIDIA Technical Blog
- NVIDIA GPUDirect Storage Documentation
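As a rough sketch of what a GPUDirect Storage read can look like from Python: the RAPIDS kvikio library exposes `CuFile`, which reads straight into a CuPy buffer on the device. The kvikio/CuPy branch below is an untested assumption on my part (it needs a CUDA GPU with GDS configured); without that, it falls back to the ordinary File → CPU-RAM path:

```python
import os

import numpy as np

# Assumption: a CUDA GPU plus the RAPIDS `kvikio` and `cupy` packages.
# If either is missing, fall back to the classic host-memory read.
try:
    import cupy as cp
    import kvikio

    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU
    HAS_GDS = True
except Exception:
    HAS_GDS = False


def load_float32(path: str):
    """Read a raw float32 binary file into GPU memory via GPUDirect
    Storage when possible, otherwise into CPU RAM with NumPy."""
    n = os.path.getsize(path) // 4  # float32 = 4 bytes per value
    if HAS_GDS:
        buf = cp.empty(n, dtype=cp.float32)
        with kvikio.CuFile(path, "r") as f:
            f.read(buf)  # file -> GPU-RAM, skipping the CPU bounce buffer
        return buf
    return np.fromfile(path, dtype=np.float32)  # file -> CPU-RAM
```

Note this reads raw binary, not NetCDF — that gap (reading NetCDF/HDF5 chunks straight to the device) is exactly the missing piece discussed below.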
This would involve using libraries that handle loading/pre-processing directly on the GPU, like:
- RAPIDS, in particular cuCIM (GitHub - rapidsai/cucim) for computer vision/image processing
- NVIDIA DALI, which does data loading and augmentation on the GPU, see NVIDIA DALI Documentation — NVIDIA DALI 1.16.0 documentation
The caveat is that you can’t read NetCDFs or most ‘geo’ formats directly into GPU memory yet (as far as I’m aware). Relevant issues include:
- [FEA] Support Zarr-based image format (such as NGFF) · Issue #94 · rapidsai/cucim · GitHub
- Pythonic TIFF Reader · Issue #1272 · NVIDIA/DALI · GitHub
- Please expose __cuda_array_interface__ via the xarray.__array__() function if present · Issue #6847 · pydata/xarray · GitHub
That said, there is a way to map CPU/NumPy arrays to GPU/CuPy arrays in xarray with cupy-xarray, and then use GPU zero-copy methods to convert the CuPy arrays to PyTorch/TensorFlow tensors. See:
- https://developer.nvidia.com/blog/machine-learning-frameworks-interoperability-part-1-memory-layouts-and-memory-pools/
- https://developer.nvidia.com/blog/machine-learning-frameworks-interoperability-part-2-data-loading-and-data-transfer-bottlenecks/
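For the zero-copy step, the exchange mechanism is the DLPack protocol (alongside `__cuda_array_interface__`). On a GPU the call would be roughly `torch.from_dlpack(cupy_array)`; since I can’t assume everyone has a GPU handy, the sketch below demonstrates the same protocol on the CPU with NumPy (`np.from_dlpack`, NumPy ≥ 1.22), which is enough to show that no copy happens:

```python
import numpy as np

# DLPack is the exchange protocol that lets CuPy hand a device tensor to
# PyTorch/TensorFlow without copying. NumPy implements the same protocol,
# so we can demonstrate the zero-copy semantics on the CPU.
a = np.arange(4, dtype=np.float32)
b = np.from_dlpack(a)  # wraps the SAME memory, no copy made

a[0] = 99.0  # mutate through the original array...
assert b[0] == 99.0  # ...and the DLPack view sees it: zero-copy confirmed

# On a GPU the analogous handoff would be (sketch, requires cupy + torch):
#   gpu_arr = cp.asarray(...)          # CuPy array on the device
#   tensor = torch.from_dlpack(gpu_arr)  # zero-copy PyTorch tensor
```

The CuPy → PyTorch/TensorFlow paths in the two blog posts above use exactly this handoff, just with device memory instead of host memory.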
But again, you will still need to load the NetCDF from File → CPU-RAM → GPU-RAM until someone figures out a more direct NetCDF file → GPU-RAM path. This has been on my wishlist for quite a while; most of the interoperability standards are in place, we just need to get some smart people to do it!