Decode GeoTIFF to GPU memory

Here are the timings @Michael_Sumner :laughing:

Standard GTiff driver (GDAL 3.10.3)

```python
from osgeo import gdal
import time
import os

gdal.UseExceptions()
os.environ["GDAL_DISABLE_READDIR_ON_OPEN"] = "EMPTY_DIR"

# %%
%%timeit
t0 = time.perf_counter()
ds = gdal.Open("benches/TCI.tif")
d = ds.ReadRaster()
t1 = time.perf_counter()
print(t1 - t0)
# 1.29 s ± 37.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

1.29s (Python) - 1.05s (Rust) means about 0.25s of extra overhead from Python.

LiberTIFF driver (GDAL 3.11.0)

```python
# %%
%%timeit
t0 = time.perf_counter()
ds = gdal.OpenEx("benches/TCI.tif", allowed_drivers=["LIBERTIFF"], open_options=["NUM_THREADS=16"])
d = ds.ReadRaster()
t1 = time.perf_counter()
print(t1 - t0)
# 192 ms ± 2.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

At 0.2s, LiberTIFF is about 0.15s faster than nvTIFF's 0.35s! Guess I've got some work to do (I still need to benchmark true CPU → GPU timings). My guess is that for small COGs, GDAL+GTiff/LiberTIFF might be performant enough, but larger COGs could benefit from nvTIFF's GPU-based decoding. But I'll run the numbers to verify.
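For reference, here's roughly what that CPU → GPU benchmark would look like (a sketch I haven't run yet): decode on the CPU with GDAL+LiberTIFF, then copy to the GPU with CuPy and only stop the clock after synchronizing. The uint8/interleaved assumption matches this TCI image but isn't general.

```python
import time
import numpy as np
import cupy as cp
from osgeo import gdal

gdal.UseExceptions()

t0 = time.perf_counter()
# CPU-side decode with the LiberTIFF driver
ds = gdal.OpenEx("benches/TCI.tif", allowed_drivers=["LIBERTIFF"], open_options=["NUM_THREADS=16"])
cpu_array = np.frombuffer(ds.ReadRaster(), dtype=np.uint8)
# Host -> device copy; synchronize so the transfer is actually finished before timing stops
gpu_array = cp.asarray(cpu_array)
cp.cuda.Device().synchronize()
t1 = time.perf_counter()
print(f"CPU decode + GPU transfer: {t1 - t0:.3f} s")
```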

Edit: I will note though, as mentioned in the blog post, that multi-threaded GDAL+LiberTIFF will clash with PyTorch multiprocessing, so there's still value in offloading decoding to the GPU instead of staying on the CPU. Single-threaded LiberTIFF takes 879 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) on my laptop, so the gap becomes 0.88s (LiberTIFF, 1 thread) - 0.35s (nvTIFF) = 0.53s.
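To make the clash concrete, this is how I'd picture working around it today (a hypothetical, untested sketch): keep LiberTIFF single-threaded per sample and let the DataLoader's worker processes supply the parallelism instead, which is exactly the regime where the 0.88s single-threaded number applies.

```python
import numpy as np
from osgeo import gdal
from torch.utils.data import Dataset, DataLoader

gdal.UseExceptions()

class CogDataset(Dataset):
    """Hypothetical dataset that decodes one COG per sample on the CPU."""

    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # NUM_THREADS=1 so LiberTIFF's decoder threads don't fight with the
        # DataLoader's num_workers processes for the same CPU cores.
        ds = gdal.OpenEx(
            self.paths[idx],
            allowed_drivers=["LIBERTIFF"],
            open_options=["NUM_THREADS=1"],
        )
        return np.frombuffer(ds.ReadRaster(), dtype=np.uint8)

loader = DataLoader(CogDataset(["benches/TCI.tif"]), num_workers=4)
```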

That's what I've been wondering about for years since this post: whether we can use kerchunk (at the time), now VirtualiZarr, to do direct-to-GPU reads.

My understanding is that we would need zarr-python/VirtualiZarr to support these GPU-native libs:

| | CPU | GPU |
| --- | --- | --- |
| TIFF metadata/IFD decoding | async-tiff (Rust) + virtual-tiff (Python) | ? |
| Decompression | numcodecs (Python/Cython) | nvCOMP (C++) |

The ? is the key part. I'm proposing that nvTIFF is the more direct way of reading COGs to the GPU. The VirtualiZarr way would go through kvikio.zarr.GDSStore, and if it works, it could in theory be faster since it uses cuFile. Sadly, nvTIFF doesn't actually use cuFile yet, but I think it's only a matter of time.
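For completeness, this is roughly how I imagine the kvikio/GDSStore route looking. Very much a sketch: the store path is hypothetical, it assumes a Zarr-ified (or virtual) copy of the COG already exists, and the exact zarr-python API for getting CuPy output may differ from what I show here.

```python
import cupy as cp
import zarr
import kvikio.zarr

# Hypothetical Zarr version of the COG; VirtualiZarr/virtual-tiff would normally
# produce references pointing back at the original TIFF chunks instead.
store = kvikio.zarr.GDSStore("benches/TCI.zarr")
# meta_array asks zarr to hand back CuPy arrays rather than NumPy ones
z = zarr.open_array(store, mode="r", meta_array=cp.empty(()))
gpu_array = z[:]  # chunks read via cuFile/GDS should land directly in GPU memory
```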

My hot take is that reading L2 GeoTIFF data to the GPU shouldn't need to rely on Zarr or wait for the GeoZarr spec. Also, VirtualiZarr is Python-only for now, and I think we should be building something that is cross-language compatible, which GDAL+LiberTIFF is doing for CPU workflows; I'm hoping that Rust bindings to nvTIFF will play that role for GPU workflows.
