Decode GeoTIFF to GPU memory

Here are the timings @Michael_Sumner :laughing:

Standard GTiff driver (GDAL 3.10.3)

```python
from osgeo import gdal
import time
import os

gdal.UseExceptions()
os.environ["GDAL_DISABLE_READDIR_ON_OPEN"] = "EMPTY_DIR"

# %%
%%timeit
t0 = time.perf_counter()
ds = gdal.Open("benches/TCI.tif")
d = ds.ReadRaster()
t1 = time.perf_counter()
print(t1 - t0)
# 1.29 s ± 37.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

1.29s (Python) - 1.05s (Rust) means about 0.25s of extra overhead from Python.

LiberTIFF driver (GDAL 3.11.0)

```python
# %%
%%timeit
t0 = time.perf_counter()
ds = gdal.OpenEx("benches/TCI.tif", allowed_drivers=["LIBERTIFF"], open_options=["NUM_THREADS=16"])
d = ds.ReadRaster()
t1 = time.perf_counter()
print(t1 - t0)
# 192 ms ± 2.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

At 0.2s, LiberTIFF is about 0.15s faster than nvTIFF's 0.35s! Guess I've got some work to do (I still need to benchmark true CPU → GPU timings). My guess is that for small COGs, GDAL+GTiff/LiberTIFF might be performant enough, but larger COGs could benefit from nvTIFF's GPU-based decoding. But I'll run the numbers to verify.
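For reference, here's roughly what that CPU → GPU benchmark would look like (a sketch I haven't run yet): decode on the CPU with GDAL+LiberTIFF, then copy to the GPU with CuPy and only stop the clock after synchronizing. The uint8/interleaved assumption matches this TCI image but isn't general.

```python
import time
import numpy as np
import cupy as cp
from osgeo import gdal

gdal.UseExceptions()

t0 = time.perf_counter()
# CPU-side decode with the LiberTIFF driver
ds = gdal.OpenEx("benches/TCI.tif", allowed_drivers=["LIBERTIFF"], open_options=["NUM_THREADS=16"])
cpu_array = np.frombuffer(ds.ReadRaster(), dtype=np.uint8)
# Host -> device copy; synchronize so the transfer is actually finished before timing stops
gpu_array = cp.asarray(cpu_array)
cp.cuda.Device().synchronize()
t1 = time.perf_counter()
print(f"CPU decode + GPU transfer: {t1 - t0:.3f} s")
```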

Edit: I will note though, as mentioned in the blog post, that multi-threaded GDAL+LiberTIFF will clash with PyTorch multiprocessing, so there's still value in offloading decoding to the GPU instead of staying on the CPU. Single-threaded LiberTIFF takes 879 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) on my laptop, so the gap becomes 0.88s (LiberTIFF, 1 thread) - 0.35s (nvTIFF) = 0.53s.
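To make the clash concrete, this is how I'd picture working around it today (a hypothetical, untested sketch): keep LiberTIFF single-threaded per sample and let the DataLoader's worker processes supply the parallelism instead, which is exactly the regime where the 0.88s single-threaded number applies.

```python
import numpy as np
from osgeo import gdal
from torch.utils.data import Dataset, DataLoader

gdal.UseExceptions()

class CogDataset(Dataset):
    """Hypothetical dataset that decodes one COG per sample on the CPU."""

    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # NUM_THREADS=1 so LiberTIFF's decoder threads don't fight with the
        # DataLoader's num_workers processes for the same CPU cores.
        ds = gdal.OpenEx(
            self.paths[idx],
            allowed_drivers=["LIBERTIFF"],
            open_options=["NUM_THREADS=1"],
        )
        return np.frombuffer(ds.ReadRaster(), dtype=np.uint8)

loader = DataLoader(CogDataset(["benches/TCI.tif"]), num_workers=4)
```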

That's what I've been wondering about for years since this post: whether we can use kerchunk (at the time), now VirtualiZarr, to do direct-to-GPU reads.

My understanding is that we would need zarr-python/VirtualiZarr to support these GPU-native libs:

| | CPU | GPU |
| --- | --- | --- |
| TIFF metadata/IFD decoding | async-tiff (Rust) + virtual-tiff (Python) | ? |
| Decompression | numcodecs (Python/Cython) | nvCOMP (C++) |

The ? is the key part. I'm proposing that nvTIFF is the more direct way of reading COGs to the GPU. The VirtualiZarr way would go through kvikio.zarr.GDSStore, and if it works, it could in theory be faster since it uses cuFile. Sadly, nvTIFF doesn't actually use cuFile yet, but I think it's only a matter of time.
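For completeness, this is roughly how I imagine the kvikio/GDSStore route looking. Very much a sketch: the store path is hypothetical, it assumes a Zarr-ified (or virtual) copy of the COG already exists, and the exact zarr-python API for getting CuPy output may differ from what I show here.

```python
import cupy as cp
import zarr
import kvikio.zarr

# Hypothetical Zarr version of the COG; VirtualiZarr/virtual-tiff would normally
# produce references pointing back at the original TIFF chunks instead.
store = kvikio.zarr.GDSStore("benches/TCI.zarr")
# meta_array asks zarr to hand back CuPy arrays rather than NumPy ones
z = zarr.open_array(store, mode="r", meta_array=cp.empty(()))
gpu_array = z[:]  # chunks read via cuFile/GDS should land directly in GPU memory
```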

My hot take is that reading L2 GeoTIFF data to the GPU shouldn't need to rely on Zarr or wait for the GeoZarr spec. Also, VirtualiZarr is Python-only for now, and I think we should be building something that is cross-language compatible, which GDAL+LiberTIFF is doing for CPU workflows; I'm hoping that Rust bindings to nvTIFF will play that role for GPU workflows.
