interesting! I definitely miss multi-resolution in Zarr, but also Zarr is not so well supported outside of Python. It’s fine obviously if you can tool folks up with a Python stack, but we support other users too. I’m not entirely into treating a set of files of daily data as an array with degenerate rectilinear coords when a list of files with a date stamp is entirely fine and works fast cross-language using analogous idioms, and we have existing workflows for visualization, extraction, aggregation, etc. We’re also trying to support those workflows, things we already do well. Getting away from NetCDF is a great first step, but Zarr doesn’t have the generic appeal that COGs do when we’re already heavily invested in GDAL as a foundation.
Do you think multi-resolution Zarr is going to catch on as a standard?
It’s funny that you characterize COGs as more generic than Zarr. COG is explicitly and narrowly defined as a format for imagery. What about data that isn’t well modeled as imagery, like climate model outputs? What about timeseries analysis? What about more flexible chunking schemes for high-dimensional data? Zarr is much more flexible and generic than COG as a general-purpose data container. This is also why it’s harder to make it “just work” automatically in every situation.
Just stacking up COGs in time is not, in my opinion, the optimal solution for cloud-native data cube analytics. Yes, COGs have better support across the GIS stack. But Zarr also works in many languages.
Zarr is absolutely already catching on as a standard. Many of the most innovative groups are already using it heavily in production.
This community (Pangeo) is quite invested in Zarr and will continue to advance the format and its interoperability across the ecosystem.
I agree about the genericity. I did not characterize COGs as more generic than Zarr.
I used “generic” quite specifically: “generic appeal”, which is from our perspective. To belabor my perhaps too-short sentence: “doesn’t have the generic appeal to us”. I’m sorry that wasn’t clear enough. I think I say too much most of the time.
Hi, many thanks for trying. We are always happy for real-world use cases and feedback and would love to look into your example in more detail, but this will need some days. From looking at your example, I would start with using open_dataset() with the ‘chunks’ argument (the default is None, which skips using dask), but the size and structure of the input file are not clear from the notebook.
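For concreteness, here is a minimal sketch of that suggestion; the file name and chunk sizes are placeholders, not values from the original notebook:

```python
import xarray as xr

# By default chunks=None, which loads eagerly without Dask.
# Passing a chunks mapping (or chunks={} to reuse the on-disk chunking)
# returns Dask-backed, lazily evaluated arrays instead.
ds = xr.open_dataset(
    "example.nc",        # placeholder path; substitute the actual input file
    chunks={"time": 1},  # hypothetical chunking; tune to the file's structure
)
print(ds)  # variables are now lazy Dask arrays; computation is deferred
```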
I certainly can’t predict the future, but we definitely see in the Earth Observation community that Zarr is gaining considerable traction beyond the Python data science community.
It may have gone unnoticed here, but the future nominal file format for the Sentinel products will be Zarr. The product specification and sample products can be found here. Likewise, SNAP, the standard software for exploitation of EO products from ESA, has used Zarr as its standard file format since v9.0.0.
I agree! I never said it wouldn’t. What I said was that we have workflows that are better suited for now and the foreseeable future without reformatting to actual or virtualized Zarr. I don’t understand why that is being reframed. Please consider reviewing what I said versus what Ryan said I said.
Degenerate rectilinear coords are a showstopper currently, and there are other problems, like actual technical functional availability in languages that aren’t Python or JavaScript or C++ or Rust. I want those to be solved and I’m exploring how they might be.
Thanks all for the really interesting discussions here. Reading all this, I’m under the impression that there might be two worlds (again) that might not have the same needs, and so not the same tools: raster imagery on one side (originally TIFF or the like), and climate models or data on the other side (e.g. NetCDF-like).
Maybe @maxrjones it would be really nice to characterize the tools by their initial target in your diagram?
Can these worlds be unified? Through Zarr v3 and GeoZarr? And what progress has been made on the GeoZarr spec?
I think they can be unified. From what I can tell by reading these threads and the GitHub discussions, GeoZarr is not set in stone, but I think we can figure it out. This is a good thread on some of the challenges.
Showing some prototype implementations that can be iterated on and discussed seems like the current next step.
Roundtripping the CRS data after writing the GeoZarr and reading back into the original Xarray object seems like the next step. This round-tripping demonstration has been discussed in a few places, or even implemented, but not in a generic way within xarray or its extensions.
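As a rough sketch of what that round trip could look like, using rioxarray’s CRS conventions as one possible encoding (GeoZarr may well settle on something different, and the paths here are placeholders):

```python
import xarray as xr
import rioxarray  # noqa: F401  (registers the .rio accessor)

# Attach CRS metadata, write to Zarr, read back, and check the CRS survived.
ds = xr.open_dataset("input.nc")    # placeholder source dataset
ds = ds.rio.write_crs("EPSG:4326")  # stores the CRS in a spatial_ref coordinate
ds.to_zarr("test.zarr", mode="w")

ds2 = xr.open_zarr("test.zarr")
assert ds2.rio.crs == ds.rio.crs    # did the CRS round-trip intact?
```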
This approach for ironing out the spec has been brought up in a couple of threads, and I think it’s a good one. It seems like the limiting factor is people power to provide implementations that can be discussed and iterated on.
I saw that on November 6th there is a meeting to discuss the GeoZarr spec: GeoZarr Spec Steering Working Group - HackMD. I’ll attend, hope to see many others! I’d like to contribute to prototyping GeoZarr. As a first step, I am working with a very sparse 2 GB dataset of Sentinel-2 raster chips across Europe, EuroSAT, and trying to make a prototype that addresses points 1. and 2. above.
We’ve added a new large-scale notebook example to the xcube repository! This example demonstrates how to use the resample_in_space() method to reproject the ESA CCI Land Cover classification dataset for all of Europe.
The dataset is stored in an AWS S3 bucket in xcube’s multi-resolution Zarr format and is accessed as chunked xarray.Datasets, which allows the entire operation to be performed lazily using Dask. Check it out and let us know what you think!
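For readers who want a feel for the API, here is a hedged sketch of that kind of workflow. The bucket URL, grid parameters, and the GridMapping construction are illustrative assumptions, not the actual notebook’s values:

```python
import xarray as xr
from xcube.core.gridmapping import GridMapping
from xcube.core.resampling import resample_in_space

# Open the chunked Zarr store lazily (placeholder URL, not the real bucket).
ds = xr.open_zarr("s3://example-bucket/land-cover.zarr")

# Define a hypothetical target grid: LAEA Europe (EPSG:3035) at 300 m.
target_gm = GridMapping.regular(
    size=(20000, 20000),        # illustrative raster width/height in pixels
    xy_min=(2500000, 1500000),  # illustrative origin in target CRS units
    xy_res=300,                 # 300 m resolution
    crs="EPSG:3035",
)

# resample_in_space builds a lazy Dask graph over the chunked dataset;
# nothing is computed until the result is written or explicitly loaded.
resampled = resample_in_space(ds, target_gm=target_gm)
resampled.to_zarr("land-cover-europe.zarr")  # computation happens here
```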