Can a reprojection/change of CRS operation be done lazily using rioXarray?

Interesting! I definitely miss multi-resolution in Zarr, but Zarr is also not so well supported outside of Python. That’s fine, obviously, if you can tool folks up with a Python stack, but we support other users too. I’m not entirely into treating a set of files of daily data as an array with degenerate rectilinear coords when a list of files with a date stamp is entirely fine and works fast across languages using analogous idioms, and we have existing workflows for visualization, extraction, aggregation, etc. We’re also trying to support those workflows, things we already do well. Getting away from NetCDF is a great first step, but Zarr doesn’t have the generic appeal that COGs do when we’re already heavily invested in GDAL as a foundation.
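For readers unfamiliar with the "list of files with a date stamp" idiom, here is a minimal sketch of what it can look like in plain Python. The file names and the `YYYYMMDD` naming scheme are hypothetical, chosen only for illustration; the point is that the time axis lives in the file names rather than in an array coordinate.

```python
import re
from datetime import date, datetime
from pathlib import PurePosixPath

# Hypothetical naming scheme: one COG per day, date stamp embedded as YYYYMMDD.
DATE_PATTERN = re.compile(r"(\d{8})")


def date_from_name(path: str) -> date:
    """Parse the first 8-digit date stamp found in the file name."""
    match = DATE_PATTERN.search(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"no date stamp in {path!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def select_files(paths, start: date, end: date):
    """Return (date, path) pairs with start <= date <= end, sorted by date."""
    dated = ((date_from_name(p), p) for p in paths)
    return sorted((d, p) for d, p in dated if start <= d <= end)


# Made-up file list, deliberately out of order.
files = [
    "sst/sst_20240103.tif",
    "sst/sst_20240101.tif",
    "sst/sst_20240102.tif",
    "sst/sst_20231231.tif",
]

selected = select_files(files, date(2024, 1, 1), date(2024, 1, 2))
for day, path in selected:
    print(day.isoformat(), path)
```

Each selected path can then be handed to whatever GDAL-based reader a given language already has, which is roughly the cross-language appeal described above.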

Do you think multi-resolution Zarr is going to catch on as a standard?

It’s funny that you characterize COGs as more generic than Zarr. :thinking: COG is explicitly and narrowly defined as a format for imagery. What about data that isn’t well modeled as imagery, like climate model outputs? What about timeseries analysis? What about more flexible chunking schemes for high-dimensional data? Zarr is much more flexible and generic than COG as a general-purpose data container. This is also why it’s harder to make it “just work” automatically in every situation.

Just stacking up COGs in time is not, in my opinion, the optimal solution for cloud-native data cube analytics. Yes, COGs have better support across the GIS stack. But Zarr also works in many languages.

Zarr is absolutely already catching on as a standard. Many of the most innovative groups are already using it heavily in production.

This community (Pangeo) is quite invested in Zarr and will continue to advance the format and its interoperability across the ecosystem.


I agree about the genericity. I did not characterize COGs as more generic than Zarr.

I used “generic” really specifically: “generic appeal”, which is from our perspective. To belabor my perhaps too-short sentence: “doesn’t have the generic appeal to us”. I’m sorry that wasn’t clear enough. I think I say too much most of the time.


Hey @Michael_Sumner,

There are a few JS Zarr implementations (zarr.js and zarrita.js). At CarbonPlan we use multi-resolution Zarr in web mapping.

xcube has an implementation of multi-resolution Zarr, and we have a very bespoke multi-resolution implementation.

AFAIK, there is no geospatial multiscale spec, but a ZEP could be really interesting. I wonder if it could be an add-on to the GeoZarr work.

Hey @gunbra32, thanks again for sharing xcube! It looks super cool.

I’ve been struggling to get resample_in_space to work. Would you have time and be willing to take a quick look at examples/future-resample-xcube-h5netcdf-.ipynb in the developmentseed/warp-resample-profiling repository on GitHub (main branch) to see if there’s something obviously wrong? I used GridMapping rather than encoding the CRS as a variable. Are both necessary? The workflow shows a few dask runtime errors, and the kernel dies with OOM on a 60 GB RAM instance after ~6 minutes.

Hi, many thanks for trying. We are always happy to get real-world use cases and feedback and would love to look into your example in more detail, but this will need some days. From looking at your example, I would start by using open_dataset() with the ‘chunks’ argument (the default is ‘None’, which skips using dask), but the size and structure of the input file are not clear from the notebook.
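A minimal sketch of that suggestion, assuming the notebook opens its input through xarray's `open_dataset()` and that dask is installed. The helper names `open_lazily` and `describe_chunks` are made up for illustration and are not part of xarray or xcube.

```python
def open_lazily(path, chunks="auto", engine=None):
    """Open a dataset with dask-backed variables instead of eager NumPy arrays.

    `open_lazily` is a hypothetical helper name, not an xarray or xcube API.
    """
    # Imported here so the sketch can be loaded without xarray installed.
    import xarray as xr

    # chunks=None (the xarray default) skips dask entirely.
    # chunks="auto" lets dask pick chunk sizes; chunks={} would instead use
    # the chunking preferred by the storage backend.
    return xr.open_dataset(path, chunks=chunks, engine=engine)


def describe_chunks(ds):
    """Map each data variable name to its chunk layout (None if not chunked)."""
    return {name: var.chunks for name, var in ds.data_vars.items()}
```

Calling `describe_chunks(open_lazily("input.nc", engine="h5netcdf"))` (with a placeholder path) shows whether each variable is actually chunked. A value of `None` means that variable is not dask-backed and will be read into memory in full when accessed, which is one possible cause of an OOM like the one described above.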


I certainly cannot predict the future, but we definitely see in the Earth Observation community that Zarr is gaining considerable traction beyond the Python data science community.
It may have gone unnoticed here, but the future nominal file format for the Sentinel products will be Zarr. Product specification and sample products can be found here. Likewise, SNAP, the standard software for exploitation of EO products from ESA, has used Zarr as its standard file format since v9.0.0.


I agree! I never said it wouldn’t. What I said was that we have workflows that are better suited, for now and the foreseeable future, without reformatting to actual or virtualized Zarr. I don’t understand why that is being reframed. Please consider reviewing what I said vs. what Ryan said I said.

Degenerate rectilinear coords are a showstopper currently, and there are other problems like actual technical functional availability in languages that aren’t Python or JavaScript or C++ or Rust. I want those to be solved and I’m exploring how they might be.


Thanks all for the really interesting discussions here. From reading all this, I’m under the impression that there might be two worlds (again) that might not have the same needs, and so not the same tools: raster imagery on one side (originally TIFF or alike), and climate models or data on the other side (e.g. NetCDF-like).

Maybe, @maxrjones, it would be really nice to characterize the tools by their initial target in your diagram?

Can these worlds be unified? Through Zarr v3 and GeoZarr? And what are the advances in GeoZarr specs?