Copernicus Sentiel 2 overlapping coordinates between tiles

A little bit of Context:

ESA is distributing [Sentinel 2]( Sentinel-2 - Missions - Sentinel Online - Sentinel (esa.int)) A/B at level-1C (Top Of Atmosphere) and L2A (Bottom Of Atmosphere) through tiles of fixed size. These cover an area of 100x100 km and are orthorectified-images in MGRS/WGS84 reference system.

For historical reasons, each tile has an overlapping area with around tiles. Unfortunately, overlaps are not fixed in dimension and vary according to the tile position within the reference grid. Within the metadata, there isn’t any information about the surrounding tiles.

The classic approach:
As most of the time, a single tile doesn’t cover the area of interest, mosaicking of the tiles is needed. Different approaches can be taken on how overlapping values are treated. Min, max, average, first, last are the most common according to the needs but much more complicate algorithms are available for specific cases.
In Rasterio, this can be easily obtained through the .merge module. To have a quick example of how ti is used, have a look here

Xarray:
From my understanding seems that there is no way to achieve the merging for one specific reason. Combine can’t be used as coordinates are not monotonic. Moreover, there isn’t an option to manage the overlapping pixels, but that would be the second step once the coordinate grid is determined. Here a discussion on this topic [Xarray combine_by_coords return the monotonic global index error]( Xarray combine_by_coords return the monotonic global index error · Issue #4213 · pydata/xarray (github.com))

AWS:
For those interested in why this problem seems to be not present in the AWS, what I can say is that it seems that Sinergise, for the AWS service, has converted the files from jpeg2000 to tif, making some adjustments. Unfortunatelly I don’t have direct access to the AWS and can’t check by my self, but picking some info here and there seems that they used the average algorithm. This specific approach could be considered acceptable for the L1C but not for the L2A. But this is a completely different topic, and I would not investigate more for the moment.

Question:
Does anyone has stumbled with this problem? How did you solve it? It could be that I’m not aware of the specific module that can solve this. I’ve searched a little bit, but nothing has popped up.
Any idea about how I could organise a workaround in a ‘lazy dask’ approach?
It would be feasible to think of implementation in Xarray of a system to treat this specific problem.

1 Like

Good question @PhenoloBoy!

@JessicaS11, @scottyhq and @TomAugspurger, and others have been working on developing some best practices for working with geospatial raster data in Pangeo. This is discussed in the following threads

Perhaps they will have some suggestions.

hello, I agree with @rabernat - good question!

I think this is a super hard problem to solve, and tbh, I don’t think it is solvable at the level of the products given out by ESA.

Think of this; ESA hands out tiles - squares based on MGRS - and in that it makes a simplification. The assumption is that, given that dimension of a tile, the curvature of Earth is not affecting the area representation. What you are trying to do is to stitch together multiples of those squares, but suddenly, when you start doing this the dimensions grow, and curvature does start to play a role. The southern or northern you work, the more prevalent that is.

I don’t think there is a good way to stitch together L1C or L2A tiles on a 2D space; at least there doesn’t seem to be one because it would not result in a useful representation (or visualization even).

(I will also leave this here which is not answering your question, but it is interesting:)

Having said this, I am assuming that you are trying to solve a different problem, and this is a step in the process to solve it. What is it that you are really trying to do?
In most cases, you would be processing the image to understand features and deduct something about the given area. In most cases you want to avoid processing the same area (part of tile) twice, both due to performance reasons, but also due to the bias of evaluating such parts multiple times.
Instead of thinking of a big surface, composed of square images, try to find for each pair of neighbouring tiles those common pixels. Then think of processed and unprocessed areas/pixels, guided by this information.

1 Like

Rereading your original post, it sounds like a good solution already exists. Can you help us understand better why the rasterio solution is not enough for you?

Will the rioxarray.merge module (which implements rasterio.merge) work for your purposes, or are you looking for a way to merge the rasters that does not use rasterio?

2 Likes

@PhenoloBoy - in my experience, this is a common challenge when developing workflows with these data.

My current approach is to leverage gdal vrt to stack (temporally) or stitch(mosaic) these data. The vrt file can also specify warp parameters (gda_vrt_warping) to reproject to a new crs (e.g. epsg 4326).

@TomAugspurger has just started an effort to make this process easier, which leverages STAC catalogs (GitHub - stac-utils/stac-vrt). I think at the moment it focuses on the mosaic workflow (example with NAIP imagery), but I assume a stacking (temporal) functionality can be implemented.

1 Like

Hi @c00kiemon5ter,

Projection issue:
Tiles are projected in a 2d dimension through a Tranverse of Mercator to be stitchable within a zone. Simplifying quite a lot this type of projection, conformal ones, preserves the ratio of two lengths (in this case angle and length ) keeping errors acceptable within a ‘small’ spatial domain. You would not have a problem to create a single huge image if you will stitch within the boundary of 6 longitude degree.
Indeed, once some on will try to deal with two different zones, a completely different approach would be taken but I would not consider it now.

xESMF:
I had a quick look at it and seems really interesting!!! thanks a lot, indeed I’m going to use it. The best way to treat different zones is to convert from planner coordinate to geographic once and the xESMF seems to treat exactly these issues. Unfortunatelly the overlapping is not going to be solved even with this change of reference system.

What you described is correct. The only problem is that once you will start to create, in practical therm, such thinks the first problem that you will figure out is that Xarray can’t deal with overlaps. Consequentially you will have to find a smart way to deal with it keeping in mind that without a lazy approach will end up with a huge amount of data.

Target:
Create a cube of data where I’ve, for each sensed band (let’s say Red/Green/Blu) a mosaic over wich I can apply some algorithms that will involve the band and the time dimension.

1 Like

@rabernat
Indeed .merge is the usual solution for this type of things but I was trying to create a workflow based on a completely lazy approach so the final dataset will manageable.

In any case, I found it interesting that the combine by coords can’t deal with this and I thought to bee a good idea to raise the issue in the community. More easily users can deal with their data more users will find it convenient to use it. Real few people rise the issue as most of the others are leveraging on the work of Synergise and AWS.

Even if we are in a testing phase the entire area covered by the project is bounded within West Africa to India, temporal resolution speaking is 120 days. So I have to create something that will be able to deal with a huge amount of data and I was more prone to create something that not duplicate data on the disk (or at least not from the first step)

@JessicaS11

Good point! I wasn’t aware of rioxarry and seems that most probably will solve the problem! I’ll let you all know if it was fitting the purpose.

@rmg55
Indeed VRT is a good solution to create virtual raster and I’m used too it. Unfortunatelly the wrapping would require a huge amount of effort as you would need for each tile know the neighbour tiles and the exact non-overlapping area. That’s why on a different group the first question has been … ‘Did ESA provide, within the metadata, the non-overlapping area?’ unfortunately I’ll let you imagine the answer.

Vrt has the problem of dealing with multiple crs…so if a large area, you have to handle that first.

The simple merges in rasterio may be insufficient for spectral reasons.

Good point, if someone needs to cover a large area that is span over more than one MRGS zone one of the best options would be to convert everything in lat/lon and then merge.
Indeed would be nice to know how Synergise/AWS treated the projection. Did they move to a Geographic one? From the FAQ seems that for a bug, the L2A lost the metadata that brings the CRS info (no idea if they fix it meanwhile).

I wouldn’t be so strict in the answer. You would need to know what you are doing according to the specific project.
Indeed would be a mistake use an avarage on a L2A scene; at the same time I would not be so restrictive using a First or Last approach in that specific case.
For L1C the situation should be solved using an algorithm that fits best the purpose (min/max/avarage…)

Thanks to the really interesting and well done notebook that you shared few weeks ago my attention has been focus on the metadata from the AWS products. ‘OVR_RESAMPLING_ALG: AVERAGE,’ is reported in a L2A product. I’ve been searching for a little in the documentation but I’ve not been able to fid anything regarding this. Maybe you have more info on this.
For sure, if that would be the algorithm, I would not be too happy as L2A is a BRDF correct and in my application would be better to have original values (First/Last alg.).