Problems with COG images downloaded from the Planetary Computer API

Hi all,

I have downloaded several Sentinel 2 images using the Microsoft Planetary Computer Python API and have come across many transparent/corrupted pixels in these images. These pixels appear mainly when downloading the TCI band (“visual” band) of Sentinel 2, other bands (green, blue,…) don’t seem to have such a problem. I don’t know if the problem is how I load the data and transform the file, if it is due to cloud-optimized GeoTIFF (COG) data or problems in Microsoft image generation.

import pystac
import planetary_computer
import rioxarray

item_url = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items/S2B_MSIL2A_20230111T121649_R066_T27PTS_20230112T034530"
item = pystac.Item.from_file(item_url)
signed_item = planetary_computer.sign(item)

ds = rioxarray.open_rasterio(
    signed_item.assets["visual"].href).squeeze()

ds.rio.to_raster("exampleVisual.tif")

If this file is opened with QGIS or the SNAP program of that one, the following is observed:

The original TCI image is obtained from Copernicus (S2B_MSIL2A_20230111T121649_N0509_R066_T27PTS_20230111T141317)

In the code, the normal geoTIFF is generated from the COG downloaded from Planetary, if you directly download the .tiff file (in COG format) from the link indicated in signed_item.assets["visual"].href, you still see those errors with slight modifications.

As you can see in the images, there are pixels with very low luminosity that become transparent. I would like to know if someone has an explanation or solution to this phenomenon.

Thanks for posting this. I’ll take a closer look today, but just wanted to acknowledge and confirm that I’m seeing it to.

The only thing I’ve found so far is that for these white pixels, one or more of the RGB bands have a nodata values. With rasterio:

In [1]: import rasterio

In [2]: ds = rasterio.open("exampleVisual.tif")

In [3]: masks = ds.read_masks()

In [4]: masks[:, :5, :5]
Out[4]:
array([[[255, 255, 255, 255, 255],
        [255, 255, 255, 255, 255],
        [255, 255, 255, 255, 255],
        [255, 255, 255, 255, 255],
        [  0,   0, 255, 255, 255]],

       [[255, 255, 255, 255, 255],
        [255, 255, 255, 255, 255],
        [255, 255, 255, 255, 255],
        [255, 255, 255, 255, 255],
        [255,   0,   0, 255, 255]],

       [[255, 255, 255, 255, 255],
        [255, 255, 255, 255, 255],
        [255, 255, 255, 255, 255],
        [255, 255, 255, 255, 255],
        [255, 255,   0, 255, 255]]], dtype=uint8)

The 0s there means there is a nodata pixel.

We run sen2cor ourselves on the L1C data provided by ESA. I’ll see if I can figure out where in the processing chain those nodata values are introduced, and whether or not they’re expected.

1 Like

Quick update here. I’m still investigating, but right now it’s looking like an issue with the COG conversion (sen2cor outputs jp2000 files, we host COGs on the Planetary Computer). Specifically, the issue is around the nodata value we set.

In our gdal_translate call we specify -a_nodata 0. According to User Guides - Sentinel-2 MSI - Definitions - Sentinel Online - Sentinel Online, the (valid) pixels in the TCI band are supposed to be 1-255, and 0 is used for nodata. But sen2cor is apparently producing values in 0-255 (e.g. the pixel at (0, 4, 4) in the TCI asset). Checking the SCL (scene classification map) asset shows that everything in the top-left corner is valid (non-zero), and so shouldn’t be treated as nodata.

The actual values in the raster should be just fine to use. But some software (like QGIS) will interpret the nodata value we set in the COGs and use that to inform how to display the raster, calculate statistics, etc. And this actually highlights an issue with the TCI image from Copernicus: the entire bottom-right side of that image is (0, 0, 0). But since there’s no nodata value set it’s interpreting those as a valid RGB value.

I’ll follow up later once I’ve confirmed what the right thing to do here is.

2 Likes

Thank you very much Tom for the answers,

Yes, the problem with the range of 0-255 and 1-255 makes sense, since the pixels that do not appear in QGIS or GeoServer are those that had very low luminosity or very black blacks (I imagine that with one of the RGB components be 0, it would become NonData).

One more quick note: you can use gdal_edit.py to remove the nodata flag from a TIF:

gdal_edit.py -unsetnodata T27PTS_20230111T121649_TCI_10m_unsetnodata.tif
Warning 1: T27PTS_20230111T121649_TCI_10m_unsetnodata.tif: The IFD has been rewritten at the end of the file, which breaks COG layout.

I’m not sure whether that warning is relevant for your use-case.

Because the actual raster values are fine in the COG, this should recover the same values as the L2A scene from Copernicus (with the caveat that we use a different DEM than ESA: see Differences on Sentinel-2 Level2A products on PC and the ESA product. Where do they come from? · Discussion #149 · microsoft/PlanetaryComputer · GitHub for more).

I opened What is the valid range for the TCI product? - sen2cor - STEP Forum on the Sentinel forum but will post here when I get an answer on whether values of 0 in the TCI image should be considered valid or not.

1 Like