ODC STAC load Only Returns Last Slice with PlanetScope items (Others are Zeros)

Hi, everyone! I am currently working with PlanetScope PSScene items. The COGs are on an S3 bucket behind a STAC API.

When I fetch STAC items with pystac_client and then load them with odc.stac.load using groupby="solar_day", the dataset has multiple time slices, but only the last slice has non-zero pixels. All earlier slices are zeros.

xx = stac_load(stac_items, bands=("red", "green", "blue", "B6"), groupby="solar_day", chunks={})
xx = xx.compute()

I am expecting a per-day mosaic where overlapping items contribute to a single time slice. In reality, only the last slice has data; all others are zeros (or, with groupby, the mosaic looks identical to just one item).

I have tried the following:

  • custom stac_cfg (regex asset mapping for bands, UDM masks)

  • explicit crs/resolution, union/buffered bbox

  • preserve_original_order=True and sorted items

  • custom fusers (first-nonzero, per-band max) and also loading without groupby then max(dim="time")

I don’t have this issue with AWS’s Sentinel-2 items (https://earth-search.aws.element84.com/v1/); those mosaics come out as expected. Also, when I load the STAC items individually, I get valid pixels. I could concatenate the individual Xarrays, but that defeats the purpose because I lose the groupby feature.

Where am I going wrong? Has anyone worked with PSScene items using the odc.stac library? I would much appreciate any leads on this. I have been trying to debug this for weeks with no tangible progress, and my patience is running thin. Thank you kindly!


Versions I am running:
Python: 3.13.5
odc-stac: 0.3.11
odc-geo: 0.4.10
rasterio: 1.4.3
pystac-client: 0.8.6
pystac: 1.13.0
numpy: 2.2.6
xarray: 2025.4.0
dask: 2024.4.1
shapely: 2.1.1
geopandas: 1.0.1


Most likely broken/missing nodata in the metadata in the catalog. Try adding the nodata=0 argument, and try nodata=0xffff as well. Please paste a sample item JSON and the output of rio info <sample file>
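To illustrate the point, here is a toy numpy sketch of the general fusing idea (not odc.stac internals): if the loader does not know the nodata value, the zero padding of a later scene can overwrite valid pixels from an earlier one when slices are combined; with nodata declared, those zeros are skipped.

```python
import numpy as np

# Two overlapping "scenes" on a shared 1x4 grid; 0 is the padding value.
scene_a = np.array([10, 20, 0, 0], dtype="uint16")  # covers the left half
scene_b = np.array([0, 0, 30, 40], dtype="uint16")  # covers the right half

# Without nodata: a "last write wins" combine lets B's zero padding clobber A.
naive = scene_a.copy()
naive[:] = scene_b
print(naive.tolist())  # [0, 0, 30, 40] -- scene_a's pixels are gone

# With nodata=0 declared: only valid (non-nodata) pixels get written.
nodata = 0
fused = scene_a.copy()
np.copyto(fused, scene_b, where=(scene_b != nodata))
print(fused.tolist())  # [10, 20, 30, 40] -- both scenes contribute
```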

# minimal PlanetScope cfg: map the analytic SR asset to RGB+NIR and include UDM2 if present
import numpy as np
from odc.stac import load


def make_ps_cfg(collection_id: str) -> dict:
    return {
        collection_id: {
            "assets": {
                r"regex:(?i).*AnalyticMS_SR.*\.tif$": {
                    "bands": {
                        "blue": {"band": 1, "nodata": 0},
                        "green": {"band": 2, "nodata": 0},
                        "red": {"band": 3, "nodata": 0},
                        "nir": {"band": 4, "nodata": 0},
                    }
                },
                # fallback: any GeoTIFF that is not a UDM2, map the first 4 bands as RGBN
                r"regex:(?i)^(?!.*udm).*\.tif$": {
                    "bands": {
                        "blue": {"band": 1, "nodata": 0},
                        "green": {"band": 2, "nodata": 0},
                        "red": {"band": 3, "nodata": 0},
                        "nir": {"band": 4, "nodata": 0},
                    }
                },
                r"regex:(?i).*udm2.*\.tif$": {
                    "bands": {
                        "udm2_clear": {"band": 1, "nodata": 0},
                        "udm2_snow": {"band": 2, "nodata": 0},
                        "udm2_shadow": {"band": 3, "nodata": 0},
                        "udm2_cloud": {"band": 6, "nodata": 0},
                    }
                },
            },
            "default_measurements": ["red", "green", "blue", "nir"],
            "default_crs": "EPSG:4326",
            "default_resolution": [3, -3],
        }
    }


def fuse_first_nonzero(dst, src):
    # keep the first non-zero value seen at each pixel location
    np.copyto(dst, src, where=(dst == 0) & (src > 0))


col_id = getattr(stac_items[0], "collection_id", None) or getattr(stac_items[0], "collection", None)
stac_cfg = make_ps_cfg(col_id)

xx = load(
    stac_items,
    bands=("red", "green", "blue"),
    bbox=BBOX,
    crs="EPSG:3857",
    resolution=3,
    groupby="solar_day",
    chunks={},  # use Dask
    stac_cfg=stac_cfg,
    fuse_func=fuse_first_nonzero,
)

Thank you for your reply! The code above is how I have created a custom stac_cfg and simple fuse function to create the mosaic policy. It did not make a difference, unfortunately. Below are the sample item JSON and the rio info respectively.

{
  "type": "Feature",
  "stac_version": "1.1.0",
  "stac_extensions": [
    "https://stac-extensions.github.io/eo/v1.1.0/schema.json",
    "https://stac-extensions.github.io/view/v1.0.0/schema.json",
    "https://stac-extensions.github.io/raster/v1.1.0/schema.json",
    "https://stac-extensions.github.io/projection/v2.0.0/schema.json"
  ],
  "id": "20240921_074257_53_24a8",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        ...
      ]
    ]
  },
  "bbox": [...],
  "properties": {
    "updated": "2024-09-25T16:32:13Z",
    "created": "2024-09-21T14:19:09Z",
    "gsd": 3.6,
    "constellation": "planetscope",
    "platform": "24a8",
    "instruments": [
      "PSB.SD"
    ],
    "eo:cloud_cover": 34,
    "view:off_nadir": 3.5,
    "view:azimuth": 103.2,
    "view:sun_azimuth": 147.2,
    "view:sun_elevation": 41,
    "pl:ground_control": true,
    "pl:item_type": "PSScene",
    "pl:pixel_resolution": 3,
    "pl:publishing_stage": "finalized",
    "pl:quality_category": "standard",
    "pl:strip_id": "7591065",
    "published": "2024-09-21T14:19:09Z",
    "datetime": "2024-09-21T07:42:57.536745Z"
  },
  "links": [
    {
      "rel": "root",
      "href": "s3://.../planet_labs/orders/catalog.json",
      "type": "application/json"
    },
    {
      "rel": "parent",
      "href": "s3://.../planet_labs/orders/PSScene/PSScene_collection.json",
      "type": "application/json"
    },
    {
      "rel": "self",
      "href": "s3://.../planet_labs/orders/004db224-bf6d-429c-a5ae-a0900ba0f5e0/PSScene/20240921_074257_53_24a8.json",
      "type": "application/json"
    },
    {
      "rel": "collection",
      "href": "s3://.../planet_labs/orders/PSScene/PSScene_collection.json",
      "type": "application/json"
    }
  ],
  "assets": {
    "20240921_074257_53_24a8_metadata_json": {
      "href": "s3://.../planet_labs/orders/004db224-bf6d-429c-a5ae-a0900ba0f5e0/PSScene/20240921_074257_53_24a8_metadata.json",
      "type": "application/json",
      "roles": [
        "metadata"
      ]
    },
    "20240921_074257_53_24a8_3B_AnalyticMS_metadata_clip_xml": {
      "href": "s3://.../planet_labs/orders/004db224-bf6d-429c-a5ae-a0900ba0f5e0/PSScene/20240921_074257_53_24a8_3B_AnalyticMS_metadata_clip.xml",
      "type": "text/xml",
      "pl:asset_type": "ortho_analytic_4b_xml",
      "pl:bundle_type": "analytic_sr_udm2",
      "roles": [
        "metadata"
      ]
    },
    "20240921_074257_53_24a8_3B_AnalyticMS_SR_clip_file_format_tif": {
      "href": "s3://.../planet_labs/orders/004db224-bf6d-429c-a5ae-a0900ba0f5e0/PSScene/20240921_074257_53_24a8_3B_AnalyticMS_SR_clip_file_format.tif",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "pl:asset_type": "ortho_analytic_4b_sr",
      "pl:bundle_type": "analytic_sr_udm2",
      "raster:bands": [
        {
          "nodata": 0.0,
          "data_type": "uint16",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 112.0,
            "maximum": 8559.0
          },
          "scale": 0.0001
        },
        {
          "nodata": 0.0,
          "data_type": "uint16",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 137.0,
            "maximum": 8894.0
          },
          "scale": 0.0001
        },
        {
          "nodata": 0.0,
          "data_type": "uint16",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 63.0,
            "maximum": 9439.0
          },
          "scale": 0.0001
        },
        {
          "nodata": 0.0,
          "data_type": "uint16",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 28.0,
            "maximum": 10115.0
          },
          "scale": 0.0001
        }
      ],
      "eo:bands": [
        {
          "name": "Blue",
          "common_name": "blue",
          "center_wavelength": 0.49,
          "full_width_half_max": 0.05
        },
        {
          "name": "Green",
          "common_name": "green",
          "center_wavelength": 0.565,
          "full_width_half_max": 0.036
        },
        {
          "name": "Red",
          "common_name": "red",
          "center_wavelength": 0.665,
          "full_width_half_max": 0.03
        },
        {
          "name": "Near-Infrared",
          "common_name": "nir",
          "center_wavelength": 0.865,
          "full_width_half_max": 0.04
        }
      ],
      "proj:epsg": ...,
      "proj:bbox": [
...
      ],
      "proj:shape": [
...
      ],
      "proj:transform": [
...
      ],
      "roles": [
        "data",
        "reflectance"
      ]
    },
    "20240921_074257_53_24a8_3B_udm2_clip_file_format_tif": {
      "href": "s3://.../planet_labs/orders/004db224-bf6d-429c-a5ae-a0900ba0f5e0/PSScene/20240921_074257_53_24a8_3B_udm2_clip_file_format.tif",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "pl:asset_type": "ortho_udm2",
      "pl:bundle_type": "analytic_sr_udm2",
      "raster:bands": [
        {
          "data_type": "uint8",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 0.0,
            "maximum": 1.0
          }
        },
        {
          "data_type": "uint8",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 0.0,
            "maximum": 0.0
          }
        },
        {
          "data_type": "uint8",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 0.0,
            "maximum": 1.0
          }
        },
        {
          "data_type": "uint8",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 0.0,
            "maximum": 1.0
          }
        },
        {
          "data_type": "uint8",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 0.0,
            "maximum": 0.0
          }
        },
        {
          "data_type": "uint8",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 0.0,
            "maximum": 1.0
          }
        },
        {
          "data_type": "uint8",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 0.0,
            "maximum": 99.0
          }
        },
        {
          "data_type": "uint8",
          "spatial_resolution": 3.0,
          "statistics": {
            "minimum": 0.0,
            "maximum": 2.0
          }
        }
      ],
      "eo:bands": [
        {
          "name": "B1",
          "description": "Clear map (0: not clear, 1: clear)"
        },
        {
          "name": "B2",
          "description": "Snow map (0: no snow or ice, 1: snow or ice)"
        },
        {
          "name": "B3",
          "description": "Shadow map (0: no shadow, 1: shadow)"
        },
        {
          "name": "B4",
          "description": "Light haze map (0: no light haze, 1: light haze)"
        },
        {
          "name": "B5",
          "description": "Heavy haze map (0: no heavy haze, 1: heavy haze)"
        },
        {
          "name": "B6",
          "description": "Cloud map (0: no cloud, 1: cloud)"
        },
        {
          "name": "B7",
          "description": "Confidence map (percentage value: per-pixel algorithmic confidence in classification)"
        },
        {
          "name": "B8",
          "description": "Unusable pixels"
        }
      ],
      "proj:epsg": ...,
      "proj:bbox": [
...
      ],
      "proj:shape": [
...
      ],
      "proj:transform": [
...
      ],
      "roles": [
        "data",
        "snow-ice",
        "cloud",
        "cloud-shadow"
      ]
    }
  },
  "collection": "psscene"
}
{
  "driver": "GTiff",
  "dtype": "uint16",
  "nodata": 0.0,
  "width": 2149,
  "height": 1572,
  "count": 4,
  "crs": "EPSG:...",
  "transform": [
...
  ],
  "blockxsize": 512,
  "blockysize": 512,
  "tiled": true,
  "compress": "lzw",
  "interleave": "pixel"
}

@prasun this is BAD STAC design by Planet; it will not work with odc.stac, nor will it work with stackstac. It is ridiculous that commercial data providers not only do not support the maintenance or development of the software tools used to access their data, they don’t even test the metadata they generate against those tools. They just dump the support load on OS developers, bravo :clap: for fiscal efficiency. This model is not working: stackstac is no longer maintained, rasterio is on hold, and I’m about to quit this crap too. Industry doesn’t want to pay for maintenance, and I’m sick of living from one random development contract to another and dealing with a tax system that assumes constant, predictable income.

Asset names should not contain timestamps, because then you can’t tell which assets belong to the same group across multiple STAC items. Hence you get only one populated image; all other timestamps are missing, because no other item has that asset name. Normalize the asset names by manually editing the STAC asset keys to a consistent string across items, and you should be able to load your data.
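As a sketch of that manual fix (assuming the per-scene key prefixes shown in the item JSON later in this thread; the stable names and substring rules here are my own illustrative choices, not an official mapping):

```python
def normalize_asset_keys(item: dict) -> dict:
    """Rewrite per-scene asset keys like
    '20240921_074257_53_24a8_3B_AnalyticMS_SR_clip_file_format_tif'
    to names that are stable across items, e.g. 'analytic_sr'."""
    # hypothetical substring -> stable-name rules; adjust to your order's layout
    rules = [
        ("analyticms_sr", "analytic_sr"),
        ("udm2", "udm2"),
        ("analyticms_metadata", "analytic_xml"),
        ("metadata_json", "metadata_json"),
    ]
    fixed = dict(item)
    fixed["assets"] = {
        next((name for pat, name in rules if pat in key.lower()), key): asset
        for key, asset in item["assets"].items()
    }
    return fixed

item = {
    "assets": {
        "20240921_074257_53_24a8_3B_AnalyticMS_SR_clip_file_format_tif": {"href": "sr.tif"},
        "20240921_074257_53_24a8_3B_udm2_clip_file_format_tif": {"href": "udm2.tif"},
    }
}
print(sorted(normalize_asset_keys(item)["assets"]))  # ['analytic_sr', 'udm2']
```

With every item rewritten this way before calling odc.stac.load, overlapping items share asset names and should group into one band per solar day.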


@kirill.kzb I have raised this issue with Planet and requested that a sample dataset be made publicly available so we can work on a fix. I’d love for odc.stac to work out of the box with Planet as it does with AWS’s STAC. I am a big believer in OS development, so I would be happy to contribute to ODC and submit a PR to make this happen. I am thankful for your work thus far on odc.stac.

I agree that asset names should not contain timestamps, but I’m surprised to hear that the asset name is the culprit here. It makes sense, though. I see that AWS’s STAC has consistent asset names across the board (e.g., thumbnail, hh), so I will give it a go and, fingers crossed, debug this once and for all. Meanwhile, I’m absolutely raising this nomenclature issue with Planet as well.

Besides, neither the stac_cfg nor the fuse_func seems to make a difference in my case. Under what circumstances would they be necessary? Thanks much!


Thanks @prasun for the detailed reports, and thanks @kirill.kzb for all your help to figure this out. And I’m sorry you’re so frustrated with companies just dumping a big load of burden on OS developers - I’ve certainly felt that way many times before.

I currently work for Planet, and I do feel I should defend them a bit. They supported my time to start the STAC specification, were a sponsor of almost all the early STAC sprints, and funded the creation of the pystac library along with some Go STAC tooling. And they support GDAL as a gold sponsor, as well as funding Even Rouault directly on many improvements (drivers for WFS3 / OGC API, GeoParquet, PMTiles, etc.). But I know it’s ā€˜not enough’ - and I wish Planet were able to directly fund even more of the open source geospatial ecosystem of tools.

I’ll take responsibility for not testing with odc.stac or stackstac - I’ve never used either extensively and hadn’t tried them with Planet data. I was intimately involved in the design of the Planet STAC implementation and ensured that what we produced passed all validation tests. Unfortunately, the assumption that odc and stackstac make about consistent asset names isn’t actually a STAC requirement, or even a recommended best practice. I’m actually not sure why we added that, but when I reviewed our orders STAC implementation I didn’t catch it, since I didn’t test with odc.stac. Thanks @prasun for catching it and raising it.

And to be fair to the Planet team, the official guidance on asset keys in the STAC spec is:

In general, the keys don’t have any meaning and are considered to be non-descriptive unique identifiers. Providers may assign any meaning to the keys for their respective use cases, but must not expect that clients understand them. To communicate the purpose of an asset better use the roles field in the Asset Object.

From stac-spec/commons/assets.md at master Ā· radiantearth/stac-spec Ā· GitHub. Planet does use the asset roles, but I guess that likely doesn’t work well enough for ODC STAC’s purposes.

We’re discussing internally at Planet, and we’d very much like these tools to work with our STAC Catalogs. I’m thinking the easiest path would be to add a flag to our STAC ordering to request consistent asset names - but I’ll discuss it with the team and hopefully get a good solution that fixes this but doesn’t break existing workflows. And I do think it’d be good for us to at least add something to STAC best practices recommending consistent asset names (I can put up a PR, though it may not be quick as I’m in the midst of a big move to the Netherlands). And perhaps we should consider changing the actual text? But I think there were likely good reasons not to assign meaning to the keys, so maybe there’s another, more robust solution.

We’ll also work to get up some sample output from our Orders API to make it easier to test. Also, a point of minor clarification: it’s not technically a timestamp that we’re putting in the asset name, it’s the full ID of the Planet scene - but that ID does contain a timestamp within it.


@cholmes, thank you for your supportive response - much appreciated! I, and I’m sure the broader OS community, will benefit greatly from your continued support. I’m happy to walk you through my use case in detail and explain the specific gap I encountered.

I love that odc.stac works seamlessly out of the box with AWS’s STAC, converting STAC items to Xarrays and eventually images while performing useful operations, such as mosaicking adjacent scenes. I was attempting to achieve the same workflow with STAC items from Planet. The mosaicking functionality is particularly important because, as you well know, a single scene doesn’t necessarily cover the entirety of an AOI, requiring multiple scenes to be stitched together - typically scenes captured minutes apart.

However, I consistently encountered invalid pixels except for the last slice of the datacube, despite debugging from every possible angle (hence this post and my plea for help). In the meantime, I’ll manually edit the asset key names and report back with my findings.

I truly appreciate both of your attention to this issue. A well-functioning ODC library benefits everyone, and I’d like to contribute to make that happen.

Thanks @prasun! We definitely want it to work with odc.stac too. I’ve actually been using it for the last few months and am a fan.

But yes, definitely report back after manually editing the names. If that works, I suspect it would be easy to write (or have AI coding tools write) a little script that anyone can use to change Planet Orders API output to have consistent asset names. Then there’d be at least ‘an answer’, and we can work together to make it easier than running a script.

@cholmes I apologize for speaking rashly and making emotional generalizations that are not true. I was funded to work on odc.stac during the development stage, and have had several paid contracts to extend the functionality of odc.stac and odc.geo, and I am grateful for having had that opportunity. But software systems like that are never “finished”; they are always a work in progress and require a significant amount of tedious upkeep: keeping up with dependencies in code and on CI/CD, keeping up with new data providers, new formats, and new object stores, releasing to PyPI and conda-forge, responding to issues, and enabling the productivity of your software’s users by answering questions in a timely manner. It is this important work of “maintenance” and “support” that almost never gets funded directly. Sure, some of it gets done when a “new feature” contract comes in, but I can’t justify charging double or triple the hours for “unrelated” work just so that I can make a new release happen after a period of stagnation. Probably the most common form of funding for this is employment by an organization that uses said software, but that always comes with higher-priority internal work, and too much focus on the open source part can be detrimental to your career and even to the employment itself.

I’m not saying I have a solution. Like you said, there are many worthy projects out there; not all of them have an Even Rouault, not all of them are impactful enough to set up sponsorship, and not all of them have developers business-savvy enough to pull in funding on a consistent basis.

Regex is not supported inside the config. And fuse_func doesn’t get called when there is only one item with valid data for a given band - and as far as odc.stac is concerned there is only one, since it assumes consistent names across assets and won’t load “different” assets into one image plane.

Thank you for the kind words, and sorry for being rather terse in my responses. I understand where you are coming from with the offer of a PR, I do, but the truth is that new-feature PRs from brand-new contributors are kinda hard on the maintainers, and often require more work than implementing the feature without a PR. There is just too much undocumented context to pull this off without significant hand-holding from the original developers, even for a very experienced and capable contributor who is new to the code base. You are welcome to contribute a documentation PR though: probably Best Practices — odc-stac 0.4.0 documentation should state explicitly that we rely on asset names to establish correspondence across multiple STAC items, and to figure out the full set of independent bands exposed by the supplied metadata. Documentation on how to debug catalogs with odc.stac.parse_items would also be helpful.

For this specific feature, I’m not sure I would want to implement it even as a paid contract, to be honest (not a great businessman, hey). See my other comment about the cost of the unseen maintenance load. I can implement a feature and get paid for it, sure, but what about the ongoing cost of the increased complexity? Or the increased support load a new feature might generate? Who will be financing that? The tragedy of the commons is not a solved problem in our societies.

RE: asset names/roles

Say one used a random UUID or a hash for an asset name - no problem, allowed by the standard, I get it. You have 100 items with the same timestamp, each with 10 randomly named assets, each pointing to a COG. How am I to figure out which of the 10 assets in item A correspond to the same sort of data in item B? I need to do this to build 10 mosaics stitched from 100 COGs each. All 10 assets have the data role, I can’t rely on “order”, and I can’t even tell whether there are 10 bands, 20 bands, or 1,000 (10x100) separate bands without out-of-band information provided by the user.
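The correspondence step described here amounts to a group-by on asset name; a minimal sketch of the idea (simplified, not odc.stac’s actual implementation):

```python
from collections import defaultdict

def group_assets(items):
    """Map each asset name to the list of hrefs that form one mosaic."""
    groups = defaultdict(list)
    for item in items:
        for name, asset in item["assets"].items():
            groups[name].append(asset["href"])
    return dict(groups)

# With consistent names, 2 items x 2 assets -> 2 mosaics of 2 COGs each.
items = [
    {"assets": {"red": {"href": "a_red.tif"}, "nir": {"href": "a_nir.tif"}}},
    {"assets": {"red": {"href": "b_red.tif"}, "nir": {"href": "b_nir.tif"}}},
]
print(group_assets(items))
# {'red': ['a_red.tif', 'b_red.tif'], 'nir': ['a_nir.tif', 'b_nir.tif']}
# With random per-item names, every group would hold a single href and no
# cross-item mosaic could be formed.
```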

That’s why we rely on asset names - that, and because the AWS Landsat catalog was used during development.

Pretty sure pgstac uses asset names for “dehydration”, so you also get the benefit of reduced database storage when asset names do not change from item to item. I’m sure it helps with compression too when combining multiple items into one archive.

I believe there are significant practical benefits to having a well chosen, human friendly name for assets within the same catalog.

@cholmes I apologize for speaking rashly and making emotional generalizations that are not true.

No problem at all - I totally get it. And I really appreciate all the work you’ve done, and continue to do unpaid, to maintain it. It’s a real problem, and I think it’s worst when a project is popular enough that tons of people use it, but no one who relies on it has yet realized that it’s a critical dependency they can’t afford to have go away. And, as you point out, it unfortunately requires the best developers to also do good business development, which is not what they want to do.

The start of my career was actually doing that biz dev to fund open source developers, with OpenGeo (which became Boundless), and it worked pretty well - not just negotiating high hourly rates for new features but also selling enterprise support packages that funded core development. And before Boundless we were under a non-profit that reinvested all ‘profit’ back into the core software. At some point I hope to get back to working on promoting more models / org structures that can sustainably support open source software. But I’m probably still at least a couple of years away from that.


That’s why we rely on asset names - that, and because the AWS Landsat catalog was used during development.

Pretty sure pgstac uses asset names for “dehydration”, so you also get the benefit of reduced database storage when asset names do not change from item to item. I’m sure it helps with compression too when combining multiple items into one archive.

I believe there are significant practical benefits to having a well chosen, human friendly name for assets within the same catalog.

Oh yeah, it all makes good sense - I think it’s a very reasonable assumption to make. I’m pretty sure we can get Planet to switch, and hopefully that will not require any code changes to ODC STAC.

Just want to be sure we get the core specs right. I can’t think of a good reason we wouldn’t include this in the core eventually, but generally the path is a best practice and/or extension, letting that gestate a bit, and then, if it works for everyone, getting it into the core.

We could also consider an extension or ‘profile’ (I think that’s been discussed) that has it as a requirement - so we could have it as a validation rule, and anyone looking to provide their data to ODC or similar tools could check it in the STAC validator.

And @prasun - if you want to contribute on this, then I think the best way is to update the docs like Kirill said, and to just convert your Planet order into an ODC-compatible version by aligning all the asset names. Then write a script to convert any Planet order, and then we can work to get it implemented in our Orders system. Then we hopefully don’t need to touch any ODC code.


Following this discussion with great interest, both on the open-source community issues and on the technical questions. Kudos to everybody for a very frank and open discussion about some critical issues.

Just on the technical side, the conversation about asset names reminds me a lot of a similar issue in the NetCDF / CF world. In the CF Data Model, the actual variable names within a NetCDF file are not supposed to be semantically important. Instead, the semantically important information is in attributes, such as standard_name. We have tools like CF Xarray which allow interaction based on the standard name instead of the variable name.

The analogy to standard_name here might be the pl:asset_type field or the roles field within the assets. Is it possible that, despite the different asset names across the collection, they share a common standard value in these fields? If so, it should be straightforward to add an option to odc-stac to join on these attributes rather than on the asset names.
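For instance, a hypothetical helper keying assets on Planet’s pl:asset_type might look like this (a sketch only; odc-stac has no such option today, and the function name and toy item are illustrative):

```python
def assets_by_type(item: dict) -> dict:
    """Index an item's assets by pl:asset_type, ignoring the asset key."""
    return {
        asset["pl:asset_type"]: asset
        for asset in item["assets"].values()
        if "pl:asset_type" in asset
    }

item = {
    "assets": {
        "scene1_AnalyticMS_SR_tif": {"pl:asset_type": "ortho_analytic_4b_sr", "href": "sr.tif"},
        "scene1_udm2_tif": {"pl:asset_type": "ortho_udm2", "href": "udm2.tif"},
        "scene1_metadata_json": {"roles": ["metadata"]},  # no pl:asset_type, skipped
    }
}
print(sorted(assets_by_type(item)))  # ['ortho_analytic_4b_sr', 'ortho_udm2']
```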

Yes, that already exists: eo:bands.common_name, and it is used by odc.stac, but only as an alias, so you can request red instead of B3, for example. But it’s not guaranteed to be unique, so we can’t merge on it, and instead rely on the asset name. The Landsat catalog, for example, has that issue: red, green and blue appear both as independent 12-bit bands and as 8-bit bands within a visual asset. Also, it’s an extension, so it’s not guaranteed to be present at all.

Also, there are only so many allowed names, specific to CF needs.

Thanks for the open discussion here and everyone’s hard work on all the critical metadata and software tooling at play here. Just wanted to add a couple concrete suggestions:

> And perhaps we should consider changing the actual text? But I think there were likely some good reasons to not try to assign meaning to the keys…

I think this is a good idea. Maybe something like the following, either in the spec or best practices: “Asset keys can be anything, but keys under a given Item must be unique and they should be identical across Items in a Collection”?

I can understand not wanting to force people to use “visual” if they prefer “true-color” or “favorite-jargony-name”, but at least keeping those keys identical across items is definitely helpful - this has also come up as an issue when going between STAC JSON and STAC-GeoParquet, where assets differ across items (NASA CMR STAC’s results rather than Planet’s in this case): https://github.com/stac-utils/stac-geoparquet/issues/82


I also think it would be wonderful if data providers (in particular commercial ones) maintained small, authoritative public STAC catalogs - and ideally public assets too - that are kept up to date with *current authenticated API search results*. I’ve seen a number of inconsistencies between open data catalogs and the STAC actually returned from the current API, and open catalogs are naturally the first place software developers and scientists go to kick the tires and develop workflows.

That’s a good idea. When I get some time I’ll try to implement something. A first pass will likely just be a single hand-made catalog (vs. trying to figure out how to automatically keep it in sync with potential changes in authenticated API results). I made the catalog at Planet Labs - Open Data before Planet started implementing STAC for everything (and it also includes a number of non-standard assets), but I can use those same item IDs and order them again, as they’re CC-BY licensed. We actually have at least two different ways to get STAC: there’s the search API at https://api.planet.com/x/data/ that doesn’t have any assets past thumbnail (and wouldn’t be useful for odc.stac), and then the Orders API that generates the data and provides a local STAC catalog with all the assets. And actually we also have “Planet Insights Platform” (the evolution of Sentinel Hub), which also has a STAC API. So it may be a lot for me to do all three, but I’ll try to at least start with Orders, where this issue came up.


I see that in STAC v1.1 item_assets is now part of the spec and not an extension (stac-spec/collection-spec/collection-spec.md at master Ā· radiantearth/stac-spec Ā· GitHub). It can be defined at the Collection level and describes the assets available within items of the collection. It is optional, but good UX to have; Explorer presents that information in a very handy way, for example. This can only work for collections with a fixed set of asset names.
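For a PSScene-like collection with normalized asset names, such a declaration might look like this (a hypothetical fragment following the STAC 1.1 item_assets shape; the names and field values are illustrative):

```json
{
  "id": "psscene",
  "type": "Collection",
  "item_assets": {
    "analytic_sr": {
      "title": "Surface reflectance (4-band)",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["data", "reflectance"]
    },
    "udm2": {
      "title": "Usable data mask",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["data", "cloud"]
    }
  }
}
```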

I think the spec should guide catalog designers in a more explicit way towards a solution with consistent asset names across all items. In fact, I would suggest aiming for consistency within the properties dictionary as well: the same property names pointing to the same value types. It’s fine for some items to have missing assets or missing properties, but the total set of “JSON paths” across all items of a collection should be O(1), not O(N), where N is the number of items (which is the case in the catalog that triggered this discussion).
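That O(1)-vs-O(N) property is easy to lint for mechanically; a minimal sketch (my own illustration, not an existing tool):

```python
def asset_key_profile(items):
    """Distinct sets of asset keys used across items. A well-behaved
    collection yields one (or a few) sets regardless of item count; a
    per-scene naming scheme yields one set per item."""
    return {frozenset(item["assets"]) for item in items}

consistent = [{"assets": {"red": {}, "nir": {}}} for _ in range(100)]
per_scene = [{"assets": {f"scene{i}_red": {}}} for i in range(100)]
print(len(asset_key_profile(consistent)))  # 1
print(len(asset_key_profile(per_scene)))   # 100 -- grows with item count
```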


The positives of having consistent asset names are many, even if we ignore interoperability with tools like odc.stac and stackstac:

  1. Improved manual discoverability of the data, see item_assets
  2. Easier ad-hoc querying with tools like jq
  3. Can convert a set of items to a table view for ad-hoc data engineering/management tasks
  4. Smaller tables for stac-geoparquet
  5. Arbitrary human-friendly names for assets that do not fit the eo:bands.common_name notation constraints
  6. More friendly to “automatic types from JSON” techniques that some languages like C# support

The benefits of the opposite are harder to pinpoint. It might be a slight advantage on the creation side of the equation: just chuck the filename into the asset name and be done with it. You can create an item JSON from a directory listing without user input of any sort, I guess, but you won’t make a particularly user-friendly catalog that way.

We might not be able to enforce consistent asset names with the current validation approaches, but that’s fine. One can always create something that passes all the specs yet is not in the slightest reasonable. For example, one could swap keys with values in the properties dict for some string parameters, at random, and the item would still validate as well as the original.

I’ll stop lurking and chime in, briefly. Thanks all for an excellent discussion; I’ve shared this with several folks, as it echoes and reinforces a lot of what we hear from all over the STAC ecosystem.

I wanted to respond to @kirill.kzb’s comment:

I think spec should guide catalog designers in a more explicit way towards a solution with consistent asset names across all items.

I wholeheartedly agree, and I see odc.stac as one of the primary examples of “implementation leading the spec” in a positive sense. I think the STAC tooling ecosystem has matured to the point where we ought to take a moment, record what we’ve learned, and share that back out.

To that end, @m-mohr is leading (with support) a rework of the best practices into GitHub - radiantearth/stac-best-practices: Best Practices for STAC . I think this will be a great chance to formalize guidance like the comments from this discussion, so please open issues or (eventually) PRs there to help build the best set of guidelines we can.

And thanks to @cholmes and Planet as well for always pushing the envelope :person_bowing: (and, I see now, originally suggesting the best-practices breakout: Move best practices out of stac-spec? Ā· Issue #1032 Ā· radiantearth/stac-spec Ā· GitHub ).


Thanks for the pointers @gadomski !

I opened Recommend consistent asset names across all items Ā· Issue #7 Ā· radiantearth/stac-best-practices Ā· GitHub
