Hi everyone,
I am creating a STAC catalog to hold a number of datasets stored in Zarr format, and I’ve been following examples from Microsoft Planetary Computer’s data catalog to guide my decisions. I’ve noticed some redundancy in the metadata fields (the same information is listed in two different locations for a dataset/collection), and I was wondering whether any of you have insight into the reasoning behind these choices, or whether you have created your own Zarr STAC collections and made different choices. I’ve been looking closely at this example because it matches the dataset I am working with in many aspects of its structure and metadata:
- Catalog view: Daymet Annual North America | Planetary Computer (microsoft.com)
- JSON view: https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-annual-na
Here are some of my specific questions:
- It appears as though the long_name field is stored in both (1) the standard ‘description’ field and (2) a custom field ‘attrs/long_name’ for variables/dimensions in the datacube extension
- Similarly, the units field is stored in (1) the standard ‘unit’ field and (2) a custom field ‘attrs/units’ for variables/dimensions in the datacube extension
- The projection information seems to be present in 3 locations (sometimes in varying forms):
  - The first is a variable called ‘Lambert_conformal_conic’
  - The second is in ‘attrs/grid_mapping’ as ‘lambert_conformal_conic’ for every datacube variable
  - The third is in the ‘reference_system’ field of the datacube dimensions (for x and y) as PROJJSON
For each of these fields, I am curious about the potential benefits of keeping the redundancy. And if I wanted to minimize duplicated data, is there a preferred default location for each piece of metadata?
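To make the duplication concrete, here is a minimal sketch of the pattern I am describing. The field names (‘cube:variables’, ‘description’, ‘unit’, and the custom ‘attrs’ block) follow what I see in the example collection, but the variable name and all values are made up for illustration rather than copied from the real JSON, and the helper `find_duplicates` is just my own hypothetical check.

```python
# Illustrative fragment only: field names follow the datacube extension
# ("cube:variables", "description", "unit") plus the custom "attrs" field
# seen in the Planetary Computer example; the values are made up.
collection = {
    "cube:variables": {
        "prcp": {
            "type": "data",
            "description": "annual total precipitation",
            "unit": "mm",
            "dimensions": ["time", "y", "x"],
            "attrs": {
                "long_name": "annual total precipitation",
                "units": "mm",
                "grid_mapping": "lambert_conformal_conic",
            },
        },
    },
}

# Pairs of (standard datacube field, custom attrs key) that appear to overlap.
OVERLAPPING_FIELDS = [("description", "long_name"), ("unit", "units")]


def find_duplicates(collection):
    """Return (variable, standard field, attrs key) triples whose values match."""
    duplicates = []
    for name, var in collection.get("cube:variables", {}).items():
        attrs = var.get("attrs", {})
        for standard, custom in OVERLAPPING_FIELDS:
            if standard in var and var[standard] == attrs.get(custom):
                duplicates.append((name, standard, custom))
    return duplicates


for name, standard, custom in find_duplicates(collection):
    print(f"{name}: '{standard}' duplicates 'attrs/{custom}'")
```

Running a check like this against my own collection is how I would decide which copy to drop, once I know which location is considered the preferred one.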
Thanks!
Amelia