Model [geotiff] postprocessing at scale - what would you do?

THE PROBLEM

Choosing a storage format for model output that works for processing, analysis and visualisation.

SCALE

Australia-wide models at 400 m resolution.
Call it 10000 x 9500 pixels for simplicity.
All models on the same grid and projection.

NUMBER

Over 100 so far, and this will probably grow.

FORMAT

Currently geotiff.

DATASETS

Approximately 0.5 TB each.

Stored at s3://bucket/folder/time [i.e. time is when the model was run]

EACH MODEL

Each of the 6 outputs below has different data, even if the variable names are the same. (A sketch of opening one of these follows the list.)

  1. 2 outputs of 44 bands each that will always be the same - the first set of variables [you could concatenate these, singly or together]
  2. 1 output with a variable band count, from 5 to 30 - a second set of variables, the 5 being a subset of the possible 30. [can’t concatenate]
  3. 2 groups of 44 outputs with variable band counts, from 3 to 15 - a third set of variables. Each of the 44 containers has the same name as in 1. [can’t concatenate]
  4. 1 group of outputs with variable band counts, from 3 to 15 - the third set of variables. [can’t concatenate]
  5. 16 outputs with variable band counts, from 5 to 30 - the same second set of variables, used in arriving at 1. and 2. [can’t concatenate]
  6. 44 x 4 outputs with variable band counts, from 5 to 30 - the same second set of variables, with names from 1., used in arriving at 1. and 2. [can’t concatenate]
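For concreteness, a minimal sketch of how one of the 44-band outputs might be opened and split into named variables with rioxarray - the path and variable names here are placeholders, not the real ones:

```python
# Hedged sketch: open one 44-band GeoTIFF and promote its bands to
# named variables. Path and names are placeholders for illustration.
import rioxarray

da = rioxarray.open_rasterio("s3://bucket/folder/run-time/output_1.tif")
# da has dims ("band", "y", "x"); band is length 44 for this output.
band_names = [f"var_{i:02d}" for i in range(1, 45)]  # hypothetical names
ds = da.assign_coords(band=band_names).to_dataset(dim="band")
```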

DOWNSTREAM PRODUCTS

Distributional-type reductions - mean, max, etc. - for users. Other, more esoteric model analysis for me.
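For what it’s worth, once everything sits in one chunked store those reductions are cheap to express with dask-backed xarray; a sketch, with the store path and variable name assumed:

```python
# Sketch only: distributional reductions over a hypothetical
# consolidated zarr store. Path and variable name are assumptions.
import xarray as xr

ds = xr.open_zarr("s3://bucket/analysis/cube.zarr")
summary = xr.Dataset(
    {
        "mean": ds["some_variable"].mean(dim="time"),
        "max": ds["some_variable"].max(dim="time"),
    }
)
summary.to_zarr("s3://bucket/analysis/summary.zarr", mode="w")
```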

QUESTIONS

  1. What format to convert to?
  2. How to break it down?
  3. How to combine it - most efficiently/cost-effectively and useably?

I’m sure you know, but kerchunking geotiff would be something I’d like to see here, perhaps to compare against whatever conversion/rechunking you choose to try.
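For reference, a rough sketch of what kerchunking a single GeoTIFF could look like, assuming kerchunk’s tiff module copes with these files (untested against this data; the path is a placeholder):

```python
# Hedged sketch: build kerchunk references for one GeoTIFF on s3 and
# open it through the zarr engine, without converting the file itself.
import xarray as xr
from kerchunk.tiff import tiff_to_zarr

refs = tiff_to_zarr("s3://bucket/folder/run-time/output_1.tif")
ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {"fo": refs, "remote_protocol": "s3"},
    },
)
```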

Thanks Martin… worth a look for sure. If I follow (I had not read all of that discussion until now), this would be a convert-to-COG case.
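If it goes the COG route, one way to convert is rio-cogeo; a minimal sketch (filenames are placeholders, and any tiled/compressed profile would do):

```python
# Sketch: rewrite an existing GeoTIFF as a COG so internal tiles can be
# fetched with HTTP range requests. Filenames are placeholders.
from rio_cogeo.cogeo import cog_translate
from rio_cogeo.profiles import cog_profiles

cog_translate(
    "output_1.tif",               # hypothetical source GeoTIFF
    "output_1_cog.tif",           # destination COG
    cog_profiles.get("deflate"),  # tiled + DEFLATE-compressed profile
)
```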

Creating a virtual zarr datacube from COG s3 objects · Issue #125 · cgohlke/tifffile · GitHub … trialling along the lines of Martin’s suggestion

Or, how would you chunk this?

Dummy placeholder:

Some subset of ‘grapes’ and ‘model’ will have data at a given time, sometimes all of them.
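In lieu of the placeholder, a hedged guess at what the ‘wide’ layout and a starting chunking could look like - dimension names and sizes are assumptions pieced together from the description above, not the real dataset:

```python
# Sketch of a "wide" cube: grape and model as dimensions, NaN where a
# combination has no data at a given time. All names/sizes are guesses.
import dask.array as da
import numpy as np
import xarray as xr

ny, nx = 9500, 10000
data = da.full(
    (1, 44, 6, ny, nx), np.nan, dtype="float32",
    chunks=(1, 1, 1, 2048, 2048),  # one tile per (time, grape, model)
)
ds = xr.Dataset({"value": (("time", "grape", "model", "y", "x"), data)})
# At float32, a 2048 x 2048 spatial chunk is ~16 MB: small enough for
# s3 round trips, while a full-map read touches only ~25 chunks.
```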

Or the other way, which may be more useable for ad-hoc work given the region/append functions of to_zarr, is to have a ‘long’ dataset with 1300-odd variables that are basically just (time, y, x) - so all the same.
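A minimal sketch of that append pattern, assuming a store path and that each run has already been reshaped into flat (time, y, x) variables:

```python
# Sketch: append one run's worth of the ~1300 flat (time, y, x)
# variables to a "long" zarr store. Paths are placeholders.
import xarray as xr

run = xr.open_dataset("run.nc")  # hypothetical: one reshaped model run
run.to_zarr("s3://bucket/analysis/long.zarr", append_dim="time")
```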

Anyway, the 1300-odd-variable version is now a thing!