THE PROBLEM
Choosing a storage format for model outputs that suits processing, analysis and visualisation.
SCALE
Australia-wide models at 400 m resolution.
Call it 10000 x 9500 pixels for simplicity.
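For a sense of scale, the size of one band follows directly from the grid. This is a minimal sketch that assumes uncompressed 32-bit floats. The actual data type of the outputs isn't stated here, so `BYTES_PER_PIXEL` is an assumption.

```python
# Grid size from the SCALE section; the data type is an assumption (float32).
WIDTH, HEIGHT = 10_000, 9_500
BYTES_PER_PIXEL = 4  # assumed float32; use 8 for float64, 2 for int16, etc.

def band_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Uncompressed size in bytes of one band on the shared grid."""
    return width * height * bytes_per_pixel

pixels = WIDTH * HEIGHT
size = band_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
print(f"{pixels:,} pixels per band")    # 95,000,000 pixels per band
print(f"{size / 1e6:,.0f} MB per band")  # 380 MB per band
```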
All models on the same grid and projection.
NUMBER
Over 100 so far, and this will probably grow.
FORMAT
Currently GeoTIFF.
DATASETS
Approximately 0.5 TB each.
Stored at s3://bucket/folder/time [time being e.g. when the model was run].
Each model consists of the 6 outputs below. Each holds different data, even where the variable names are the same.
1. 2 outputs of 44 bands each, whose bands are always the same: the first set of variables. [You could concatenate these, singly or together.]
2. 1 output with a variable number of bands, from 5 to 30: a second set of variables, where the bands present are a subset of the 30 possible variables. [can’t concatenate]
3. 2 groups of 44 outputs, each with a variable number of bands from 3 to 15: a third set of variables. Each of the 44 outputs in a group has the same name as a variable in 1. [can’t concatenate]
4. 1 group of outputs, each with a variable number of bands from 3 to 15: a third set of variables. [can’t concatenate]
5. 16 outputs, each with a variable number of bands from 5 to 30: the same second set of variables, used in arriving at 1 and 2. [can’t concatenate]
6. 44 x 4 outputs, each with a variable number of bands from 5 to 30: the same second set of variables, named after the variables in 1, used in arriving at 1 and 2. [can’t concatenate]
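The six outputs above, taken in order, set the range of bands a single model can contain. The sketch below tallies that range. The list doesn't give the number of outputs in the single group of item 4, so `GROUP4_OUTPUTS` is a hypothetical placeholder set to 1.

```python
# (number of outputs, min bands, max bands) for each output in the list, in order.
# GROUP4_OUTPUTS is hypothetical: the group size for item 4 isn't stated.
GROUP4_OUTPUTS = 1

ITEMS = {
    "1: first set, fixed": (2, 44, 44),
    "2: second set": (1, 5, 30),
    "3: third set, 2 x 44": (2 * 44, 3, 15),
    "4: third set, 1 group": (GROUP4_OUTPUTS, 3, 15),
    "5: second set, 16": (16, 5, 30),
    "6: second set, 44 x 4": (44 * 4, 5, 30),
}

def band_range(items):
    """Return (min, max) total bands across all outputs of one model."""
    lo = sum(n * b_min for n, b_min, _ in items.values())
    hi = sum(n * b_max for n, _, b_max in items.values())
    return lo, hi

lo, hi = band_range(ITEMS)
print(f"{lo:,} to {hi:,} bands per model")  # 1,320 to 7,213 bands per model
```

Multiplying these counts by the per-band size gives a rough uncompressed figure for each model. That figure can be compared with the roughly 0.5 TB actually stored.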
DOWNSTREAM PRODUCTS
Distributional reductions (mean, max, etc.) for users. Other, more esoteric model analysis for me.
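Reductions such as mean and max don't require every model to be loaded at once. You can stream one chunk per model and keep running statistics. This is a minimal pure-Python sketch that rests on two assumptions. First, the reductions are per pixel across models. Second, each chunk is a flat list of values for the same window of the shared grid. In practice these would be arrays read window by window from whichever format is chosen.

```python
from typing import Iterable, List, Tuple

def running_mean_max(chunks: Iterable[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-pixel mean and max over a stream of same-shaped chunks, one per model.

    Only one model's chunk is held at a time, plus the running sum and max.
    """
    total: List[float] = []
    peak: List[float] = []
    count = 0
    for chunk in chunks:
        if count == 0:
            total = list(chunk)
            peak = list(chunk)
        else:
            if len(chunk) != len(total):
                raise ValueError("all chunks must cover the same window of the grid")
            for i, v in enumerate(chunk):
                total[i] += v
                if v > peak[i]:
                    peak[i] = v
        count += 1
    if count == 0:
        raise ValueError("no chunks supplied")
    mean = [t / count for t in total]
    return mean, peak

# Three hypothetical models, each contributing the same 4-pixel window of one band.
models = ([1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [2.0, 5.0, 2.0, 2.0])
mean, peak = running_mean_max(models)
print(mean)  # [2.0, 3.0, 2.0, 2.0]
print(peak)  # [3.0, 5.0, 3.0, 4.0]
```

Mean and max stream cleanly like this. Order statistics such as the median or percentiles are different: they need all of a pixel's values across models at once, or an approximation. That matters when deciding how to break the data down.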
QUESTIONS
- What format to convert to?
- How to break the data down?
- How to combine it most efficiently, cost-effectively and usably?
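One way to frame breaking the data down is to count the spatial tiles per band on the shared grid. This is a minimal sketch that assumes square tiles and uncompressed float32. The tile sizes are illustrative only, not a recommendation from the source.

```python
import math

WIDTH, HEIGHT = 10_000, 9_500

def tile_grid(width: int, height: int, tile: int) -> tuple:
    """Tiles across, tiles down and total tiles for square tiles of `tile` pixels."""
    across = math.ceil(width / tile)  # partial tiles at the edge still count
    down = math.ceil(height / tile)
    return across, down, across * down

for tile in (256, 512, 1024):
    across, down, total = tile_grid(WIDTH, HEIGHT, tile)
    mb = tile * tile * 4 / 1e6  # assumed float32, uncompressed
    print(f"{tile}px tiles: {across} x {down} = {total} per band, ~{mb:.1f} MB each")
```

All models share the same grid and projection, so one tiling applies to every model. Tiles from different models therefore line up and can be combined or reduced tile by tile.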