I’m trying to find sensible approaches to a task that seems common in the Pangeo world: taking a number of raster images, processing them in some way, and producing a mosaic.
My experiments work well when the spatial extent is small, but I run into trouble as the bounds of the input images grow.
My current task, for example, involves taking around 200 Sentinel-2 tiles and creating a COG. I’m using odc-stac for this, and my load and processing step look something like:
```python
import odc.geo.cog
from odc import stac

ds_odc = stac.load(
    items,
    crs=crs,
    nodata=0,
    bands=('height',),
    chunks={'time': 1, 'x': 2048, 'y': 2048},
)
mos = ds_odc['height'].max('time').astype('uint8')
fut = odc.geo.cog.save_cog_with_dask(
    mos,
    out_cogname,
    compression='LZW',
)
fut.compute()
```
I run this on a PBS cluster via `dask_jobqueue.PBSCluster`. I would typically have access to, say, 20 workers, and can request around 16 GB of memory for each one.
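For context, my cluster setup looks roughly like this (a config sketch only; the resource figures are placeholders for whatever your PBS site allows, and it won’t run without a PBS scheduler):

```python
from dask.distributed import Client
from dask_jobqueue import PBSCluster

# Placeholder resources; adjust cores/memory/walltime for your site.
cluster = PBSCluster(
    cores=4,
    memory="16GB",
    walltime="02:00:00",
)
cluster.scale(jobs=20)  # roughly 20 workers of ~16 GB each
client = Client(cluster)
```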
When the number of items is small, this whips through nicely. As the number of items increases, it takes a long time before processing starts, and I get Dask's familiar warning about sending a large task graph to the scheduler.
One massive COG at the end is probably not such a good idea anyway, so I have experimented a little with creating smaller, non-overlapping geoboxes and then looping through each one to create a number of individual COGs.
This works OK, but looping serially through a list of geoboxes seems suboptimal: there is significant idle time between the creation of one COG and the start of computation for the next.
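One pattern I have been considering, sketched below with `dask.delayed` standing in for the real per-geobox load/reduce/`save_cog_with_dask` step (the `write_tile` function and `tile_*.tif` names are invented for illustration): build all of the delayed COG writes first, then hand them to a single `dask.compute` call so the scheduler can overlap tiles rather than sitting idle between them.

```python
import dask
from dask import delayed


@delayed
def write_tile(idx):
    # Stand-in for the real work: in the actual workflow this would be
    # the Delayed returned by save_cog_with_dask for the idx-th geobox.
    return f"tile_{idx}.tif"


# Serial pattern: one graph build + compute per tile, with gaps in between.
# Batched pattern: build every delayed write up front, submit them together.
futs = [write_tile(i) for i in range(4)]
paths = dask.compute(*futs)  # one submission, all tiles in flight
```

In the real workflow each `write_tile(i)` would load and reduce one sub-geobox and return the `save_cog_with_dask` delayed; I believe `odc.geo.geobox.GeoboxTiles` can generate the non-overlapping geoboxes, though I haven’t verified that here.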
I was wondering if anyone has tips on how best to manage large workflows where the result is either a number of COGs or a single large COG, without running into trouble with large task graphs.
I would love to hear some tips. Perhaps the results could be collated somewhere as well, since I’m sure this is a pretty common situation (hence why we use Dask). Pointers to example notebooks would also be helpful; I know people create large cloud-free mosaics, for example, but I struggled to find example code for this.