How best to use Agents to help build Earth Data workflows?
I’ve just been feeding the agents a set of existing workflows (e.g. notebooks) and letting them use those as source material to construct new workflows.
But Gemini responds to the question with this advice:
Monolithic agents fail when attempting to process complex geospatial queries and execute large-scale computations simultaneously. The most reliable approach is an orchestrator-worker architecture where each agent is narrowly scoped:
- **Catalog Agent:** specializes exclusively in querying STAC endpoints, handling pagination, parsing metadata, and managing spatiotemporal bounding boxes.
- **Data Representation Agent:** focuses on constructing virtual data cubes, using tools like VirtualiZarr to align datasets and build reference files without moving the underlying bytes.
- **Compute & Storage Agent:** manages the configuration of the Dask task graph and handles the logic for writing out to distributed, transactional stores like Icechunk or standard Zarr.
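To make that concrete, here is a minimal sketch of the orchestrator-worker split in plain Python. Everything here is hypothetical (class names, the `WorkflowState` fields, the placeholder results); the real workers would call pystac-client, VirtualiZarr, and Dask/Icechunk respectively, but the point is the narrow scoping and the shared state passed down the chain:

```python
from dataclasses import dataclass, field

# Hypothetical shared state handed from worker to worker; a real payload
# would carry STAC items, chunk manifests, Dask graph config, etc.
@dataclass
class WorkflowState:
    bbox: tuple                  # (min_lon, min_lat, max_lon, max_lat)
    datetime_range: str          # e.g. "2024-01-01/2024-06-30"
    items: list = field(default_factory=list)
    cube_refs: dict = field(default_factory=dict)
    output_uri: str = ""

class CatalogAgent:
    """Narrow scope: STAC search only (pagination, metadata, bboxes)."""
    def run(self, state: WorkflowState) -> WorkflowState:
        # In practice: pystac_client.Client.open(...).search(...)
        state.items = [f"item-{i}" for i in range(3)]  # placeholder results
        return state

class DataRepresentationAgent:
    """Narrow scope: build virtual cube references (e.g. via VirtualiZarr)."""
    def run(self, state: WorkflowState) -> WorkflowState:
        state.cube_refs = {item: {"chunks": "..."} for item in state.items}
        return state

class ComputeStorageAgent:
    """Narrow scope: configure the Dask graph and write to Icechunk/Zarr."""
    def run(self, state: WorkflowState) -> WorkflowState:
        state.output_uri = "icechunk://my-bucket/cube"  # placeholder write
        return state

class Orchestrator:
    """Routes the shared state through each narrowly scoped worker in order."""
    def __init__(self, workers):
        self.workers = workers

    def run(self, state: WorkflowState) -> WorkflowState:
        for worker in self.workers:
            state = worker.run(state)
        return state

pipeline = Orchestrator([CatalogAgent(), DataRepresentationAgent(),
                         ComputeStorageAgent()])
result = pipeline.run(WorkflowState(bbox=(5.0, 45.0, 6.0, 46.0),
                                    datetime_range="2024-01-01/2024-06-30"))
```

In an actual agent framework each `run` would be an LLM-backed tool-using agent rather than a plain method, but the contract stays the same: each worker reads and writes only its own slice of the state.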
I’m curious how others are approaching this topic!
I’m only scratching the surface with agents myself, but roborev might be of interest: https://www.roborev.io/ (it’s not EO-specific, though). This podcast, https://posit.co/thetestset/episodes, is where I heard Wes McKinney talking about it. Sorry, I don’t know exactly which episodes, but the latest (21) and recent ones, I think.
I asked Gemini: TL;DR: The discussion regarding RoboRev occurs in Episode 10: James Blair: Part 2 Solutions engineering, critical thinking, and staying human.
In this episode, James Blair discusses the AI-powered tooling he is developing for the Posit ecosystem. He details his use of RoboRev (roborev.io) alongside other tools like Claude Code and Positron Assistant to generate synthetic, industry-specific demos and manage codebases.
Cool. It’s definitely in ep 21, and I think in ep 20 as well. Claude took my data and suggests 16, 17, and 18 too, but I can’t confirm (maybe search the YouTube transcripts). I had listened to 10 but will revisit it, since this only piqued my interest in recent days; what Wes is saying has grown on me through the podcast. (And off-topic: any podcast recommendations in this space are definitely welcome!)
The other common use case is gluing already existing workflows together. This kind of works when you either use something like openEO, where you can easily chain “functions”, or, in your case, notebooks that “do” something. The problem with the latter is that they need to be well described and FAIR (inputs, outputs, config, descriptions, etc.). We’ve explored breaking notebooks down into CWL/OGC Application Packages for this, and it works OK-ish with agents for small workflows, provided you have large enough code repos/notebook collections to require something like this in the first place.
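The minimum an agent needs to glue notebooks reliably is a machine-readable description of each notebook’s inputs and outputs. Here is a toy sketch of that idea in Python, loosely in the spirit of a CWL/OGC Application Package wrapper; all field names, notebook paths, and type labels are made up for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class NotebookStep:
    """FAIR-ish descriptor for one notebook 'function': what it
    consumes and what it produces, with declared types."""
    path: str
    inputs: dict          # name -> type label, e.g. {"aoi": "GeoJSON"}
    outputs: dict         # name -> type label, e.g. {"ndvi": "zarr"}
    description: str = ""

def chainable(producer: NotebookStep, consumer: NotebookStep) -> bool:
    """Two steps can be glued only if the producer emits every input the
    consumer declares, with a matching type label."""
    return all(producer.outputs.get(name) == typ
               for name, typ in consumer.inputs.items())

search = NotebookStep("01_search.ipynb",
                      inputs={"aoi": "GeoJSON", "daterange": "str"},
                      outputs={"items": "STAC-ItemCollection"})
ndvi = NotebookStep("02_ndvi.ipynb",
                    inputs={"items": "STAC-ItemCollection"},
                    outputs={"ndvi": "zarr"})
resample = NotebookStep("03_resample.ipynb",
                        inputs={"items": "GeoTIFF"},
                        outputs={"mosaic": "COG"})

ok = chainable(search, ndvi)       # True: declared types line up
bad = chainable(search, resample)  # False: input type mismatch
```

This is exactly the information that is usually missing from ad-hoc notebooks, which is why the CWL/Application Package route works: it forces those declarations to exist before an agent ever tries to chain anything.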
Although my impression is that everyone is really impressed, most are struggling to find and prove real use cases beyond search/discovery, sifting through papers/code, code generation, or creating metadata/descriptions.
For what it’s worth, I developed a few agent skills that I’ve found useful in my own work. These are AI-developed from a combination of documentation and source code for the relevant Python packages.
Not heavily tested, but they seem to work OK for some basic stuff. Use at your own risk! The virtual Zarr skill should be up to date with the recent Icechunk 2.0 release and the roughly coincident updates.