How best to use Agents to help build Earth Data workflows?
I’ve just been feeding the agents a set of existing workflows (e.g. notebooks) and letting them use those as source material to construct new workflows.
But Gemini responds to the question with this advice:
Monolithic agents fail when attempting to process complex geospatial queries and execute large-scale computations simultaneously. The most reliable approach is an orchestrator-worker architecture where each agent is narrowly scoped:
- **Catalog Agent:** specializes exclusively in querying STAC endpoints, handling pagination, parsing metadata, and managing spatiotemporal bounding boxes.
- **Data Representation Agent:** focuses on constructing virtual data cubes, using tools like VirtualiZarr to align datasets and build reference files without moving the underlying bytes.
- **Compute & Storage Agent:** manages the configuration of the Dask task graph and handles the logic for writing out to distributed, transactional stores like Icechunk or standard Zarr.
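To make that concrete, here is a minimal sketch of the orchestrator-worker split in plain Python. Everything here is hypothetical (class names, the `WorkflowState` fields, the placeholder results); the real workers would call pystac-client, VirtualiZarr, and Dask/Icechunk respectively, but the point is the narrow scoping and the shared state passed down the chain:

```python
from dataclasses import dataclass, field

# Hypothetical shared state handed from worker to worker; a real payload
# would carry STAC items, chunk manifests, Dask graph config, etc.
@dataclass
class WorkflowState:
    bbox: tuple                  # (min_lon, min_lat, max_lon, max_lat)
    datetime_range: str          # e.g. "2024-01-01/2024-06-30"
    items: list = field(default_factory=list)
    cube_refs: dict = field(default_factory=dict)
    output_uri: str = ""

class CatalogAgent:
    """Narrow scope: STAC search only (pagination, metadata, bboxes)."""
    def run(self, state: WorkflowState) -> WorkflowState:
        # In practice: pystac_client.Client.open(...).search(...)
        state.items = [f"item-{i}" for i in range(3)]  # placeholder results
        return state

class DataRepresentationAgent:
    """Narrow scope: build virtual cube references (e.g. via VirtualiZarr)."""
    def run(self, state: WorkflowState) -> WorkflowState:
        state.cube_refs = {item: {"chunks": "..."} for item in state.items}
        return state

class ComputeStorageAgent:
    """Narrow scope: configure the Dask graph and write to Icechunk/Zarr."""
    def run(self, state: WorkflowState) -> WorkflowState:
        state.output_uri = "icechunk://my-bucket/cube"  # placeholder write
        return state

class Orchestrator:
    """Routes the shared state through each narrowly scoped worker in order."""
    def __init__(self, workers):
        self.workers = workers

    def run(self, state: WorkflowState) -> WorkflowState:
        for worker in self.workers:
            state = worker.run(state)
        return state

pipeline = Orchestrator([CatalogAgent(), DataRepresentationAgent(),
                         ComputeStorageAgent()])
result = pipeline.run(WorkflowState(bbox=(5.0, 45.0, 6.0, 46.0),
                                    datetime_range="2024-01-01/2024-06-30"))
```

In an actual agent framework each `run` would be an LLM-backed tool-using agent rather than a plain method, but the contract stays the same: each worker reads and writes only its own slice of the state.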
I’m curious how others are approaching this topic!
I’m only scratching the surface with agents myself, but roborev might be of interest: https://www.roborev.io/ (it’s not EO-specific, though). This podcast, https://posit.co/thetestset/episodes, is where I heard Wes McKinney talking about it. Sorry, I don’t know exactly which episodes, but the latest (21) and recent ones, I think.
I asked Gemini: TL;DR: The discussion regarding RoboRev occurs in Episode 10: James Blair: Part 2 Solutions engineering, critical thinking, and staying human.
In this episode, James Blair discusses the AI-powered tooling he is developing for the Posit ecosystem. He details his use of RoboRev (roborev.io) alongside other tools like Claude Code and Positron Assistant to generate synthetic, industry-specific demos and manage codebases.
Cool. It’s definitely in ep 21, and I think in ep 20 as well. Claude took my data and suggests 16, 17, and 18 too, but I can’t confirm (maybe search the YouTube transcripts). I had listened to 10 but will revisit it, since this only piqued my interest in recent days; what Wes is saying has grown on me through the podcast. (And off-topic: any podcast recommendations in this space are definitely welcome!)
The other common use case is gluing already existing workflows together. This kind of works when you either use something like openEO, where you can easily chain “functions”, or, in your case, notebooks that “do” something. The problem with the latter is that they need to be well described and FAIR (inputs, outputs, config, descriptions, etc.). We’ve explored breaking notebooks down into CWL/OGC Application Packages for this, and it works OK-ish with agents for small workflows, provided you have large enough code repos/notebook collections to require something like this in the first place.
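The minimum an agent needs to glue notebooks reliably is a machine-readable description of each notebook’s inputs and outputs. Here is a toy sketch of that idea in Python, loosely in the spirit of a CWL/OGC Application Package wrapper; all field names, notebook paths, and type labels are made up for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class NotebookStep:
    """FAIR-ish descriptor for one notebook 'function': what it
    consumes and what it produces, with declared types."""
    path: str
    inputs: dict          # name -> type label, e.g. {"aoi": "GeoJSON"}
    outputs: dict         # name -> type label, e.g. {"ndvi": "zarr"}
    description: str = ""

def chainable(producer: NotebookStep, consumer: NotebookStep) -> bool:
    """Two steps can be glued only if the producer emits every input the
    consumer declares, with a matching type label."""
    return all(producer.outputs.get(name) == typ
               for name, typ in consumer.inputs.items())

search = NotebookStep("01_search.ipynb",
                      inputs={"aoi": "GeoJSON", "daterange": "str"},
                      outputs={"items": "STAC-ItemCollection"})
ndvi = NotebookStep("02_ndvi.ipynb",
                    inputs={"items": "STAC-ItemCollection"},
                    outputs={"ndvi": "zarr"})
resample = NotebookStep("03_resample.ipynb",
                        inputs={"items": "GeoTIFF"},
                        outputs={"mosaic": "COG"})

ok = chainable(search, ndvi)       # True: declared types line up
bad = chainable(search, resample)  # False: input type mismatch
```

This is exactly the information that is usually missing from ad-hoc notebooks, which is why the CWL/Application Package route works: it forces those declarations to exist before an agent ever tries to chain anything.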
Although my impression is that everyone is really impressed, most are struggling to find and prove real use cases beyond search/discovery, sifting through papers/code, code generation, or creating metadata/descriptions.
For what it’s worth, I developed a few agent skills that I’ve found useful in my own work. These are AI-developed from a combination of documentation and source code for the relevant Python packages.
Not heavily tested, but they seem to work OK for some basic stuff. Use at your own risk! The virtual Zarr skill should be up to date with the recent Icechunk 2.0 release and the roughly coincident updates.