Wednesday October 19th 2022: openEO: What it is and how it relates to Pangeo

Announcement
This week on Pangeo showcase we are pleased to welcome Matthias Mohr and Alexander Jacob to talk about openEO, what it is and how our communities can collaborate!

Meeting Logistics
Title: openEO: What it is and how it relates to Pangeo

Invited Speakers:
Matthias Mohr (Github: m-mohr | linkedin | twitter:@matthmohr | ORCID ID)
Alexander Jacob (Github: aljacob | linkedin | twitter:@_stille83_ | ORCID ID)

When: Wednesday October 19th 12PM EDT

Where: Launch Meeting - Zoom

Abstract:
openEO develops an open API to connect R, Python, JavaScript and other clients to big Earth observation cloud back-ends in a simple and unified way. This talk introduces openEO, compares it with Pangeo, and concludes with how both projects can benefit from each other.

Relevant material:

Agenda:

  • 5-15 minutes - Community showcase
  • 5-15 minutes - Q&A / Community check-in
  • 20-35 minutes - Agenda and Open discussion

Hey @JimColl,

I think there’s a slight mistake in the forum post. All communication so far said the meeting will be at 12pm EDT, but the forum post says 4pm EDT. We assume that we’ll present at 12pm EDT, is this correct?

Best,
Matthias


You’re right Matthias. I have corrected the typo. Sorry for the confusion.


I look forward to seeing the recording of this session (I’m sorry I missed it!). I’ve worked with VITO on platform development and know their side of OpenEO. I also work on a Canadian system (GEOAnalytics Canada) that integrates Pangeo-esque systems, so am quite interested in how to link OpenEO with Pangeo-style systems.

An open question I have is about OpenEO versus the OGC’s “Earth Observation Application Package” system (which has a baseline implementation in the EO Exploitation Platform Common Architecture). It seems like these two systems are trying to do the exact same thing: to federate disparate platforms into a single API. From my perspective the EO Application Package system is more compatible with Pangeo-style systems as execution-on-Kubernetes is already assumed, which does not seem to be the case for OpenEO (which appears to be more compatible with Spark-based systems). From my side, we’re currently looking to do some proofs of concept that take an EO Application Package and run it on an Argo Workflows back-end on Kubernetes. Any comments on OpenEO versus the EO Application Package would be of much interest!

Thanks,
-Jason

So sorry about my typo and the confusion, hopefully I didn’t throw you off too much. That recording can be found in the newly updated showcase playlist here: - YouTube
And for the record, you can find Matthias and Alexander’s longer ESIP IT&I presentation here: IT&I: openEO - API for Earth Observation Workflows - YouTube

@jsuwala This is a good question.

openEO focuses on defining a “language” of domain-specific processes, some of them quite fine-grained (e.g. we have a “multiply” or “absolute” process), so that users can easily chain them into workflows. The data cube related processes such as reduce_dimension or apply help back-ends to implement processing efficiently. In openEO, users usually don’t need to care about hardware, performance and most other technical details. They also don’t need to take care of files or loading the data, as they just get a ready-to-use datacube; the back-end implementation takes care of that. So this is really for people who want to focus on their workflow/algorithm. This is closer to the GEE approach, where each implementation (not the openEO specification itself) pre-defines what its users can do and makes this very efficient.
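To make the chaining of processes concrete, here is a minimal sketch of what such a workflow can look like as an openEO-style process graph, written as a plain Python dict so it runs without any client library. The node names, the collection id "SENTINEL2_L2A", the band names, the scale factor and the dimension name "t" are assumptions for illustration only; real back-ends define their own collections and dimension names. The `node_order` helper is hypothetical and only follows the `data` references to show the chain.

```python
import json


def build_process_graph(collection_id, bands, temporal_extent):
    """Return an openEO-style process graph as a plain dict (illustrative only)."""
    return {
        # Ask the back-end for a ready-to-use data cube; no file handling by the user.
        "load": {
            "process_id": "load_collection",
            "arguments": {
                "id": collection_id,
                "spatial_extent": None,
                "temporal_extent": temporal_extent,
                "bands": bands,
            },
        },
        # Fine-grained process: multiply every value by a scale factor (assumed value).
        "scale": {
            "process_id": "apply",
            "arguments": {
                "data": {"from_node": "load"},
                "process": {
                    "process_graph": {
                        "mul": {
                            "process_id": "multiply",
                            "arguments": {"x": {"from_parameter": "x"}, "y": 0.0001},
                            "result": True,
                        }
                    }
                },
            },
        },
        # Data cube process: collapse the temporal dimension with a mean reducer.
        "reduce": {
            "process_id": "reduce_dimension",
            "arguments": {
                "data": {"from_node": "scale"},
                "dimension": "t",  # assumed name of the temporal dimension
                "reducer": {
                    "process_graph": {
                        "mean": {
                            "process_id": "mean",
                            "arguments": {"data": {"from_parameter": "data"}},
                            "result": True,
                        }
                    }
                },
            },
        },
        "save": {
            "process_id": "save_result",
            "arguments": {"data": {"from_node": "reduce"}, "format": "GTiff"},
            "result": True,
        },
    }


def node_order(graph):
    """Hypothetical helper: walk the 'data' references back from the result node."""
    current = next(name for name, node in graph.items() if node.get("result"))
    order = [current]
    while True:
        data = graph[current]["arguments"].get("data")
        if not isinstance(data, dict) or "from_node" not in data:
            break
        current = data["from_node"]
        order.append(current)
    return list(reversed(order))


graph = build_process_graph("SENTINEL2_L2A", ["B04", "B08"], ["2022-06-01", "2022-09-01"])
chain = " -> ".join(graph[name]["process_id"] for name in node_order(graph))
print(chain)  # load_collection -> apply -> reduce_dimension -> save_result
payload = json.dumps({"process": {"process_graph": graph}})  # JSON-serializable description
```

The point of the sketch is that the user only describes what should happen to the data cube; how and where each process runs is left to the back-end.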

EOEPCA on the other hand (as far as I know) runs your Python or R or … code directly. You need to take care of more technical details and you need to make sure your code is performant. You also need to prepare the data yourself, load files, etc. Also, EOEPCA doesn’t describe a language for processing; you just run code on an infrastructure. This is more the “put code in Docker container and run it” approach.

So EOEPCA and openEO mostly target different user groups and can exist side by side, as they cater for different needs.

Please let me know if you have further questions (although I don’t have many more insights into EOEPCA).

Hi @m-mohr , thanks for your response.

With regard to the approach of “put code in Docker container and run it”, I would consider OpenEO’s need to support UDFs to be rather similar. I would also suggest that Dask (Dask Distributed specifically) takes a similar approach, which ends up being the same “put code in a container and run it” way. My company’s chosen approach is to support both Dask and Argo Workflows. Our chosen interface to Argo Workflows is the very nice Couler library, which aims to provide a unified interface for constructing and managing workflows on different workflow engines. One thought is that a Couler-to-OpenEO submitter would be a very nice way to bring the PyData ecosystem together with OpenEO.

I see some good discussion going on in GitHub (An Pangeo/ODC-based backend that runs "out of the box" · Issue #16 · Open-EO/PSC · GitHub) to bring Pangeo and OpenEO together, and hope that these discussions can continue to move forward!

-Jason