Tutorial Idea: Writing APIs for your pangeo package

Is there any guidance material anywhere on how to write a good interoperable API for a pangeo package?

It’s important, not least because interoperability is what has allowed pangeo to grow so quickly.

It could be advanced material under Project Pythia, but it also crosses over with @TomAugspurger's efforts to clarify the Pangeo Package guidelines. What I’m suggesting might also function like a style guide.

I wrote some ideas down here, but part of the motivation is that writing a general Python package API is not the same as writing a pangeo package API that interacts nicely with xarray / dask.

Do people think this would be useful? Or is this not actually a problem?

cc @jukent @mgrover1 for the pythia crowd

5 Likes

Great discussion topic @TomNicholas. @anissa111 and I are working on creating a Pythia cookbook with information on packaging. This definitely builds on top of that content and could have a place in there or in a separate, more advanced packaging cookbook.

2 Likes

I think this is a critical need for this community!! I like the ideas you’ve included!

2 Likes

I would be happy to see such a tutorial.

2 Likes

I agree this overlaps with Next steps for Pangeo / PyOpenSci Collaboration - #3 by TomAugspurger, and more generally with the goals of PyOpenSci. It’d be good to clarify how this tutorial would differ from the goals of something like PyOpenSci’s package guide. The way I’ve been thinking about this is to defer the majority of things to PyOpenSci, and only focus on the things that are (somewhat) unique to pangeo: geospatial things, working well on the cloud, working well with large datasets, etc. That gets more eyes and more maintainers for the general stuff.

I think many of the items in Pangeo package API design - Google Docs are good, general guidelines, but aren’t necessarily unique to pangeo. Some are (“interfacing with xarray”, for example). And so maybe our page “pangeo best practices” can have an overview like it does today, and then go into detail (tutorial style) on some more specific topics like interfacing with xarray?

It’d be good to clarify how this tutorial would differ from the goals of something like PyOpenSci’s package guide.

Yes, good point. I had not thought about that in detail before posting these ideas. From a brief look it seems PyOpenSci’s guide focuses more on packaging and documentation questions rather than API design, at least what’s written out so far.

I think many of the items in Pangeo package API design - Google Docs are good, general guidelines, but aren’t necessarily unique to pangeo. Some are (“interfacing with xarray”, for example).

I think that to a significant extent these are much the same thing, and as long as useful material exists in an accessible place we shouldn’t worry too much about splitting hairs between “interoperable with the Pangeo ecosystem” and “interfaces with xarray”, for example. PyOpenSci does not appear to be specific to xarray or pydata either. I agree that a tutorial that explains how to achieve the recommendations listed in Best Practices for Pangeo Projects would be a good goal. That said, xarray is not domain-specific, and it would be in the interests of the xarray project not to put general material in a place labelled as being specifically for geoscientists.

So might it make more sense for this material to live in PyOpenSci under “How to make your package interoperable with the xarray pydata ecosystem” and be linked from Pythia?

These might be good questions for @lwasser to try to answer. Does something like “Here’s how to design an API” fit within the domain of PyOpenSci?

we shouldn’t worry too much about splitting hairs

Agreed!

hi y’all! A few notes on this (i’m meeting with @TomAugspurger to chat a bit in just a few so i’ll follow up after that conversation as well!)

  1. YES! that kind of tutorial is absolutely in scope. AND it would be awesome to have a tutorial like this that is visible via our site because i know that other communities use Xarray. it would be really great to allow other communities to see the guidelines you’d like to see in terms of interfacing with Xarray and dask. pangeo is the first collab that we are working on but i think we will have others and probably others in the geoscience / spatial space as well. you all could lead the charge here on defining best practices :slight_smile:
  2. related to this i’m actually planning to create a tutorial series on packaging. it will complement our packaging guide. the goal of this is to standardize packaging approaches as there are SO MANY options in the community. I’m working with folks from PYPA and scientific python and would LOVE for pangeo to be a part of the conversation too. let’s work together to build tutorials around best practices for packaging to avoid the continuing n+1 standard for packaging that keeps happening as others create their own guides. I’d love to see this be more of a community effort.

In terms of how we publish things, i’d be really happy to add pangeo or pythia etc banners on whatever is created as a (broader) scientific community resource to ensure that there is high visibility for it and true acknowledgement that pangeo / pythia, or whoever authored the content, should get full credit for it (if that sounds good). happy to chat more!

my big goal here is to reduce the number of best practice tutorials and guides and streamline with the ultimate goal of an accepted approach to packaging that communities embrace! And i’m guessing pangeo will have specific packaging use cases that would be terrific for me to understand better as we work with pypa and others to create some sort of standard approach.

Here is a (messy) start at my playing with various packaging tools and approaches. i’m trying to collect use cases for both pure python packages and non pure python / packages with extensions in other languages. Boy i’d love to add the Pangeo lens to this analysis.

Let’s talk more!! i don’t want to slow momentum down, i just want to add a bit more of a broad approach / lens as these discussions happen! i am super excited to work with y’all!

1 Like

Hi all!

I think this discussion is related to the SPEC process that is underway: Scientific Python - SPEC Purpose and Process

The SPEC process is designed to identify areas of shared concern between projects in the scientific Python ecosystem and to produce collaboratively written, community adopted guidelines for addressing these.

Here you are talking about unifying APIs for Pangeo projects, but it would be even better to unify more generally with Scientific Python projects where possible. xarray is considered within the Scientific Python ecosystem.

Just wanted to make sure people are aware of this effort, too. Thanks.

3 Likes

One question, clarifying the scopes of / overlap between Project Pythia and PyOpenSci.

Is it fair to say that Project Pythia is targeted at users of these packages, and PyOpenSci is targeted at developers / maintainers?

Lots of people wear both hats at different times, but that’s at least how I’ve been thinking of the two. Does that align with others’ views?

1 Like

@kthyng hi!! :wave: nice to virtually meet you!

Here you are talking about unifying APIs for Pangeo projects, but it would be even better to unify more generally with Scientific Python projects where possible.

Absolutely. Just so you know I talk with Stefan regularly and we have been talking about packaging and specs. The one thing to consider with formalized specs is they will be slower to evolve and be approved. Whereas a tutorial and our guide are things we are working on with that community (and others) and getting open reviews on, BUT we can push that content out a bit more quickly. (and then update as we need to!). I’m happy to work with scientific python on any element of this that makes sense so it’s fully collaborative is the short of it. we all have extremely complementary goals here.

I also am thinking about pulling together a working group that bridges several of these communities and efforts with the focus on streamlining packaging. so we could totally rope stefan into this conversation whenever we want (and i’ll bring it up to him today as well on discord).

To answer Tom’s question, yes we definitely are targeting maintainers. And more specifically maintainers who are not working on the BIG projects like xarray, numpy, pandas. They already have teams supporting their efforts. However please note that because usability and documentation are important to us, users will benefit too.

We also hope to bridge into the code → reusable code (ie packaging) once we feel confident that the high level packaging topics are a bit more stable. At that point we’d be targeting users as well with the focus on open science (also core to our name). i think our work there would be complementary to both pythia (but don’t let me speak for y’all!!) and also other efforts i’m aware of at places such as NCEAS, etc!!

psyched that this convo is happening.

2 Likes

One other note - hope i’m not writing too much :slight_smile:
We are also focused on (diversifying) contributors to open source and those folks might not be devs or maintainers. :slight_smile:

Ok glad everyone is connected here! Thank you!

1 Like

SPECS…

Thanks for pointing to the SPECs, I think that whole process will be very valuable in general. I kind of think that a SPEC would be overkill for the tutorial I’m proposing here though.

we definitely are targeting maintainers. And more specifically maintainers who are not working on the BIG projects like xarray, numpy, pandas.

This is the audience I imagined might benefit from some API guidelines. And first-time or would-be maintainers who are in the process of developing a small package.

@lwasser for this tutorial to live in PyOpenSci (and be linked from Pythia etc.) does it have to conform to a specific format?

1 Like

hi @TomNicholas Great question!
This will be our first tutorial published so right now there are no requirements! if it’s something that people could run the code for, we could talk about a binder link, CI build etc.

My plan had been to create a python-packaging-tutorials section on the website. then, we can add content there. we could have a clear pangeo banner at the top as well.

i’m very open to what this looks like and how we link back to pythia / pangeo etc from it, etc.

Also if you write the tutorial and the group here / you decide it might be better on Pythia (please know I want whatever we do to also be the best decision for the Pangeo community!!) that is also totally fine.

in that scenario, we could potentially pull out best practice standards from it that would then go into our packaging guidebook which would then link to your tutorial if that makes sense. @TomAugspurger and i loosely discussed this yesterday as an option!

This will be a valuable resource!!

does the above sound good to you / folks here?

1 Like