Status of xGCM?

Does anyone know what the current status of xGCM is? The last “official” release is v0.8.1 from 2022, which is old enough to be swimming in deprecation warnings. There doesn’t seem to be much activity on the GitHub repo either.

Has xGCM been superseded by some other tool that I should be using? Or is it worth going through and fixing up the deprecated code?

Thanks!

I’d say it has reached a point of stability / maturity and the maintainers are focused on other things right now! :wink:

We would welcome involvement of anyone who wants to help maintain and update xgcm.

Unfortunately xGCM doesn’t really have any active maintainers right now. We had been hoping that somebody who uses it regularly would be interested in stepping up to help maintain it, but that hasn’t happened. @christopher.wolfe it would be great if you (or anyone else) were interested in that.

There doesn’t seem to be much activity on the GitHub repo

To give more detailed context: xGCM (like xhistogram and xrft) was originally created by @rabernat, then developed primarily by @jbusecke and myself whilst we were part of Ryan’s oceanography research group at Columbia. There were a number of other contributors, but the only other person with write access is apparently @dcherian. At the time all of us worked in oceanography research, but now none of us do, and none of us have extra time to help maintain xGCM anymore (because we’re too busy either running companies or maintaining several other open source packages).

Does anyone know what the current status of xGCM is?

It is stable in the sense that it is usable, and for handling staggered grids it has a fairly solid and complete feature set. However, the scope of the package is potentially much larger, and there is an unbounded number of extra features that could be added, for example to deal with different grid topologies and numerical methods, some of which we started work on.

Has xGCM been superseded by some other tool that I should be using?

I’m not aware of anything in Python, but the Julia community might have some equivalent by now.

swimming in deprecation warnings

My main contribution to xGCM was to quietly refactor almost the entire internals in the name of performance at scale with dask (see this SciPy talk). I consider that project a success in that it helped push the boundaries of geospatial workloads at scale with dask ((1), (2)), but the refactor also allowed us to start a big deprecation cycle that we never got around to finishing. This is likely the source of many of the deprecation warnings you are seeing. (Others are probably due to more minor changes to xarray’s API that happened in the last 3 years.)
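For anyone triaging those warnings, a minimal sketch of how they could be collected and counted with Python’s standard `warnings` module before deciding what to fix. Note that `legacy_call` is a hypothetical stand-in for whatever xGCM workflow emits the warnings in your own code; it is not part of xGCM, and `collect_warnings` is just an illustrative helper name.

```python
import warnings
from collections import Counter


def legacy_call():
    # Hypothetical stand-in for a call that triggers deprecation warnings;
    # replace it with your own xGCM workflow.
    warnings.warn("old_keyword is deprecated", DeprecationWarning, stacklevel=2)
    warnings.warn("behaviour will change", FutureWarning, stacklevel=2)
    return 42


def collect_warnings(func, *args, **kwargs):
    """Run func and return (result, Counter keyed by (category, message))."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones Python hides by default.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    counts = Counter((w.category.__name__, str(w.message)) for w in caught)
    return result, counts


result, counts = collect_warnings(legacy_call)
for (category, message), n in counts.most_common():
    print(f"{n}x {category}: {message}")
```

Grouping by category and message gives a rough to-do list: the messages should indicate whether a given warning is raised by xGCM itself or by one of its dependencies.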

is it worth going through and fixing up the deprecated code?

xGCM did have a significant number of users, and should still be fairly stable, as the main dependency is xarray, which is very stable. But there is technical debt to pay down by finishing this deprecation cycle. If the package is useful to you then this shouldn’t be too difficult to do, but it would require taking ownership. I would love to see that happen, and I am more than happy to spend a few hours talking you through the current code and what would need to be done, but I’m not going to submit any more pull requests myself to the package (and I imagine @rabernat @jbusecke and @dcherian feel similarly).

EDIT: If no-one steps up we could take the step of explicitly archiving the package, to avoid any confusion or false promises of maintenance. I would really rather not have to do that though.

Thanks for the explainer @TomNicholas.

I might have a slightly different perspective on some things here so I just wanted to clarify that.

I am not (yet) dismissing the possibility of returning in some function to xGCM, but this depends heavily on where I will be employed next.

I do think it would be fantastic to find new maintainers, but this project is especially in need of a longer-term commitment.

The code base is stable but also very heterogeneous, representing changes in maintainers and in my case changes in my coding abilities over time (xGCM was the first proper OSS project I ever contributed to, and it shows in parts :upside_down_face:). Just figured that we should be upfront about that.

Overall I am happy to chat with folks about next steps and provide assistance (time permitting) if we find somebody - or even better a group of folks - who are able and willing to commit some time to the project.

I have for a while been hoping that my group will take on some of the responsibilities of maintaining xGCM for the long term, but over the last 2 years I have been focusing on growing and fundraising for my new group at UC Irvine. Now that we’re fairly spun up, it would be a good time for me to entrain some of my PhD students and postdocs.

Our group is focused entirely on physical oceanography applications of xGCM but this aligns pretty naturally with the package’s history, so I don’t see that as a problem.

Thanks for the clarification.

I’m probably too little of a software engineer to be that useful maintaining an OSS project—I’m happy to write my own code or tweak things here and there, but I don’t usually have the patience to make sure something works for everyone else’s edge cases.

@hdrake It’d be great to have you work on xGCM. As @rabernat says, it’s reasonably mature but does need someone to keep an eye on bugs and keep it synced with changes in Xarray and dask.

Thanks both - there’s some additional discussion about this going on in this issue on the xGCM repo.

@hdrake amazing - I will add you as an admin of the repo now so that you can give permissions to anyone from your group.

@TomNicholas, thank you for adding me as an admin. I’ve now familiarized myself with the open issues and PRs and have a good sense of how to move forward.

However, I realized that an important component of maintaining xGCM and its uptake by the community is its integration with the Pangeo example gallery. It seems like this is accomplished via the submodule xgcm-examples, which resides in a different repository (GitHub - xgcm/xgcm-examples: Examples and tutorials for xgcm).

Two questions:

  1. For @jbusecke: Is it possible for me to also be added as an admin to xgcm-examples?
  2. For @rabernat: I have only ever accessed Google Cloud data via the public Pangeo hub, which is no longer operational, which means I cannot run the xgcm-examples notebooks myself without paying egress fees or setting up my own cloud-native Jupyter hub server. Is there another community hub I can use to contribute to xgcm development?

Amazing! So pleased to see this happening.

@rabernat and @jbusecke can speak to this component better than I (I was never involved in this), but I don’t really think that xGCM requires this gallery. It’s cool, but it’s also pretty old now and relies on unmaintained infrastructure, so it might be easier at this point to just cleave it off from the main xGCM documentation.

Heya folks, I am slowly getting back into some of this community work here. I had a great chat with @hdrake today and I think we were able to unblock some of the requests above and define a rough outline of work to keep xGCM running. Let me outline this here just for it to be publicly available (and to give some estimates of my personal commitment to them):

  • We need regular releases for downstream packages. We have now increased the bus factor on both the main xgcm and the xgcm-examples repos, and simple version updates/releases should not be too much work (I am happy to help with this, particularly if my involvement can empower new maintainers to feel confident in making independent decisions going forward).
  • For now we (the remaining and new maintainers) will not work on new features, but aim to address bugs if possible (I will try my best to review issues/PRs, but it will be tough to maintain steady work on this project without funding).

We also discussed what we would consider a ‘feature-complete’ version (v1.0) of xGCM with the understanding that none of the following can be expected from maintainers without funding*.

  • Complete padding for face-connected grids. This might reflect my personal opinion to a degree, but at least @hdrake and I agreed that additional functionality for operations (e.g. diff, int, interp) can (and should) be developed as grid ufuncs and should not be managed as part of the core xgcm. Completing the array padding of simple and complex connected grids, however, would unlock a large number of downstream uses (implementing the tripolar seam and full support of e.g. an LLC/cube-sphere grid would literally cover most global ocean models).
  • We should probably refactor the documentation with best practices for how to customize and organize operators for specific models, configurations or numerical methods. To me this crucially involves guidance on how to use all available metrics within the grid ufunc framework. This clearly needs some more detailed discussion, but I think it would be the only way out of a ballooning maintenance burden while accommodating many different requirements from users.
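
To illustrate the split described in the list above: once an array has been padded according to the grid’s connectivity, an operator such as a difference reduces to plain array arithmetic that can live outside the core package. This is a NumPy-only sketch for the simplest case, a 1D periodic axis. The function names are hypothetical and it does not use xGCM’s actual padding or grid ufunc API.

```python
import numpy as np


def pad_periodic_left(values, width=1):
    # Hypothetical helper: prepend the last `width` points so the axis
    # wraps around. A face-connected grid would instead need data from
    # a neighbouring face here, which is the hard part.
    return np.concatenate([values[-width:], values])


def diff_center_to_left(padded):
    # Difference from cell centers onto the left cell edges.
    # Pure array arithmetic: it knows nothing about grid connectivity.
    return padded[1:] - padded[:-1]


centers = np.array([1.0, 2.0, 4.0, 7.0])
edges = diff_center_to_left(pad_periodic_left(centers))
print(edges)
```

Only `pad_periodic_left` would need to change for a different topology; the operator itself stays the same, which is the argument for keeping padding in the core and operators outside it.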

* If anyone knows of funding sources that might support work like this, I am very happy to chat, as I am still very keen to help move this project forward after my departure from Columbia.
