In conversations about how to approach the upcoming NSF GEO OSE solicitation (see this thread), the subject of R and its use in the Pangeo community has come up a few times.
I’m wondering whether there’s value in proposing that training materials like Pythia Foundations be broadened to include onboarding R users. Project Pythia was funded to develop Python-specific training material, but the Pythia Cookbook format is not strongly tied to the Python language and could happily support content in any language for which there’s an appropriate Jupyter kernel.
I perused the discourse to find examples of R usage and didn’t find much. So, my questions for the community are:
- Who among the Pangeans is using R for geoscientific work?
- How would you like to see that better supported, from the perspective of building training pipelines that take a new user to new science?
- Do you think there’s value in selling (to NSF) a more language-agnostic vision for collaborative open geoscience?
Not looking to start any Twitter-style Python vs R rumbles here, but I’d like to figure out ASAP whether this is a useful angle to pursue for the proposal I will be writing.
Hi @brian-rose,
Given my poor R credentials, I won’t be the best apologist for the cause, but it is absolutely the case that “there’s value in selling a more language-agnostic vision for collaborative open geoscience”. In addition to its statistics roots, R is the default language for biologists, ecologists, and paleoecologists. In paleoclimatology, the field I am most familiar with, it is how most age-modeling methods are coded, and it is the reason why Nick McKay developed GeoChronR. As part of the OSE call, I’ve also been thinking about language-agnostic approaches supported in a Jupyter kernel. The hard part is round-tripping between R and Python within a single notebook. I’ve played with rpy2 in the past, but it is a non-starter for Windows users. However, it could be useful in a JupyterHub environment. I hope this information is helpful to you, and I am happy to talk some more if you see potential.
Cheers
Julien
Hi folks, a colleague pointed me to this forum. I work and teach mostly in R, but also in Python.
I would definitely echo the sentiment of a language-agnostic approach. With shared standards around file formats, metadata, and low-level libraries (e.g. COG, STAC, S3 API, OSGeo / GDAL libs), users can enjoy essentially the same abstractions, high-level interfaces, and performance benefits regardless of their scripting language. I think when these elements are done right, the specifics of it being in R or Python are almost as meaningless as the distinction between running on macOS, Windows, or Linux.
I think both the Python and R communities could benefit from many of the same things in this regard. For instance, both have a historical dependence on POSIX-filesystem-based workflows and on data structures that were optimized for those conditions (NetCDF / HDF5). The approach that Planetary Computer and Element 84 have taken, standing up a collection of STAC catalogs along with COG-format data on commercial bucket-based storage, is pretty game-changing for both R and Python: it allows the community to essentially bring their own open source programming tools instead of relying only on what the Google Earth Engine API provides. Thus I can teach my students the STAC API and how to create virtual mosaics and virtual cubes through GDAL interfaces, and they can re-use these same patterns across a large collection of spatial data. (On the R side we have gdalcubes, an abstraction built on top of STAC metadata + stars; essentially equivalent approaches exist in Python, e.g. stackstac (gjoseph92/stackstac), which turns a STAC catalog into a dask-based xarray.) Previously we had to rely on each spatial product having some custom REST API wrapper to subset from some server, which was both pedagogically and computationally less efficient.
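To make the "same pattern, any language" point a bit more concrete, here is a minimal sketch in Python using only the standard library. It builds the JSON body for a STAC API item search and turns an asset URL into a GDAL `/vsicurl/` path, without contacting any server. The helper names `build_stac_search` and `to_vsicurl`, the collection id, and the example URL are all illustrative assumptions, not part of any particular catalog or library; the same body could just as well be assembled in R.

```python
import json


def build_stac_search(collections, bbox, datetime_range, limit=10):
    """Return a STAC API item-search request body as a plain dict.

    Hypothetical helper: it only assembles standard search fields
    (collections, bbox, datetime, limit) and sends nothing.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must be [west, south, east, north]")
    return {
        "collections": list(collections),
        "bbox": list(bbox),
        "datetime": datetime_range,  # RFC 3339 interval, "start/end"
        "limit": limit,
    }


def to_vsicurl(href):
    """Prefix an HTTP(S) asset URL so GDAL-based tools can read it remotely."""
    if not href.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL")
    return "/vsicurl/" + href


# Illustrative values only; substitute ids and URLs from a real catalog.
body = build_stac_search(
    collections=["sentinel-2-l2a"],
    bbox=[-122.6, 37.6, -122.3, 37.9],
    datetime_range="2022-06-01T00:00:00Z/2022-06-30T23:59:59Z",
    limit=5,
)
print(json.dumps(body, indent=2))
print(to_vsicurl("https://example.com/data/scene.tif"))
```

The body would be POSTed to a catalog's `/search` endpoint, and the asset links in the returned items are what get handed to GDAL-based tools for mosaicking or cube building. Nothing in the request is specific to Python.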
I think there’s plenty to be done in helping R users migrate to these newer workflows, and I really like the idea of something similar to the Pythia examples in R. I have only quickly browsed the Pythia Cookbook examples, but they look very nice, though there are bits there that seem needlessly Python-specific. For example, I haven’t quite wrapped my head around the role of intake / intake-esm, which seems to play largely the same role as STAC (maybe I’m mistaken), but STAC is a language-agnostic metadata standard with awesome client libraries in Python, R, JavaScript, etc.
I think it can make some sense, but I seem to remember hearing that most R users rely on RStudio rather than Jupyter notebooks. So my question would be: would Jupyter-formatted resources be useful to that community?