Making Cookbooks citable: best practices?

The Pythia group has been discussing the value of making our Cookbooks citable with unique DOIs.

I’m interested to hear people’s thoughts about best practices here, and what would make the Cookbook format the most attractive to potential content authors.

I see that the Environmental Data Science folks have elegantly solved this problem in two different ways, with a Zenodo-based DOI for the source repo, plus individual ROHub-based citations for each notebook. Shamelessly tagging @acocac for comment on this!

A few specific questions for the community:

  • Do we care about the subtle distinction between citing source repos vs. citing the published / rendered content?
    • If so, are there good solutions to solve that distinction problem?
  • Is it important that citations separately attribute primary content authorship versus infrastructure-related contributions to the source repos?
    • Again, if so, are there good solutions that achieve this?
  • Should individual notebooks within a Cookbook be citable (with their own author list) as is done in the EDS Book?
  • Broadly speaking, if we built automation into the Cookbook machinery to make DOI generation easy and hands-off for content authors, would that help incentivize people toward putting content into Cookbooks?

I guess this is tangentially related to Best way to cite Project Pythia? - #2 by clyne

@brian-rose, thanks for tagging me here. Happy to share experiences and brainstorming ideas for making Pythia Cookbooks citable!

I’m interested to hear people’s thoughts about best practices here, and what would make the Cookbook format the most attractive to potential content authors.

I think unique or persistent identifiers for digital objects are essential for collaborative, living research objects such as Pythia Cookbooks. DOIs make it easier for others to cite them, reduce the risk of broken links, and facilitate tracking how they have been used and cited.

I see that the Environmental Data Science folks have elegantly solved this problem in two different ways, with a Zenodo-based DOI for the source repo, plus individual ROHub-based citations for each notebook. Shamelessly tagging @acocac for comment on this!

We do indeed use a Zenodo-based DOI, generated through a third-party integration between GitHub and Zenodo, for the EDS book source repo. It lets us version major changes to the Jupyter Book and some miscellaneous scripts. We follow the Turing Way’s workflow for releasing different versions (see here). Note that the TTW folks have nicely automatised their release workflow (see here).
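As a rough illustration of what the GitHub–Zenodo integration can pick up at release time, here is a minimal sketch of generating the metadata for a `.zenodo.json` file. It is not the EDS book's actual configuration: the helper `build_zenodo_metadata`, the title and the names are hypothetical, and the field names (`creators`, `contributors`, `type`, `upload_type`) reflect Zenodo's deposit metadata as commonly documented, so they should be checked against the current Zenodo documentation before use.

```python
import json


def build_zenodo_metadata(title, authors, infrastructure_contributors):
    """Return a dict in the shape Zenodo reads from a `.zenodo.json` file.

    Content authors go under "creators" (these appear in the citation);
    infrastructure contributors go under "contributors" with a type.
    """
    return {
        "title": title,
        "upload_type": "software",
        "creators": [{"name": name} for name in authors],
        "contributors": [
            {"name": name, "type": "Other"}
            for name in infrastructure_contributors
        ],
    }


# Hypothetical placeholder values, for illustration only.
metadata = build_zenodo_metadata(
    title="Example Cookbook",
    authors=["Author, Example"],
    infrastructure_contributors=["Maintainer, Example"],
)

# The resulting JSON text would be saved as `.zenodo.json` in the repo root.
zenodo_json = json.dumps(metadata, indent=2)
print(zenodo_json)
```

Keeping content authors under `creators` and infrastructure maintainers under `contributors` is one possible way to address the attribution question raised in the original post, since only creators normally appear in the generated citation.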

We also considered generating DOIs for notebooks through the same integration, as EarthCube notebooks do. However, we’ve found that Zenodo handles some relevant notebook metadata poorly, such as geographical location, bibliography, inputs and outputs. Thanks to Anne Fouilloux @annefou, we started exploring RoHub, a Research Object management platform that enables researchers to collaboratively manage, share and preserve their research work (data, software, workflows, models, presentations, videos, articles, etc.).

Let me describe a recent example of how RoHub facilitates preserving executable research objects and could incentivise people to publish EDS book notebooks.

For the notebook repository, GitHub - eds-book-gallery/b128b282-dee7-44a7-bc21-f1fd21452a83: Exploring Land Cover Data, we used RoHub to register it and to add citation instructions based on a W3ID permanent identifier, https://w3id.org/ro-id/b128b282-dee7-44a7-bc21-f1fd21452a83. The right panel in the figure below shows some stats and the number of resources, annotations, events, etc. Note that it also lists snapshots, forks and archives, similar to GitHub repos. RoHub stats are very valuable and complement those provided by GitHub for the notebook repository and by Google Analytics for the EDS book website.

We can retrieve Zenodo-based DOIs through a third-party integration between RoHub and Zenodo (see for instance the snapshot, ROHub). Ideally, the rendered version of notebooks should use this Zenodo-based DOI instead of the W3ID. However, while the citation using the W3ID includes both authors and reviewers, the Zenodo-based DOI lists only the authors.

Regarding the impact of citation, we noted that the notebook author added it to his online CV (see Book Chapters in CV | James Millington). In addition to the citation, and inspired by OSF Badges to Acknowledge Open Practices, we’re thinking of developing custom badges for existing and future contributions to the EDS book.

It’s worth mentioning that we haven’t yet exploited the full capabilities of the RoHub platform and of Research Objects (ROs) as living resources. At the moment, we’ve created RoHub ROs for all EDS book notebooks at the post-print stage (before their publication). ROs encompass research outputs created, revised and shared throughout the research lifecycle. This means we could create an RO for each EDS book notebook from its inception (we capture this through notebook ideas issues, see [NBI] Exploring Land Cover Data · Issue #99 · alan-turing-institute/environmental-ds-book · GitHub).

As part of last year’s TTW book dashes, Anne and I published a dedicated section introducing Research Objects, Research Object to capture the Research Life Cycle — The Turing Way. Feel free to browse it.

I hope the above description of how citation works for the EDS book and its notebooks is valuable for the Pythia Cookbooks discussion with the community. @annefou can provide further ideas. She recently introduced RoHub at ESIP23 (https://www.youtube.com/watch?v=vFS2oAk4R-I); the talk is also registered in ROHub.

Looking forward to hearing others’ opinions!

@acocac thanks so much for going through this in so much detail! I’ll admit to being a bit overwhelmed with other business, so I haven’t dug too deep into ROHub yet, but you’ve given us a really rich starting point. Much appreciated!

Any other thoughts out there on citation of notebook-based content?

@brian-rose I would be happy to make an example with Pythia and later we could also automate the creation of Research Objects (if you are interested and if it helps).

@annefou it would be fantastic to see an example of some Pythia content as Research Object!

I wonder if we can also tempt you to visit an upcoming Pythia meeting to discuss how we might better align Pythia Cookbooks with the capabilities of ROHub?

(although I realize the meeting time – 3 pm Eastern – is probably not ideal for your location)

Replying to your bullets as a casual contributor:

  1. I don’t think it matters, it’s certainly too pedantic for me. Can we say “we don’t recognize a distinction between a repo and its rendered content” as a sort of CYA clause? In my functional-programming-influenced opinion, the fewer states a cookbook can be in, the better.

  2. Can we offer both options? I imagine for large cookbooks, this could be useful in the same way citing a book chapter would be, but it’s the wrong granularity for a small cookbook.

  3. Yes, it would be a big incentive for me to continue developing these cookbooks.

Sorry, I was very busy and have not managed to follow up yet. I see you have your next meeting today (March 6th). I’ll try to join (I am traveling, and if my flight is on time I should be able to attend).

@annefou that would be great if it works out, but if not, we meet every Monday.

I will be absent today but @ktyle will be chairing the infrastructure meeting.

I tried to join but did not manage to enter the Zoom room. Last time I thought this was related to my poor internet connection, but today I got the same error (it asked me to log in).

Yes, the UAlbany Zoom protocols require all attendees to be using a registered Zoom account. I realize now that the Pythia meeting calendar does not mention anything about this requirement. I will open an issue to address this on our webpage. So sorry about the confusion!