Software citation

I use OSS libraries and haven’t cited them properly. I’m trying to be better, but it can be confusing:
to cite xarray: https://zenodo.org/record/3940662#.X2D3AXlKiUk
to cite dask: https://docs.dask.org/en/latest/cite.html

How can we as a community make citing software easier?

My guess is that most journals will accept Zenodo DOIs and some will not accept conference papers. Should we encourage projects to get Zenodo DOIs?

Wouldn’t it be nice if there were a website where I could just put in a link to my GitHub code and get back a list of citations for everything it imports? Maybe this already exists? It seems like a natural resource, and wouldn’t it be nice if all journals just pointed to it and asked authors to cite software?
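
To make the idea concrete, here is a minimal sketch of the core step such a site would need: parse a Python file, collect the top-level packages it imports, and look each one up in a citation table. The `CITATION_HINTS` table and the function names are hypothetical; the two entries are just the links above, and a real service would need a maintained registry or would read each package’s own citation file.

```python
import ast

# Hypothetical lookup table: top-level module name -> where its citation lives.
# Only the two links from this thread are filled in.
CITATION_HINTS = {
    "xarray": "https://zenodo.org/record/3940662",
    "dask": "https://docs.dask.org/en/latest/cite.html",
}


def top_level_imports(source):
    """Return the set of top-level module names imported by `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0): they point inside the project.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


def citation_report(source):
    """Map each imported module to a citation hint, or None if unknown."""
    return {name: CITATION_HINTS.get(name) for name in sorted(top_level_imports(source))}
```

Calling `citation_report(open("analysis.py").read())` (with `analysis.py` standing in for any script) would return a dict in which modules without a known citation map to `None`, which is exactly the gap a shared registry would have to fill. Note that this also picks up standard-library modules, which a real tool would want to filter out.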


This is a great idea, @cgentemann, and raises some interesting questions (as you’ve asked). One piece I’ve always struggled with is whether, if I’m citing software (say xarray), I should also cite all of the OSS packages it relies on (pandas, numpy, etc.). An academic literature analogue would suggest not, but I think it’s a fuzzy area for OSS. Unless pandas is essentially counting xarray citations as pandas citations, it’s effectively “missing out” on a whole bunch of documented use cases. On the flip side, citing dependencies (e.g. via import statements) would quickly lead to a long list of indirect citations and a bloated citation list. I think that Zenodo DOIs are great, but they don’t really address these questions if the package (e.g. xarray) doesn’t “cite” its dependencies beyond its list of requirements/import statements.
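
To get a feel for how quickly the indirect list grows, here is a rough sketch that walks the declared requirements of an installed package using the standard-library `importlib.metadata`. The helper names are my own, the name parsing is deliberately crude (it keeps only the leading project name, does not normalize `-` versus `_`, and skips optional “extra” requirements), and it only sees packages installed in the current environment.

```python
import re
from importlib import metadata

# Leading project name of a requirement string such as "numpy>=1.17".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def required_names(requirement_strings):
    """Extract lower-cased project names, ignoring optional extras."""
    names = set()
    for req in requirement_strings or []:
        if "extra ==" in req or "extra==" in req:
            continue  # only needed when an optional extra is requested
        match = _NAME.match(req)
        if match:
            names.add(match.group(1).lower())
    return names


def dependency_closure(package, lookup=metadata.requires):
    """Return every package reachable from `package` via declared requirements."""
    seen = set()
    queue = [package.lower()]
    while queue:
        current = queue.pop()
        if current in seen:
            continue
        seen.add(current)
        try:
            reqs = lookup(current)
        except metadata.PackageNotFoundError:
            reqs = None  # not installed here, so we cannot look further
        queue.extend(required_names(reqs) - seen)
    seen.discard(package.lower())
    return sorted(seen)
```

Running `dependency_closure("xarray")` in an environment where xarray is installed lists everything it pulls in, directly or indirectly. Whether all of those deserve a line in a reference list is the open question.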

I will say that I greatly appreciate all OSS packages that include a “cite” link - I’ve definitely gone down a few rabbit holes trying to figure out how best to cite some of them - and think that this should be considered a critical file to have (up there with LICENSE and README). It would also make it a lot easier to implement a website such as the one you suggest, without someone having to create and maintain a master database of citations.

There was some discussion about standardizing this on the Python discourse https://discuss.python.org/t/convention-for-encouraging-citation-of-python-packages/1764, but it seems to have stalled.

https://guides.github.com/activities/citable-code/ was a decent writeup, last time I looked at it. IIRC you get some kind of DOI on Zenodo, and each time a release is made it’s updated automatically.

I support the idea of using DOIs to address this. It could be done by mapping the imports, but explicit is better than implicit. The DOI is not a perfect system, but it is a well-established way of tracking scientific papers. Impact metrics for data and software can only benefit from following the same path. It should be seamless to track impact in both directions (what a work cites and what cites it), and thus the overall impact.

Zenodo is a convenient way to mint a DOI for software. What I think is missing is for (we) developers to do a better job of linking our software to other software (and data, and papers), in the same way that we do with our papers. It’s up to the author(s) to judge, ethically, how far back in the dependency tree a package should link. Once a package has a DOI, and that DOI record is linked to other DOIs (via the related identifiers fields), we have all that we need to track the unified impact of each ORCID.

Here is some guidance, with an example, on how to fine-tune a DOI record through Zenodo using a JSON descriptor. Note the relation ‘cites’ in the related_identifiers:
https://github.com/castelao/inception
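
As a rough illustration of what such a descriptor could contain, the sketch below builds a small metadata dictionary with a `related_identifiers` list using the `cites` relation and serializes it to JSON. The helper name, the package title, the author and the all-zero ORCID are placeholders of mine; the DOI is assumed from the xarray Zenodo record linked at the top of this thread, so check it before reusing it. The repository above is the reference for the actual file layout.

```python
import json


def zenodo_metadata(title, creators, cited_dois):
    """Build a Zenodo-style metadata dict that 'cites' the given DOIs.

    `creators` is a list of (name, orcid) pairs; orcid may be None.
    """
    creator_entries = []
    for name, orcid in creators:
        entry = {"name": name}
        if orcid:
            entry["orcid"] = orcid
        creator_entries.append(entry)
    return {
        "title": title,
        "upload_type": "software",
        "creators": creator_entries,
        "related_identifiers": [
            {"identifier": doi, "relation": "cites"} for doi in cited_dois
        ],
    }


descriptor = zenodo_metadata(
    title="my-analysis-package",                   # placeholder title
    creators=[("Doe, Jane", "0000-0000-0000-0000")],  # placeholder author and ORCID
    cited_dois=["10.5281/zenodo.3940662"],         # assumed DOI of the xarray record above
)
descriptor_json = json.dumps(descriptor, indent=2)
```

Writing `descriptor_json` to a `.zenodo.json` file at the root of the repository is, as far as I understand it, how the descriptor gets picked up when a release is archived.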
