Hello all - looking for some advice/references/best practices on generating author lists for software releases.
I just finished getting all of the icepyx releases on Zenodo. I noticed the automatically-generated author lists were highly variable and it got me wondering: who should be on the author list for each release? I couldn’t find any resources that really talk about this, so I wondered if there was a standard best practice I just didn’t know about. As best I can tell, Xarray’s author list is the current maintainers (which makes sense - it would be absurd to list all 300+ contributors as authors). However, in a space (academia) where publication counts are still a critical currency, is that harmful to contributors who then aren’t in the author list? Thoughts?
I’ll muddy the waters further and point out that in terms of packaging metadata, PEP 621 metadata has both “authors” and “maintainers”. On MetPy, I’ve got a PR removing “authors” altogether and just going with maintainers of “MetPy Developers”. I’m not sure what best practice should be.
Thanks for clarifying. I didn’t see any process outlined in the docs for who does get listed in CITATION.cff though, and the file doesn’t currently list all 300+ authors.
It’s absurd to print it out… but not to auto-generate and add an ORCID entry!
Thank you for reframing this - I definitely meant my original statement this way. Each of the 300+ contributors should absolutely be considered an author and get credit accordingly. I still don’t see where/when/how this ORCID entry connection is happening though, if not through Zenodo.
This sounds similar to the approach I am currently pursuing (trepidatiously) for icepyx. CITATION.cff only has a single author entry of “icepyx Developers”, so no individual names would go into each Zenodo release. Then everyone is listed in CONTRIBUTORS.rst and our docs note that all contributors are considered coauthors. My concern is that this may not help people who need to be able to highlight their name on an author list or to have it accounted for in publication metrics that may be being used to evaluate them. And I’m not sure how it could be linked to ORCID ids then either…
I would imagine you personally are seeing an ORCID entry because you are in the CITATION.cff file. For comparison, I would NOT see an ORCID entry for my contributions to Xarray, because I am not in that file (and thus not listed as an author on Zenodo).
What I am trying to figure out is the best way to handle this type of scenario. Does each Zenodo release get a different author list based on who has contributed since the last release (whether that be new contributors, maintainers, whoever)? As you say, this would be a maintenance nightmare because there’s no CITATION.cff generator bot. But it would avoid the need to try and have 300+ author entries for a given Zenodo release AND give each contributor at least one ORCID entry (and repeated ones if they’re contributing a lot). It also helps with the “this person contributed 10 years ago but is no longer actively contributing” challenge. They have “fixed” credit for their contributions but aren’t getting publication listings now…
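To make the "who has contributed since the last release" idea concrete, here is a minimal sketch rather than an existing tool. It assumes you have captured the text output of `git shortlog -sne <previous-tag>..<new-tag>`, which prints one `count<TAB>Name <email>` line per author. The function names, the sample text, and the "maintainers first" ordering are all hypothetical choices for illustration.

```python
import re

# Matches one line of `git shortlog -sne` output: "<count>\t<Name> <email>"
_SHORTLOG_LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s+<([^>]*)>\s*$")


def parse_shortlog(text):
    """Return {lowercased email: (name, commit_count)} from shortlog text."""
    authors = {}
    for line in text.splitlines():
        match = _SHORTLOG_LINE.match(line)
        if match is None:
            continue  # skip blank or unexpected lines
        count, name, email = match.groups()
        key = email.lower()  # merge entries that differ only in email case
        previous = authors.get(key)
        total = int(count) + (previous[1] if previous else 0)
        authors[key] = (name, total)
    return authors


def release_authors(shortlog_text, maintainers=()):
    """Names for one release: maintainers first, then contributors by commit count."""
    contributors = parse_shortlog(shortlog_text)
    ordered = sorted(contributors.values(), key=lambda item: (-item[1], item[0]))
    names = list(maintainers)
    for name, _count in ordered:
        if name not in names:
            names.append(name)
    return names


# Hypothetical output of: git shortlog -sne v0.1.0..v0.2.0
sample = (
    "    12\tAda Example <ada@example.org>\n"
    "     3\tBo Sample <bo@example.org>\n"
    "     1\tAda Example <ADA@example.org>\n"
)
print(release_authors(sample, maintainers=["Cy Maintainer"]))
```

Note that this only sees people with commits in the range, so reviewers, documentation helpers working outside git, and other non-code contributors would still need to be added by hand.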
The Fatiando model is another good reference, see community/AUTHORSHIP.md at main · fatiando/community · GitHub. They spell out clearly that to be on the Zenodo release, you need to opt in by putting your name/affiliation/ORCID in an AUTHORS.md file. For newer software versions, I think the whole author list is just kept, so you do get the case that someone contributing in v0.1.0 gets their name listed until v1.0.0 or later. Obviously a pain for big projects like xarray with 300+ contributors, but the key is to do it early on and put everyone into a CITATION.cff file so things get automated.
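As a rough illustration of the "put everyone into a CITATION.cff file" step, the sketch below renders a minimal CITATION.cff from opt-in author records. This is not Fatiando's tooling: `build_citation_cff`, `cff_quote`, and the record layout are hypothetical, and the output only covers the required CFF 1.2.0 keys (`cff-version`, `message`, `title`, `authors`) plus optional affiliation and ORCID.

```python
def cff_quote(value):
    """Quote a string as a YAML double-quoted scalar."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_citation_cff(title, authors):
    """Render a minimal CITATION.cff from opt-in author records.

    Each record is a dict with 'given-names' and 'family-names', plus
    optional 'affiliation' and 'orcid' (a bare ID like 0000-0000-0000-0000).
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {cff_quote(title)}",
        "authors:",
    ]
    for author in authors:
        lines.append(f"  - family-names: {cff_quote(author['family-names'])}")
        lines.append(f"    given-names: {cff_quote(author['given-names'])}")
        if author.get("affiliation"):
            lines.append(f"    affiliation: {cff_quote(author['affiliation'])}")
        if author.get("orcid"):
            # CFF stores the ORCID as a full https://orcid.org/ URL
            orcid_url = "https://orcid.org/" + author["orcid"]
            lines.append(f"    orcid: {cff_quote(orcid_url)}")
    return "\n".join(lines) + "\n"


# Hypothetical opt-in records, e.g. transcribed from an AUTHORS.md file
people = [
    {"given-names": "Ada", "family-names": "Example",
     "affiliation": "Example University", "orcid": "0000-0000-0000-0000"},
    {"given-names": "Bo", "family-names": "Sample"},
]
print(build_citation_cff("demo-package", people))
```

Keeping the opt-in records as the single source and regenerating the file before each release is one way to avoid hand-editing the author list every time.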
For peer-reviewed publications (e.g. JOSS), the list will not just be copied verbatim from the AUTHORS.md file or CITATION.cff file, it will be restricted to people who:
Have made a contribution to the repository or significant non-coding contributions.
Add your full name, affiliation, and (optionally) ORCID to the paper. These can be submitted by pull requests to the corresponding paper repository.
Write and/or read and review the manuscript in a timely manner and provide comments on the paper (even if it’s just an “OK”, but preferably more).
But again, these rules should have been laid out early on, ideally before people have contributed to the repository. It wouldn’t be nice to change or add in new rules after dozens of people have contributed, as it complicates the discussion on who gets to be an author or not.
It also helps with the “this person contributed 10 years ago but is no longer actively contributing” challenge. They have “fixed” credit for their contributions but aren’t getting publication listings now
Honestly, I think these “if cases” just make the problem harder. If anything, OSS seems to be moving towards just acknowledging everyone.
Also to be clear, I don’t think the current xarray model is ideal. I much prefer the cf-xarray model, and would like to update that.
Also I think this is a good topic to address with pyOpenSci (cc @lwasser)
Hey everyone! i’ve been thinking about this a bunch but don’t have a perfect answer. i really like that @weiji14 highlighted fatiando’s approach. they do have excellent documentation and i agree that there needs to be some clear pathway to authorship.
On our end of things i always suggest to go out of your way to support contributors with some level of acknowledgement. For instance, for zenodo if you are using that - you can have authors / creators section AND contributors as a separate section. You can see what that looks like here:
the challenge of this approach is it does force you to create a zenodo file. i normally open a pr and ask people to add their info if they have it. but it also allows you to divide creators from contributors (if you wish to).
If you have a very high contribution rate and also want to ensure you get contributors who have NOT committed to your repo, you can use something like the all-contributors bot to add them. then you can parse the .json file that the bot creates to form authorship lists or even a zenodo file if you want to. i’m actually working on that now for pyOpenSci.
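A minimal sketch of that parsing step, not the pyOpenSci implementation: it assumes the bot's JSON config (the `.all-contributorsrc` file) has a top-level `contributors` list whose entries carry `login` and `name`, and it emits the `creators` / `contributors` split that a `.zenodo.json` file uses. The function name, the `creator_logins` argument, and the sample data are hypothetical.

```python
import json


def zenodo_metadata(allcontributors_json, creator_logins):
    """Split all-contributors entries into Zenodo creators and contributors."""
    config = json.loads(allcontributors_json)
    creators, contributors = [], []
    for person in config.get("contributors", []):
        # Fall back to the GitHub login when no display name was recorded
        name = person.get("name") or person["login"]
        if person["login"] in creator_logins:
            creators.append({"name": name})
        else:
            # Zenodo contributor entries need a "type"; "Other" is the generic one
            contributors.append({"name": name, "type": "Other"})
    return {"creators": creators, "contributors": contributors}


# Hypothetical config contents, trimmed to the fields used here
sample_config = json.dumps({
    "projectName": "demo",
    "contributors": [
        {"login": "ada", "name": "Ada Example", "contributions": ["code"]},
        {"login": "bo", "name": "Bo Sample", "contributions": ["doc", "review"]},
    ],
})
print(json.dumps(zenodo_metadata(sample_config, {"ada"}), indent=2))
```

Who counts as a creator is a policy decision, so here it is passed in explicitly rather than inferred from contribution types. Affiliations and ORCIDs are not part of the bot's data and would still have to be collected separately.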
If you are publishing with JOSS or another journal you might think a bit more deeply on who deserves “authorship” in the same way you might for a paper and then document the process as Fatiando does.
in the end I always lean towards providing more credit because it encourages people to contribute more and also places value on their effort small or big. I wouldn’t remove people from a release if they didn’t contribute to that specific release but you can add people in the changelog who directly contributed.
i hope that is a bit helpful. we haven’t developed our sections on authorship yet as packaging has been a focus but it will be coming in the future.
Thanks for this great response, @lwasser! Agree that we always want to recognize and elevate all contributions.
Our original approach was actually based on fatiando’s, but it was woefully outdated and didn’t match our current needs. We ultimately ended up updating our policies while following the ones that were in place for older version releases (which were only recently added to Zenodo). In short, for subsequent Zenodo authorship, we adopted the approach used by The Turing Way (“The icepyx Developers” is the only author, and contributions are recognized via the all-contributors bot), while for peer-reviewed authorship (e.g. JOSS), we allow a curated author list for that submission. This discussion and process have been a great learning opportunity to see where the open-source community is regarding “authorship” and how this interfaces with the academic credit/recognition systems many of us rely on for professional advancement.