In our recent brainstorming sessions, one idea that emerged was Pangeo office hours. The idea would be to create a safe, informal space for people, particularly new users, to ask questions about Pangeo tools, contributing to open source, etc.
@dcherian volunteered to help organize this and get it started. I’d be eager to participate.
Thanks @rabernat. Here are some thoughts I typed up.
I am explicitly requesting input from absolutely anyone, particularly those from underrepresented groups.
- What do you think these calls should look like?
- How do we encourage a more diverse contributor base?
Increase diversity of contributors to software in the Pangeo stack
- Metric: increase in issues filed at all levels of the Pangeo stack (low level: xarray / dask; higher level: xgcm et al.)
- Metric: increase in number of PRs to all levels of the Pangeo stack
- Metric: increase in diversity of core team members for software at all levels of the Pangeo stack
Increase diversity of participants in Pangeo conversations
- Metric: increased participation on Pangeo communication channels: weekly Zoom call, Discourse, Gitter, GitHub
Increase sense of community and distribute expertise throughout the community. This will help reduce load on more experienced contributors.
- Metric: increased instances of “community” members solving problems on the
- Host a 1-1.5 hour “office hours” call every two weeks.
10 minute “personal story” section: package creators and maintainers describe their “open source journey”.
20-30 minutes on “contributor” questions
- Highlight 2-3 “good first issues”
- Livestream fixing a bug in 10 minutes (sounds stressful :P)
- Simplest examples would be improving error messages; or improving
- Listeners ask questions about open PRs or request help starting a Pull
10 minutes on “cool Pangeo demo of the day”.
- Bite-sized demo of what’s possible.
- Increase visibility of pangeo and pangeo-adjacent projects
- Explicitly invite contributors from underrepresented groups to present their packages / solutions to problems.
- Ask participants to recommend packages / people for next call.
30 minutes on “usage” questions.
(inspired by https://github.com/kubernetes/community/blob/master/events/office-hours.md)
- These should be general questions
- Encourage that questions be posted on Discourse prior to the call. The asker is encouraged to post a follow-up answer after the call.
- “Questions that aren’t addressed or need work can be punted to the next week or we can encourage other people to give them a look, at a bare minimum we can at least help socialize the difficult questions.”
- Hosts should encourage listeners to participate in answering. This will help distribute the load of handling these calls.
Last 5 minutes: Anonymous post-call survey of participants
- We need numbers to judge how we are doing at meeting goals.
- What worked?
- What didn’t work?
- What did we miss during this call?
- How can we do better next time?
- All calls should be recorded and posted to YouTube.
- IMO it would be good to cut the calls up into sections, especially the “personal story” and “cool demo” sections, and post them separately on a Pangeo YouTube channel.
Success requires that “senior members” commit to:
- Participating in “office hours” calls when possible.
- Participating on discourse.
- Mentoring and guiding contributors and Pull Requests to completion
- Providing a welcoming inclusive environment that encourages participation from individuals of diverse backgrounds and all skill levels.
The above list is a major time commitment, so we need ideas on how to make this sustainable.
- Call hosts should explicitly encourage community input prior to answering questions.
Success also requires significantly increasing participation above current levels. This will require personal invites and broad publicity in many channels:
- Regular posts on Pangeo channels: Twitter, Discourse, Gitter, GitHub
- ClimateGrad Slack channel
- Shall we ask users to email their grad-school/postdoc/research-group email lists?
- Entrain professors and their groups at an early stage
- Younger cohorts tend to be more diverse so professors at undergrad institutions would be a good avenue to increase diversity of participants.
- Reach out to “outreach” / “training” groups at institutions to participate
- e.g. NCAR
- Email mentors and past participants at summer schools & hack weeks: e.g. Brian Arbic
- who else?
Thank you for such a thorough plan, @dcherian!
I find GitHub issue triaging to be one of the ways to increase diversity of contributors. I am always delighted when I navigate to a GitHub repository’s issue tracker and find issue labels such as “good first issue”, “good second issue”, “beginner friendly”, “easy bug fix”, and “low-hanging-fruit”. These issue labels send a welcoming message to first-time contributors.
Regarding core team members, I really think that for big projects like xarray, it would be useful to have a “How to Become a Core Contributor” guide. I find Python’s core dev guide to be a great example: it tells you what is expected of a core dev and what the requirements are to become one. So, having a guide like this may be useful for folks who are interested in becoming part of the core team but aren’t familiar with the process.
Another point worth considering is pair programming when possible. This one can be tricky and time-consuming, but I do think that it is unmatched as a method of upskilling other contributors, especially after first-time contributions.
I’ve heard great things about sprints at SciPy conferences. Dask and Xarray had sprints last year, and I think they went well and people enjoyed them. Would it be useful to alternate between Pangeo office hours and Pangeo targeted sprints once in a while? Open source contribution sprints provide another great way to build better collaborative relationships IMHO.
I am looking forward to this, and I am happy to help out with organizing some of these.
I think these are all wonderful ideas, and I’d be happy to make it a priority to be involved in the office hours. I think this is a great infrastructure to create a safe and friendly environment to encourage and mentor people toward contributing to open source. It’s a very intimidating thing to start out with, so I particularly like the ideas of pair-programming, and attempting to show live how one would fix a simple issue. The sprints would also be great with an active mentor for each sprint team.
I will say that even as a white male with plenty of support and resources, it was still extremely daunting for me to get involved in open source. I felt insecure and nervous about contributing code and it literally took me having the privilege of working at Los Alamos for a summer to start on open source (through guidance of scientists there).
So the question I put forth is: How do we specifically encourage diversity through this system? All the tools seem to be in place here, but we need to find a way to recruit people of color and other underrepresented folks to these calls. What are the avenues we use to get diverse audiences on the office hours? Do we start at the undergraduate level? Perhaps outside of the scope of the office hours discussion, but are there ways to get pangeo-driven funding to recruit underrepresented students as e.g. summer interns for contributing to pangeo packages?
Oops @dcherian I missed your last subsection! That totally gets at answering my question. I think with regard to your comment on training programs, we can recruit from the wonderful diversity internship programs that exist in Boulder and hopefully other cities.
Off the top of my head, we have in Boulder:
Some of these programs have lab-driven science. But SOARS (NCAR) in particular has students working a lot with computational resources. That would be a great place to be involved. Perhaps we can have pangeo folks serve as computational mentors (I have done this in the past) and get the SOARS interns involved in the office hours and on mini hack projects.
Good point. It’s something I’m concerned about here. I think we should go ahead with the office hours regardless, but we need a separate, targeted effort to help bring more diversity to this and other Pangeo interaction points.
At the #ShutDownSTEM brainstorming session, a couple of concrete ideas stood out:
- Direct invitations to individual POC to join (requires having some ideas about who you want to invite)
- Outreach through HBCUs, e.g. via geoscience, geography, or computer science departments.
Both could be effective here.
First, thanks to @dcherian for inviting me into this thread, even though I’m more of an interested observer of Pangeo at this point.
With regard to encouraging people into open development, particularly amongst underrepresented groups: as I’ve worked with postdocs, students, and senior scientists who are unused to having their development publicly scrutinized, I’ve seen that there can be a large amount of anxiety and potentially feelings of shame if a PR is not ‘perfect’ the first time around. Even in the most supportive of development communities, I do think that there are some secret aspects of open development culture that we often take for granted, the most important being that imperfection is to be expected! For instance, most contributions will receive critique and feedback, which is both encouraged and educational for the community at large. Additionally, reviews are expected to be concise, which can often lead to feelings of being overly criticized. Consider a common review comment, “Remove unnecessary line.” It’s incredibly easy for such a terse comment to be taken as a personal judgement, as opposed to what those of us with more experience in open development would simply view as “Whoops, yep, I sure did leave that in there.”
A particularly good way I’ve found to overcome this high barrier to entry is to have people who approach me work on a ‘pre-PR’, submitting their PR for review to my own fork of a project. The expectation is set at the beginning that whatever they are working on is going to be imperfect, but that making, recognizing, and fixing mistakes/assumptions/style choices are a fundamental part of the training. The criticism is still ‘open’ (and can even be included in an eventual PR) but not nearly as visible as it would be on the main repo. By going through this pre-PR process, the contributor only worries about judgement/criticism from an individual person whom they know has pledged to be supportive. Additionally, they get used to the language that’s used in reviews, how to resolve disagreements, etc. Most importantly, when submitting to the main repository, they know that they are not submitting it alone: criticism does not fall on their shoulders alone, and the responsibility for resolving it lies with both people.
Within the context of office hours, I wonder if they could also incorporate something analogous to ‘discussion sections’ for large lectures: smaller groups, led by more junior members of the community, that meet regularly. Smaller groups mean a lower barrier to entry for participation and a better sense of community. Spreading the work to junior members makes the gradient in expertise less daunting and allows individuals to choose whom they feel more comfortable working with.
@dcherian: I love this idea, and I want to see this go forward. I can pledge the commitment from my team here at NCAR to help make this work, and there are also some in my team who will definitely benefit from these office hours themselves!
There is a lot on the table here: pair-programming sessions, sprints, and all 5 parts of the Office Hours schedule that @dcherian originally proposed. I personally think it is absolutely all worth it. Perhaps we should consider alternating different kinds of Office Hours, such as a “Getting to Know Us” kind, then 2 weeks later a “Usage Questions” kind, then a “Pair Programming” kind, then maybe a Sprint, and then back to “Getting to Know Us”…or some balanced mix of them. If we can announce the schedule well in advance (at least a couple of meetings out), then I think people won’t be overwhelmed.
Thanks @dcherian for pinging me on this. I think the office hours could be interesting and successful, since they would not require a significant time commitment from new participants. I’d also echo that sprints are great; the success of sprints, though, often seems to hinge on having a good set of projects for beginners. Outside of my own start in open source, I’ve not had a lot of success with people just “scratching their own itch” in my domain-specific projects. You also want to make sure the projects have the bandwidth to mentor the participants, or at the very least to review/merge any contributions within a reasonable timeframe. Nothing is more discouraging than putting work in and having it sit there.
It’s also been my experience that a lot of effort needs to go into continually fighting “imposter syndrome”. So it’s not enough to make sure the project is open and welcoming; you need to make it clear that if you can write code, you’re completely capable of contributing to the projects. There’s this mystique of “real programmers” that seems to keep people from feeling like they’re good enough, a mystique that needs to be burned to the ground IMO.
Thanks for the invitation, Deepak. I’m not much of a contributor to this platform, at least not yet. Thus my suggestions are limited to guiding principles rather than specifics, given my limited knowledge of exactly how Pangeo is organized. First of all, this is an excellent idea. Concrete plans of this nature are essential steps toward increasing the diversity of contributions and ideas. The suggestions you are putting forward are a good start, as everyone has reiterated. From where I’m sitting, understanding the bottlenecks remains a challenge; I’m still unsure where exactly the conveyor-belt elimination problem is. Since this conversation started, I have noticed that there are two issues, and isolating them might help scale what is possible and set realistic goals per target. Namely, i) inclusion: creating an environment that welcomes everyone (broader identity politics, for lack of a better word), and ii) lack of diverse contribution (economic, actualization, or mobility exclusion); this is particular to African Americans and other minority groups. Much of the former is an essential part of this conversation; I have not found a consistent way to think about it that is not self-conflicting, and hence I will leave this to other people’s contributions. I do fine one-on-one, but I struggle with the broader structure.
For the latter, I believe that clear and actionable targets can be set, which may also help the former. The main challenge here is the “pool”: my experience at Ocean Sciences was that minority groups are really a minority; it’s a broad issue. Thus, the suggestion to deepen the entrainment depth is a good one. Partnering with SOARS and other related programs is a good start. I know SOARS is almost always looking for/welcoming new mentors and one-time lunch invitations to interact with students. Perhaps offering to give a Pangeo talk during summer visits might be excellent bait, introducing Pangeo and the support available. Success rates may be as low as two or three students per year initially, but being consistent might be beneficial over time. As new/diverse contributors come in, so will an improved understanding of the bottlenecks, which will also strengthen the approach. Finally, I think @ashao’s suggestion on mentorship, i.e., providing support for the “activation energy” by linking new contributors to mentors as they come in, is an excellent one. I know there is a time and resource cost to all this; I hope you find a way to balance it well with your other objectives.
Thanks for putting this together, @dcherian. I wanted to clarify my comment from today’s developer meeting. For those who were not there, I suggested we poll the Pangeo community (because I do not have a sense of the diversity outside of the people I work with most closely) and see if there are any internal efforts that can be made to lift up underrepresented scientists we are already connected to. This is hard to say, but I think it is important to say it (because addressing the culture of our community is an important piece of making the community more diverse): I felt that my idea was shut down without group discussion and the conversation moved on before I had a chance to respond, which is poignant given I was the only woman in this video conference discussing diversity.
There are scientists who are connected to Pangeo but not active contributors. A poll could be useful to get metrics on, and identify, underrepresented scientists who would like to contribute more, as well as to create a channel for our minority scientists to provide their ideas on how to make contribution a more tangible goal (which I think the office hours and a lot of the ideas already in this thread will be incredibly successful at!). Diversity is not enough if we are not supporting the diverse BIPOC scientists within our community. I know the results of the poll would not be entirely surprising; we already know that the community is overwhelmingly white. This idea was geared toward identifying and giving a voice to the underrepresented scientists who would like to contribute more. We do not have to move forward with this idea; it was just an idea for a starting place, and I love a lot of the ideas I see posted in this thread already. I am posting this to clarify that when I suggested polling people on their proximity to any underrepresented group to get an intersectional picture of Pangeo, I was not in any way suggesting that we not center black scientists in our diversity efforts moving forward.
I love the internship ideas and would love to celebrate BIPOC scientists who are established in their career as well (perhaps inviting them to present research during meetings). Of course we need to be careful that our diversity efforts are not creating more labor for black scientists. I want to acknowledge that many universities and academic institutions confuse diversity with racial justice (hiring or enrolling more diverse students without addressing the prejudices or underlying factors affecting the experience of those professors and students on campus), and we need to be cognizant of that in our efforts to support the black scientists within the field.
Thanks for sharing this, Julia. I apologize sincerely for the part I played in shutting down your idea, which, as expressed eloquently in your post, is truly excellent and valuable. The irony you point out, that I reproduced a well-known sexist pattern of downplaying a woman’s viewpoint in a discussion about diversity, is devastating. I won’t try to make excuses, but I will try to do better in the future.
@jukent, thanks for posting! And thank you, sincerely, for calling us out on, as @rabernat puts it, a well-known sexist pattern. As your supervisor and mentor, I feel especially terrible, but @rabernat is right: no excuses. I’ll just promise to try to do better in the future, too.
I think that the idea of a poll is a very good one. There is a wealth of useful information that we can get from a good poll. I don’t trust myself to put a good poll together, but I know we can find help on that front. So, as we decided with the Office Hours during the developer meeting today, I think we should just move ahead with the idea of putting the poll together.
@jukent I am truly and deeply sorry for not recognizing that your opinion was shut down. I will try and do better in the future.
I agree with doing a poll. In fact, wasn’t there a Pangeo survey of some sort last year? Maybe it is time for a 2020 survey with an augmented diversity-metrics section?
Thanks @rabernat @kmpaul @dcherian. I wish I had been able to speak up in the meeting, but it definitely took some reflection to understand what I was feeling and where it was coming from. It was still hard to speak up after the fact because I did not know how receptive anyone would be to my comment. I was very relieved that your responses were authentic and considerate. Conversations about diversity are never easy, and I think it is important that we can be vulnerable during them (highlighting the difference between a brave space and a safe space). I hope that my speaking up and your genuine responses make it easier for other underrepresented scientists to speak up if and when they have experiences they would like to share.
@dcherian I think expanding the 2020 survey is the ideal way to implement this. At @kmpaul’s suggestion, we are planning on meeting with NCAR’s CODE team to discuss diversity in internships and ways to implement proper surveys. However, this will have to wait until August, when our current summer internships end, before we can really move forward with this particular aspect of the discussion.
I’m really happy to see this conversation happening and everyone’s great ideas. Thanks @jukent and @dcherian for getting this going. We might want to think about how to approach the different communities that Pangeo brings together: devs, education, science, data. There are a lot of people who are probably interested in being part of the Pangeo community who might never make a PR. The Pangeo community is really friendly and open, but can still be intimidating to try and join.
Thanks to everyone for this thoughtful and insightful discussion and to @dcherian for pinging me to join in. In an effort to not simply repeat many of the excellent points already made, instead I’d like to share some personal experience and reflections about my own involvement in Pangeo that support them.
I first learned about Pangeo through its use at the 2019 ICESat-2 Cryospheric-themed Hackweek. Having participated in an earlier Geohackweek where we had to get everything running on our local machines (using some combination of conda, docker images, and lots of googling), I was immediately impressed with all that Pangeo brought to the table. Upon leading the charge for development of icepyx (see #science:icesat-2) after the ICESat-2 Hackweek, I was welcomed into the Pangeo community and invited to participate in events and share my expertise at Pangeo workshops. Yet despite this warm welcome and these invitations, I still have not actively contributed to the code base that Pangeo relies on. Some of this comes down to time, but as many others have pointed out, it’s also due to not knowing what or where I might be able to contribute (do we have a list of suggested “first time contributor” projects, or would I need to look in the issues tab for each of Pangeo’s repos?). Although I have been writing code for a decade, most of my “open-source developer skills” and knowledge are self-taught and have been acquired in the last year (as has most of my actual code sharing on GitHub).
The Pangeo community consists of a wonderful mix of scientists, users, developers, and scientist-developers, but as with any boundary work (a social science term used to broadly describe any work that is being done at the “boundaries” between two disciplines, e.g. a local fishermen’s collective working with policymakers to craft catch laws, or historians working with climate scientists to better understand impacts of climate change), there are generally a few basic disciplinary terms and habits that are so “normal” in our fields we no longer notice them (for instance, iteration on PRs as mentioned earlier). Even as a lead developer, I have found myself scared to reject or ask for changes to PRs for fear that people won’t try to contribute again, and from this thread I have learned that my own perception of “all PRs as good” does not reflect what really happens during development of more established libraries.
I think that office hours with targeted, pre-announced topics (and a clear indication that anyone wanting to join or seek help is welcome), combined with mentorship, partnered programming, and an obvious, welcoming list of small projects that newcomers could attempt in an afternoon, are all going to be crucial to expanding the number of contributors. Specifically inviting BIPOC to participate in these events through the mentioned programs will allow us to improve the diversity of the Pangeo community, both from within and without.
Just want to say thank you all for this excellent discussion. I appreciate everything that people have shared. It’s not an easy conversation.
I think at the core is a central question that we have wrestled with since the beginning:
What does it mean to be “part of” Pangeo? And do we want more people to be part of Pangeo?
Pangeo in its current form has gotten very hung up on operating cloud infrastructure (at least as measured by the weekly checkin meeting agenda), and I’m not sure this is a good thing. It’s quite narrow compared to the original goals of the project.
One model is that a successful Pangeo may be largely invisible to most people, working behind the scenes to coordinate development between different packages and advance science use cases. In that case, our goal may not be to recruit more people “into Pangeo” but rather to funnel them towards other projects (e.g. Xarray, Jupyter, etc.).
Another model is that Pangeo orient itself more towards scientists and less towards developers, becoming a place for scientists to get help on applying open-source tools to their problem of choice. In this case, we would look to cultivate more dialog and more interaction.
Thanks for the thoughtful replies everyone.
Unfortunately, things have come up and I will be busy for the next 2-3 weeks. I will come back to this in the latter half of August.
PS: There are some related ideas here: https://github.com/dask/community/issues/75