Hi all. Thanks for being so welcoming at the community meeting yesterday! It was really fun to attend and see how similar the issues are to the projects I have worked on.
I’ve got a lot of thoughts and I tend to write too much (sorry). So I’m going to break this up into sections.
Why I came here
As I said at the meeting, I have been thinking about the best way to help users of some neuro-specific packages get on board with Python and the data science ecosystem. I run workshops on the package that I’ve helped maintain, and this has been the biggest request: help me learn Python/the basic data science stack.
Your presentation at SciPy last year was great because it seems a natural fit (I have really struggled with how to help users learn matplotlib, numpy, and especially pandas).
I got here partly because I previously tried to roll my own Python class (EricThomson/practical_python on GitHub: “Learn just enough Python to start analyzing and visualizing data”), based mostly on other people’s resources, but it was largely a failure for a host of reasons I’d be happy to share.
Pythia foundations for neuro?
I have been thinking ever since the SciPy presentation that it would be great to reshape Pythia Foundations for neuroscientists. I have mostly thought about what I would want to cut (sorry) – cartopy, datetime, xarray (which is great but not that big in neuro yet). Neuro would probably also want to revamp the data formats section, as ours are idiosyncratic. I’d probably want to add scikit-learn and maybe scikit-image.
I’d probably also want to add more on image loading/viewing, particularly image stacks (especially TIFF stacks – TIFF is a terrible but ubiquitous format in neuro), and tools for viewing them (napari, fastplotlib, or others).
At any rate, I haven’t thought through the specifics of what it would look like too much, for a couple of reasons. First, I mainly was looking to get your thoughts (and I appreciate the openness to usage of the material – obviously if I were ever to adopt the material it would be with grateful and transparent attribution to the source). Second, going it alone is a pitfall I want to avoid, so I would want to talk to other neuroscientists first and try to build some support.
Pythia foundations: generic version?
I’ve also thought about other options, like could there be a “generic” Foundations template that didn’t have anything specific to any science, but would be generically useful to RSE types trying to make their way in the Python ecosystem? This could either just be used straight-up, or be adapted by specific groups.
Lately I have been thinking a lot about how to help onboard people into data science/ML/AI more generally, especially people underrepresented in tech and the “data” sciences.
I am from Durham NC, which is very diverse, but walking into work, or going to tech meetings, SciPy, etc., is like walking through a diversity-flattening filter. This really bothers me, and I’ve been thinking a lot about what we might do as a community about it and how we might maximize usage of great educational materials. I wonder if, in this spirit, we could work on a minimal kernel of the Pythia Foundations repo that specific scientific disciplines could then use or adapt: e.g., someone interested in public health, bioinformatics, or physics could take the “generic” Foundations course and adapt it for their particular needs?
E.g., most disciplines don’t need cartopy, or a heavy emphasis on datetime, or (arguably) xarray (this hurts me to say b/c I love xarray+dask).
I see the core trio as numpy, matplotlib, and pandas (and as you note in the repo, we all realize scipy is important, but I think we just learn what we need by osmosis, and that is how you handle it in Pythia, which seems right).
I might pitch adding scikit-learn to that trio (though maybe I’m just biased because I’m from neuroscience, so feel free to shoot me down here).
tl;dr
Project Pythia is amazing. I’d love to see it adapted for other disciplines, but maybe the right way to do that is to create a core generic version with the geo bits cut out so that individual disciplines can modularize and add in what they need where they need it.
Please take this all in the spirit of brainstorming and trying to figure out the best way for others to take advantage of the great work you are doing, and use your work as a model. I’d be very curious to hear what you think!