my concern is users becoming overly dependent on proprietary solutions
I, and I’m sure many others, share this concern. However, in order to discuss this productively, I think we need to make a few distinctions about what we mean by “users” and “solutions” in this context:
Software vs Infrastructure Solutions:
The “core” of the Pangeo software stack is open-source in the sense of “Wide Open”. In particular, Xarray and Zarr are multi-stakeholder, community-governed, and not managed by any one private company. The same is true for Jupyter, IPython, NumPy, SciPy and pandas, AFAIK.
But these are all exclusively software projects - using and maintaining them takes people-hours, but those costs aren’t incurred automatically, they don’t grow linearly with the number of users, and there is rarely true urgency requiring a maintainer to be “on-call” if something breaks. Infrastructure isn’t like that - somebody eventually has to pay for the hardware compute/storage costs, those costs scale with the number of users, and if something breaks it’s an urgent problem, not one that can wait until the next release. This distinction is important to keep in mind because it’s unlikely that, for example, the organising model that Xarray uses would work for providing reliable infrastructure solutions.
Pangeo has multiple groups of users with different needs
Pangeo software tools (but not necessarily the infrastructure) are used today by:
(1) university academics,
(2) state-funded researchers (e.g. at National Labs),
(3) individual users globally (e.g. through outreach initiatives or just in the wild) and
(4) private companies / non-profit corporations.
All of these groups have different needs, different priorities, different politics, different funding situations, and different preferred relationships with the people managing their infrastructure. It is unlikely that one solution will fit all, and I agree we should push back against the idea of one solution for all users. Luckily, that’s okay, because so long as the core Pangeo stack is FOSS, it can be used by multiple infrastructure providers.
How do we envision providing unbiased information?
This would require care, but I don’t think it is impossible. Clearly, if one person from one company that is selling one solution wrote the recommendations, it would be a problem. But Pangeo already has many people from different constituencies who would be qualified to weigh in, a steering committee, and (now) an independent legal structure.
There is space for many infrastructure solutions
There are multiple possible models for infrastructure solutions. Matt and Ryan (who is the source of the original comment you quoted, Brianna) are proposing one model, which, IIUC, essentially uses capital from private companies to fund the development and maintenance of infrastructure that could perhaps also serve some people in the other three groups. There is clearly a market and a place for this model. However, “we” in the rest of the community are still free to reject this model for any reason.
What other models are possible? I can think of federally organised, philanthropically funded, and volunteer-driven as broad categories. I’m personally interested in thinking about which of these models can best serve the wider global scientific community, with diversity and inclusion as an explicit aim. We have some nascent discussion of that question in the What's Next - Democratization - Science/Community thread that Naomi linked.
Specifically what are we afraid of?
It also helps to pin down which specific scenarios we want to avoid, rather than just assuming Private == Bad. Are we concerned about vendor lock-in? Losing access to our data? Biased driving of software development priorities? Non-democratic governance? Anti-competitive incentives stifling further innovation? If so, I would want to discuss these specific concerns openly, so that choices between offerings can be made according to priorities, or so that these risks can be mitigated. (EDIT: This last point is really another way of saying the same thing that Yuvi’s blog post said.)