I saw this call today: https://aws.amazon.com/earth/research-credits/
I was wondering whether the Pangeo datasets are easily accessible from AWS, and whether some users here could benefit from applying (and maybe take some load off the Pangeo cloud resources).
Thanks for raising this idea Dhruv.
The problem with these grant programs is that we have no easy way to just plug more credits into the existing Pangeo deployments. If you got one of these grants for yourself, you would basically have to deploy your own Pangeo on your own AWS account. I’m going to guess that you’re not up for that.
I would welcome ideas on how to resolve this structural problem.
@dhruvbalwada, if you got some research credits on AWS, you could try setting up a JupyterHub with Kubernetes using the new open-source Qhub project from Quansight.
Check out the Qhub demo video recorded at JupyterCon 2020. Its goal is to allow folks like us to deploy this infrastructure!
My colleague Josef Kellndorfer (a scientist who works with SAR data) got it going on AWS yesterday.
Qhub seems amazing. It succeeds where Pangeo has failed: providing a user-friendly way to deploy a fully functional hub.
But however you are deploying it, I maintain that it doesn’t make sense organizationally or financially to have an individual user like Dhruv with their own personal private hub. It makes more sense for one hub to serve many users.
What we really need is a way to bill each user’s usage to a different billing account. No one has solved that afaik.
@rabernat, the situation I was imagining was one where someone like @dhruvbalwada wants to work with a group on a particular project. They spend a few hours writing a two-page proposal to AWS and get research credits a few days later. Someone in the group spins up the Qhub with the environment they want for the project, creates accounts for the users, and off they go. (This is what my colleague Josef is doing.)
In theory, with Kubernetes you could create another worker pool in the user’s account, and the JupyterHub would deploy resources into that pool whenever that specific user requests them. So basically the user brings their own account and grants the Pangeo hub access, so that the hub can deploy resources into that account.
In practice this may be a little tricky to set up, but all of the integration points should exist to support it already.
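To make the idea a bit more concrete, here is a minimal sketch of the routing step only, assuming each participating user already has a labelled node pool attached to the cluster. The usernames, pool names, and the `example.org/node-pool` label key are hypothetical, and this is not a tested Pangeo configuration; wiring a node pool from another account into the cluster is the tricky part and is not shown.

```python
# Sketch: pick a Kubernetes node selector per hub user, so that a user's
# pods land on a node pool living in (and billed to) their own account.
# All names below are hypothetical placeholders.

POOL_LABEL_KEY = "example.org/node-pool"  # assumed node label key
DEFAULT_POOL = "pool-shared"              # assumed shared hub pool

# Hypothetical mapping from hub username to that user's node pool label.
USER_NODE_POOLS = {
    "alice": "pool-alice-account",
}


def node_selector_for(username, pools=None, default=DEFAULT_POOL):
    """Return the node selector dict for a user, falling back to the shared pool."""
    pools = USER_NODE_POOLS if pools is None else pools
    return {POOL_LABEL_KEY: pools.get(username, default)}


def pre_spawn_hook(spawner):
    """Set the spawner's node selector just before the user's server starts."""
    spawner.node_selector = node_selector_for(spawner.user.name)


# In a jupyterhub_config.py that uses KubeSpawner, this could be hooked in
# with something like:
#   c.Spawner.pre_spawn_hook = pre_spawn_hook
```

Users without an entry in the mapping fall back to the shared pool, so the hub keeps working for everyone who has not brought their own account.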
I’ve got a working example of 1-click deploys of Dask and Jupyter notebooks into an AWS account here: https://github.com/zflamig/dask-era5
This lets you quickly get a computing environment in your account with minimal effort on the setup so you can offload compute from Pangeo and instead rely on Pangeo for data access.
This is a really interesting idea! If you know of an example config out there, @zflamig, it would be great to see it. @yuvipanda, perhaps you’ve experimented with this?