Hi everyone,
I am starting to explore cloud-based workflows for scientific datasets such as climate and environmental data. I have read through some of the Pangeo documentation and browsed tutorials, but I still have a few gaps I would like to fill with insights from people who have done this in real projects.
1. Which cloud platforms have you found most efficient for handling large datasets?
2. How do you keep cloud costs under control when running JupyterHub or Dask?
3. What is the best way to manage shared access when multiple team members are working on the same dataset?
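To make the cost question concrete, here is the kind of back-of-the-envelope comparison I have been doing between an always-on Dask worker pool and adaptive scaling that only keeps workers alive during active analysis hours. The hourly rate, worker count, and active hours are all made-up placeholders, not real cloud prices:

```python
# Rough cost comparison: always-on workers vs. adaptive scaling.
# All numbers below are assumptions for illustration, not real rates.

HOURLY_RATE = 0.40       # assumed $/hour for one worker VM
WORKERS = 4              # assumed size of the Dask worker pool
HOURS_PER_MONTH = 730    # average hours in a month

def monthly_cost(active_hours_per_day: float) -> float:
    """Monthly cost if workers run only during active analysis hours."""
    active_hours = active_hours_per_day * 30
    return WORKERS * HOURLY_RATE * active_hours

always_on = WORKERS * HOURLY_RATE * HOURS_PER_MONTH
adaptive = monthly_cost(6)  # e.g. workers alive ~6 h per working day

print(f"always-on pool:      ${always_on:.0f}/month")
print(f"adaptive (6 h/day):  ${adaptive:.0f}/month")
```

Even with these toy numbers the gap is large, which is why I am curious how people configure autoscaling or idle culling in practice.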
I have also started CCSP online training to strengthen my understanding of cloud security, since that is something I want to stay on top of early on.
Any suggestions, best practices, or links to helpful threads would be appreciated.
Thank you.