Advanced Scientific Computing Roadmap for USGS

Pangeans,
The USGS has convened a team to fast-track an “Advanced Scientific Computing Roadmap” for the bureau. The three main topics are:

  • Advanced Scientific Computing Architecture
  • Workforce Development and User Support
  • Scientific Software Development Methods

We will be discussing on-prem, cloud and hybrid architectures for compute, and of course I will be advocating for Pangeo-like open source science infrastructure to support collaborative scientific analysis and visualization needs.

The output will be a roadmap document that will guide investments.

Are folks aware of efforts/outputs from other organizations and agencies we could use to help inform this work?

This is very exciting to hear!

Here in the Commonwealth of Virginia, we recently implemented a new strategy for workforce development in this space, which might be of some relevance to you (broader than just ‘geo’, and a heavy focus on cybersecurity, but by all accounts it’s been quite productive in terms of outcomes):

Happy to provide any handshakes if the organizational structure is of interest. Full disclosure - I am a grantee for topics around data poisoning / satellite imagery.

(As an aside, continued thanks for all that you guys do - it makes nearly everything else we do possible).

Hi Rich! I think this NASA Earth Science Data Systems initiative is very relevant; in particular, see the links to past and upcoming workshops: Open Source Science for the Earth System Observatory Mission Science Data Processing Study Workshops | Earthdata

@DanRunfola were you thinking specifically of this approach for workforce development:

“Commonwealth Cyber Initiative (CCI) works with institutions of higher education across Virginia to align and enhance the exceptional pipelines and platforms already in place, and to provide more and better experiential learning opportunities for students.”

That does sound good. We certainly don’t do enough of that in government.

@scottyhq, wow, that is a goldmine. Thank you!

At first I was thinking: “Wow, look at all this great NASA material we could use for our USGS roadmap”.

Then I was thinking: “Maybe USGS should just look to partner with NASA in this endeavor!”.

After all, there is very little that is agency-specific in these plans. And NASA’s budget is about $20 billion, while the USGS budget is about $1 billion.

Digital Earth Australia / Geoscience Australia

The report I helped write for the Copernicus Global Land Service could be a useful reference here:

Opening new horizons: How to migrate the Copernicus Global Land Service to a Cloud environment

Cloud computing has completely transformed computing infrastructure over the past decade, opening exciting new possibilities for data distribution and analytics. As earth-observation data continue to grow in volume and complexity, many data providers are considering how best to take advantage of cloud computing. This report assesses how the Copernicus Global Land Service (CGLS) might migrate from its current configuration to a cloud-based infrastructure. We review the existing portfolio of data and services, provide a vision for a future cloud architecture, and consider the challenges, costs, and risks associated with a migration. The primary benefits associated with a migration to cloud are:

  • greater transparency and reproducibility of the data processing chains, enhancing trust
  • better scalability to handle the large volume of data we expect from future satellites
  • better integration with downstream processing chains and thus more value delivered to research and industry

While we frame our report in terms of the CGLS, we note that these key points and many of the technical considerations are quite generic and applicable throughout the Copernicus service ecosystem and to any modern data service.

ABERNATHEY Ryan; NETELER Markus; AMICI Alessandro; JACOB Alexander; CHERLET Michael; STROBL Peter

Abernathey, R., Neteler, M., Amici, A., Jacob, A., Cherlet, M. and Strobl, P., Opening new horizons: How to migrate the Copernicus Global Land Service to a Cloud environment, EUR 30554 EN, Publications Office of the European Union, Luxembourg, 2021, ISBN 978-92-76-28406-2, doi:10.2760/668980, JRC122454.

@RichardScottOZ , thanks for the reminder about the Digital Earth/Open Data Cube effort.
Are there specific planning or reference documents you feel would be useful to look at?

@rsignell I sit on the Study Architecture Working Group for Open Source Science for the Earth System Observatory Mission Science Data Processing Study Workshops | Earthdata. As the study title indicates, the focus is definitely on guidelines for systems that process raw instrument data, but one of the realizations coming out of our first set of workshops was the increasing demand for user- and community-defined Level 3 products and the need for on-demand pipelines to produce them. This led to a lot of discussion about collaborative environments for defining Level 3 processing algorithms. So while I’d say the objectives of this study and the “Advanced Scientific Computing Roadmap” are distinct, it would be very helpful to have someone with your deep experience in these areas provide some context on how Pangeo-like environments might fit in. I’m unsure whether they’ve finalized the participants for the second round of workshops (or whether they are open to non-NASA stakeholders), but would you be interested in participating? The RFI is available here: SAM.gov

Yes, I think some of the FrontierSI material might fit. I will have a look and get back to you. Do you mean earlier-stage planning material?

@rabernat, awesome, another goldmine of references. I’m still working my way through the report and the resources, but the 18F Technology Budgeting Handbook already stands out, with great insights and footnote gems like these:

  1. In The Standish Group’s 2014 CHAOS Report, based on a survey of 25,000 software projects, they found that software projects that cost more than $10 million succeed only 8% of the time. Outcomes improve substantially as the dollar value is reduced, peaking at a 70% success rate for projects under $1 million.
  1. In The Standish Group’s 2014 CHAOS Report, based on a survey of 25,000 software projects, they found that software projects’ outcomes get worse as more money is spent. Limiting the spending on each contract segments the project into smaller components, making each component — and the entire project — more likely to succeed.

@sharkinsspatial Thanks for letting me know about that effort. I’d love to participate, but I don’t feel I have the bandwidth right now because of this USGS project. I’m really glad you are on the team, though! And I would certainly be interested in discussing issues that arise there.

@rsignell Yep - CCI is basically a “new” take (i.e., new for us!) on workforce development here in VA. Think of it kind of like the NSF REUs, add-on programs to existing, funded labs that focus on workforce dev. But much more nimble (and generally lower-$) than the REUs I’ve worked with.

I’m talking to some people, Rich, but I also ran across this:

https://ucbds-infra.github.io/ds-course-infra-guide/jupyterhub/data8.html

  • And a comment from DEA:

“Definitely need to build k8s and DevOps capacity in the work force to maintain environments like this, especially on prem
Also a bit of language agnostic approach supporting R, Python, Julia at least
Also multiple cube backends”