HPC Senior Systems / Security Administrator at NOAA

RedLine is looking for a High Performance Computing (HPC) Senior Systems Administrator to join our team. This position will work on the National Oceanic and Atmospheric Administration (NOAA)'s Weather and Climate Operational Supercomputing System II (WCOSS II). The HPC Senior Systems Administrator will be an experienced individual with a strong security, Linux, HPC, configuration management, systems automation and networking background.

This position is in direct support of the National Weather Service’s mission to provide weather, water, and climate data, forecasts and warnings for the protection of life and property and enhancement of the national economy. The scope includes two supercomputing sites, one of which is always operational and highly visible, and the mission-critical, timely delivery of tens of thousands of weather products for external consumption 24x7x365.

US citizenship and the ability to obtain a Public Trust clearance are required to apply. This is a remote position with the possibility of some minimal travel. This full-time position offers a full benefits package including paid time off, 401k match, and health care benefits.

We require all newly hired employees to be fully vaccinated before their start date. In some circumstances, the company may provide reasonable accommodations for employees as required by applicable law.

Job Responsibilities:

  • Lead the efforts to implement best practices in systems security for the WCOSS II supercomputers, including the following:
  • Review and update security plans, and review system changes (RFCs) for security compliance
  • Review periodic security scans for compliance and make recommendations for system software updates
  • Consult as needed/requested on security matters for the WCOSS II supercomputers
  • Work with systems staff to enhance configuration management infrastructure and make recommendations so that security best practices are adhered to
  • Evaluate performance impacts of planned operating system changes
  • Update and expand existing systems monitoring capabilities
  • Develop automation tools for cluster administration
  • Participate in resource optimization and in work on job scheduling software and policies
  • Provide technical support to researchers using HPC resources, troubleshoot problems and develop appropriate computational strategies
  • Consult and collaborate with scientist coworkers to determine best system configurations for applications.

Other Requirements:

  • Minimum of 7 years of Red Hat or CentOS Linux system administration experience in an HPC environment
  • Experience with batch systems such as SLURM, PBS, or LSF
  • Experience managing parallel and cluster file systems such as GPFS or Lustre
  • Network management experience, including in an HPC context (e.g., InfiniBand, OmniPath)
  • Demonstrated ability to configure, deploy and manage a major system area such as batch system, network, data storage, backup system, database system, or distributed computing
  • Ability to provide leadership and technical expertise to improve HPC cluster performance and resiliency
  • Ability to work both independently and as part of the team; flexibility in dealing with assignments and in working on several projects simultaneously
  • Ability to effectively communicate with people of diverse backgrounds and computer knowledge.

Preferred Skills:

  • Authoring scripts in Python/Perl and supporting scripts written by others
  • One of the ISC2 certifications (CISSP, SSCP, etc.) or Security+ certification
  • Prior experience with configuration management tools, such as Ansible and/or Puppet
  • Experience integrating applications with a cloud provider's software stack
  • Experience presenting and/or teaching

To learn more about RedLine please visit us at www.RedLinePerf.com
