Manage HPC cluster configuration and Linux-based computation servers, and provide performance tuning.
Assist in resolving performance issues on the cluster and help design HPC jobs.
Install, configure, and manage Linux systems throughout the environment using HPC-specific software.
Apply consistent security configuration standards across the Linux infrastructure.
Implement and maintain management and monitoring tools.
Identify and develop opportunities to streamline and automate deployment and configuration tasks.
Provide technical support and guidance for Linux deployments.
Build from source, install, configure, and manage Linux applications on the cluster.
Requirements
Experience supporting production RHEL-like Linux servers and applications (5+ years), plus general knowledge of common open-source applications such as NGINX, PostgreSQL, MariaDB, and Git
General understanding of services like DHCP/DNS/NTP
Experience with Ansible or similar configuration-as-code software (3+ years)
Comfortable with configuring OS images from scratch
Comfortable with building software from source code, with an understanding of dependencies, libraries, and linking
Background in remote management of large on-premises infrastructure (multiple servers, switches, storage systems, etc.)
Excellent communication and interpersonal skills
Performance optimization skills in Linux
English proficiency of at least B2+
Nice to have
AWS: hands-on experience with production environments