Network Optimization: Implement high-performance networking solutions, including private dedicated connectivity (e.g., AWS Direct Connect / Azure ExpressRoute / Google Cloud Interconnect) and low-latency HPC interconnects (e.g., AWS EFA / Azure InfiniBand-based HPC fabrics / GCP HPC interconnect options), to support low-latency MPI applications
Hybrid Architecture Design: Architect and deploy scalable HPC clusters using cloud-native HPC cluster managers and virtual compute (e.g., AWS ParallelCluster + Amazon EC2 / Azure CycleCloud + Azure VM Scale Sets / Google Cloud Cluster Toolkit or similar + Compute Engine), ensuring seamless integration with on-premises schedulers. Proven expertise with HPC schedulers and orchestration patterns (e.g., Slurm, cloud HPC cluster orchestration, batch orchestration)
Infrastructure Portability: Define and maintain containerization standards using Apptainer (Singularity) or related technologies to ensure binary compatibility across heterogeneous hardware and environments
Design high-I/O workloads around parallel file system patterns (e.g., AWS / GCP / Azure managed Lustre or other managed parallel file systems, and/or self-managed Lustre/GPFS/BeeGFS), with a focus on observability, reproducibility, and operational excellence
Requirements
Knowledge of HPC-cloud integration strategies and experience with technical feasibility planning
Strong understanding of regulated enterprise R&D environments
Understanding of scientific computing reproducibility requirements, relying on enterprise-standard tooling
Deep understanding of Amazon VPC / Google Cloud / Azure networking, peering, and security group configurations
Nice to have
Experience in life sciences
Experience with specific applications such as Schrödinger, Jupyter, and MATLAB
Experience with workflow engines
We offer
Competitive compensation
Remote or office work
Flexible working hours
Healthcare benefits: medical insurance and paid sick leave
Continuous education, mentoring, and professional development programs