We are seeking a hands-on Lead DevOps Engineer to strengthen our Kubernetes platform operations and CI/CD ecosystem.
The engineer will actively contribute to scaling cloud-native infrastructure, improving deployment pipelines, enforcing Infrastructure as Code (IaC) standards, and enhancing operational resilience. This is a production-facing role requiring strong troubleshooting skills, an ownership mindset, and practical experience operating mission-critical workloads on AWS.
Responsibilities
- Oversee and maintain Kubernetes clusters supporting live workloads
- Improve scalability, reliability, and performance of clusters through effective resource management, autoscaling, and workload isolation
- Enhance system observability by implementing comprehensive metrics, logging, and tracing solutions
- Guide onboarding and platform adoption for multiple teams
- Design, update, and expand GitHub Actions pipelines to promote modularity, maintainability, and governance
- Develop reusable workflows and enforce repository standards
- Minimize deployment risks by automating validation and testing steps
- Increase pipeline efficiency and manage operational costs
- Construct and manage Terraform-based infrastructure with a focus on state management, modularity, and version control
- Maintain IaC governance and oversee review processes
- Support provisioning and lifecycle management of environments
- Manage and optimize AWS services including networking, IAM, compute, and storage resources
- Improve secrets management and secure configuration practices
- Contribute to cloud cost optimization strategies
- Ensure stability and resilience of production systems
- Strengthen access controls and secure handling of sensitive data
- Apply DevOps and SRE practices to production environments
- Participate in troubleshooting incidents and conducting root cause analysis
- Lead initiatives to enhance system reliability and operational excellence
Requirements
- Minimum 5 years of experience in DevOps roles with a focus on cloud infrastructure
- At least one year of experience leading and managing development teams
- Deep hands-on experience with Kubernetes in production settings
- Proven expertise in managing AWS cloud infrastructure for high-stakes workloads
- Advanced skills with GitHub Actions or comparable CI/CD tools for automating pipelines
- Strong proficiency with Terraform for Infrastructure as Code, including state management and modular architecture
- Experience deploying and running cloud-native systems in production environments
- Excellent troubleshooting and debugging skills for resolving complex issues
- Thorough understanding of DevOps and SRE principles, including reliability engineering practices
- Strong written and spoken English communication skills (B2+ level or higher)
Nice to have
- Experience in regulated or healthcare industries, including familiarity with compliance and security requirements
- Proficiency with observability tools such as Prometheus and Grafana for monitoring and alerting
- Knowledge of AWS cost optimization techniques for efficient resource management
- Understanding of platform engineering concepts and internal developer platforms to improve team productivity