We are seeking a hands-on Senior DevOps Engineer to strengthen our Kubernetes platform operations and CI/CD ecosystem.
The engineer will actively contribute to scaling cloud-native infrastructure, improving deployment pipelines, enforcing Infrastructure as Code (IaC) standards, and enhancing operational resilience. This is a production-facing role requiring strong troubleshooting skills, an ownership mindset, and practical experience operating mission-critical workloads in AWS.
Responsibilities
- Operate, maintain, and optimize Kubernetes clusters supporting production workloads
- Enhance cluster scalability, reliability, and performance through resource management, autoscaling, and workload isolation best practices
- Improve observability by implementing metrics, logging, and tracing for greater operational visibility
- Support onboarding and enablement for multiple teams using the platform
- Design, refactor, and scale GitHub Actions pipelines to improve modularity, maintainability, and governance
- Implement reusable workflows and enforce standards across repositories
- Reduce deployment risk by automating validation and testing processes
- Optimize pipeline performance and cost efficiency
- Implement and maintain Terraform-based infrastructure, strengthening state management, modularity, and version control
- Enforce IaC governance and review processes
- Support environment provisioning and lifecycle management
- Manage and optimize AWS services including networking, IAM, compute, and storage
- Improve secrets management and secure configuration practices
- Contribute to cost optimization initiatives
- Ensure production stability and operational resilience
- Strengthen access controls and secrets handling
- Apply DevOps and SRE principles to production systems
- Participate in incident troubleshooting and root cause analysis
- Drive improvements in system reliability and operational maturity
Requirements
- Minimum 3 years of experience in DevOps roles with a focus on cloud environments
- Strong hands-on expertise with Kubernetes in production settings
- Proven track record managing AWS cloud infrastructure for mission-critical workloads
- Advanced experience with GitHub Actions or other modern CI/CD systems for pipeline automation
- Solid proficiency with Terraform for Infrastructure as Code, including state management and modularity
- Experience implementing and operating cloud-native systems in live production environments
- Excellent troubleshooting and debugging skills for resolving complex issues
- Understanding of DevOps and SRE principles, including production reliability patterns
- Strong written and spoken English communication skills (B2+ level or higher)
Nice to have
- Experience working in regulated or healthcare environments, with an understanding of compliance and security requirements
- Familiarity with observability tools such as Prometheus and Grafana for monitoring and alerting
- Experience with cost optimization strategies in AWS to improve resource efficiency
- Knowledge of platform engineering best practices and internal developer platforms for enhancing team productivity