We are looking for a Lead DevOps Engineer to support service health monitoring, deployment promotions, and issue resolution across CI/CD and Kubernetes environments.
This role focuses on monitoring applications, running Azure DevOps release pipelines, investigating deployment issues, and ensuring smooth rollouts in both production and non-production environments.
Responsibilities
- Track application and service health through ArgoCD and Grafana
- Run and oversee service promotions via Azure DevOps pipelines
- Verify deployments and assist with rollback or recovery efforts whenever necessary
- Investigate pipeline, Kubernetes, and deployment-related failures
- Diagnose problems using tools such as K9s, kubectl, and ArgoCD, along with log and event analysis
- Coordinate with development, QA, and release groups throughout deployment activities
- Communicate deployment outcomes, issues, risks, and resolution updates to stakeholders
- Keep release documentation and operational tracking up to date
Requirements
- A minimum of 5 years of relevant professional experience
- At least 1 year of experience leading and managing development teams
- Practical exposure to Azure DevOps CI/CD pipelines and deployment-related troubleshooting
- Strong proficiency with ArgoCD and GitOps-driven deployments
- Real-world Kubernetes experience, covering pods, deployments, logs, events, and rollouts
- Hands-on knowledge of kubectl, Grafana, and additional monitoring tools
- Reliable understanding of Helm charts, Kubernetes manifests, and Git-based workflows
- Strong investigative and analytical skills for resolving complex issues
- Capacity to perform under pressure during rollouts or incidents
- Attention to detail combined with a strong ownership attitude
- Proactive approach to problem-solving and dedication to continuous improvement
- Excellent verbal and written English communication skills (B2+ level or above)
Nice to have
- Strong communication and teamwork abilities for collaborating with cross-functional groups
- Practical experience with Docker for containerized application development
- Familiarity with Prometheus for monitoring and alerting
- Exposure to Azure or AWS cloud-based environments
- Basic scripting skills in Bash or PowerShell for automation
- Familiarity with production support, incident handling, and rollback procedures
- Understanding of networking principles, secrets management, and microservice-oriented environments