We are seeking an experienced Lead Site Reliability Engineer to spearhead our infrastructure reliability initiatives and guide a team of talented engineers. In this role, you will shape technical strategy, mentor team members and drive operational excellence across our cloud-based platforms and distributed services.
Responsibilities
- Lead the design and evolution of resilient, scalable infrastructure across multiple cloud providers
- Mentor and guide a team of engineers, fostering technical growth and best practices
- Define reliability standards, SLOs and operational policies for production environments
- Architect automation frameworks to streamline deployments and infrastructure management
- Oversee CI/CD strategy and ensure efficient software delivery workflows
- Coordinate incident response efforts and lead post-mortem analyses to prevent recurrence
- Partner with engineering leadership to align reliability goals with business priorities
- Champion observability practices to enhance system visibility and proactive issue detection
- Provide technical direction for microservices and event-driven architecture initiatives
- Evaluate emerging tools and technologies to enhance the reliability ecosystem
- Drive capacity planning, cost optimization and performance tuning across platforms
Requirements
- 5+ years of experience in DevOps or Site Reliability Engineering
- Expertise in AWS, Azure and GCP
- Competency in Kubernetes, Terraform and Ansible
- Proficiency with GitHub and Jenkins
- Knowledge of microservices, APIs and event-driven processing
- Strong written and verbal English communication skills (CEFR level B2 or higher)