We are looking for a hands-on Site Reliability Engineer (SRE) with strong Linux and system-level expertise, a solid background in third-party system operations, and the ability to operate autonomously in complex production environments. You must be able to independently troubleshoot incidents, lead and support post-incident service recovery, and drive improvements to overall system stability, performance, and observability.
This role focuses on managing and optimizing large-scale environments (5,000+ hosts) running technologies like Kafka, Redis, and Kubernetes.
The position does not involve application development but requires deep operational expertise and solid troubleshooting skills.
Qualifications
5+ years of experience in Linux system administration or SRE roles
Proven experience managing large-scale infrastructure environments
Strong troubleshooting and performance tuning skills at the infrastructure level
Basic scripting/automation experience (Bash, Python)
Familiarity with IaC tools (e.g., Ansible, Puppet)
Knowledge of distributed systems and container orchestration (Kafka, Kubernetes, etc.)
Excellent communication and problem-solving skills
Location
This role is based in the UK; our office is in Soho, London.
On-call rotation: one week every 4–5 weeks (24x7 coverage).
Regular maintenance outside of business hours is generally not expected.
Please note: we are unable to offer visa sponsorship at this time.
Why join us?
Own meaningful projects that directly impact millions of Bumble users.
Learn and grow in a high-performing engineering team committed to mentorship and learning.
Be part of a culture that values respect, excellence, curiosity, courage and joy.
Enjoy competitive compensation, equity, and world-class benefits.