We are seeking a Junior Site Reliability Engineer (Python) to join a dynamic Site Reliability Engineering team that manages the cluster infrastructure of one of the world's largest compute providers.
This role offers a unique opportunity to contribute to cutting-edge systems and learn best practices in distributed systems and platform orchestration.
Responsibilities
- Gain proficiency in project-specific tools available only within the customer environment
- Investigate existing configurations of tools and services used in the management platform
- Write unit, integration, and functional tests for code changes as applicable
- Collaborate with the SRE team to understand platform architecture, mechanisms, and interfaces to live systems
- Maintain and refactor internal services supporting the cluster management platform
- Participate in migrating legacy deployment and monitoring systems to modern tools and services
- Prepare technical documentation related to code changes and platform architecture
- Communicate with multiple teams across the organization to align on tasks and processes
Requirements
- 0-2 years of systems engineering or DevOps experience, with competence in Linux fundamentals and operations
- Experience with Bash scripting and an understanding of basic networking principles
- Knowledge of software testing principles acquired through academic or professional experience
- English proficiency at B2 level or higher
- Ability to work autonomously on tasks defined by high-level objectives
- Ability to engage effectively with multiple teams across the organization
Nice to have
- Familiarity with Python or Go
- Understanding of GCP (especially IAM and networking) or another public cloud platform