Responsibilities:
Manage and troubleshoot complex, large-scale distributed software systems
Build scalable, secure and reliable container-based infrastructure
Automate software delivery processes with CI/CD pipelines
Collaborate with DevOps, development, security, and compliance teams
Ensure the security and stability of the infrastructure
Actively participate in building new infrastructure components such as ClickHouse and SaaS
Improve and strengthen the disaster recovery plan (DRP)
Carry out ongoing DevOps activities in support of development, deployment, and testing processes
Requirements:
3+ years of experience in DevOps, SRE, or a related data platform/infrastructure field
Experience operating stateful/distributed systems in production and a good understanding of systems fundamentals (networking, storage, scaling)
Hands-on experience with cloud platforms (Azure, AWS, or GCP) and Kubernetes
Experience with Infrastructure as Code (e.g., Terraform)
Familiarity with modern deployment practices (e.g., GitOps with ArgoCD/Flux)
Development and scripting skills (Python, Bash)
Expertise in at least one observability solution (e.g., the Prometheus stack)
Linux system administration expertise
Understanding of DevOps security principles
Data engineering skills
Data streaming skills
Troubleshooting skills
Good communication skills
Proficiency in English and Russian, written and verbal
Nice to have:
ClickHouse or a related technology
Kafka or a related technology