We are looking for an experienced SRE to join us. This is a remote position and requires participation in on-call support rotations as needed.
Essential functions
- Develop a deep understanding of the traffic proxy/API Gateway platform, including its architecture, configurations, and dependencies.
- Collaborate with SREs and QA teams to investigate and resolve production and test environment issues.
- Participate in on-call support rotations as needed to ensure service reliability.
- Manage and maintain CI/CD pipelines, ensuring smooth and reliable release cycles.
- Own and execute release operations, including validation, rollout, and rollback procedures.
- Address customer questions and internal requests related to traffic behavior, configuration, or service interactions.
- Implement bug fixes, security patches, and compliance updates in a timely and safe manner.
- Proactively monitor and improve service performance, scalability, and efficiency.
- Contribute to new feature development and continuous improvements to enhance the platform’s robustness and developer experience.
- Maintain and improve observability, ensuring issues are quickly detected and diagnosed.
Qualifications
- Strong programming skills in Java or C++, with solid software engineering fundamentals.
- Deep understanding of distributed systems and high-performance infrastructure services.
- Experience operating or developing systems that process large-scale traffic (millions of RPS).
- Familiarity with Nginx or similar proxy and load balancer technologies.
- Working knowledge of networking concepts (TCP/IP, HTTP, TLS, DNS, routing).
- Experience with CI/CD pipelines, automated testing, and release processes.
- Familiarity with Linux systems, shell scripting, and containerized environments (e.g., Docker, Kubernetes).
- Experience with monitoring and observability tools (Prometheus, Grafana, Splunk, etc.) is a plus.
- Understanding of security and compliance practices for production services.
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent work experience.
Professional Experience
- 10+ years of software engineering experience, preferably in infrastructure or platform engineering.
- Prior experience supporting mission-critical production systems with strong uptime and performance requirements.
- Proven ability to debug complex, distributed systems and identify performance bottlenecks.
Soft Skills
- Self-motivated and capable of working independently with minimal guidance.
- Fast learner who thrives in large, complex system environments.
- Strong ownership mindset: takes initiative, drives issues to closure, and upholds operational excellence.
- Effective communicator and collaborator across engineering, SRE, and QA teams.
- Passionate about system reliability, scalability, and performance.
We offer
- Opportunity to work on cutting-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Benefits package - medical insurance, vision, dental, etc.
- Corporate social events
- Professional development opportunities
- Well-equipped office
About us
Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI,
and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical
challenges and enable positive business outcomes for enterprise companies undergoing business transformation.
A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by
profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization, and
customer experience.
Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.