Join EPAM as a Site Reliability Engineer to partner with infrastructure teams, implement security hardening measures and ensure compliance with regulatory requirements and industry standards. In this role, you will design and execute comprehensive application security testing protocols and vulnerability assessment procedures, ensuring full alignment with framework requirements and organizational processes.
Responsibilities
- Maintain a comprehensive understanding of the system architecture, including integration patterns and dependencies across the technology stack
- Design and implement robust monitoring frameworks, intelligent alerting systems and streamlined incident response procedures
- Conduct systematic security reviews, coordinate penetration testing initiatives and perform thorough threat analysis to assess vulnerabilities
- Define meaningful Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and implement automated monitoring to track service reliability
- Perform comprehensive daily operational health assessments and proactively monitor systems through advanced observability tools
- Build system resilience through chaos engineering practices, disaster recovery planning and continuous performance optimization
- Provide expert-level support and rapid incident resolution to maintain production system stability
- Participate in on-call rotations to ensure continuous reliability and operational excellence
Requirements
- Relevant experience supporting and implementing payment and securities settlement systems
- Strong background in application support, application security, site reliability engineering or solution architecture
- Comprehensive knowledge of security principles, vulnerability assessment tools and penetration testing methodologies
- Hands-on experience designing and supporting Java applications, including server-side Java, microservices and RESTful APIs
- Strong proficiency in SQL and shell scripting, and experience handling file transfers and connectivity protocols such as REST, SFTP and MQ
- Demonstrated expertise with cloud platforms and with containerization and orchestration technologies such as Docker and Kubernetes
- Experience working with RHEL, JBoss EAP, OpenShift, Maven, Oracle Database and fault-tolerant infrastructure concepts
- Track record of analyzing complex business problems and applying design thinking methods to recommend digitalization and process changes
Nice to have
- Familiarity with ICT governance policies and standards within government agencies or highly regulated environments
- Experience utilizing DevOps tools and platforms including Ansible, AutoSys, SonarQube and Claude Code
- Background in chaos testing and experience with IBM MQ, OpenStack and SWIFT financial systems