EPAM Vietnam is seeking a Senior Machine Learning Engineer to join our growing engineering team and work on cutting-edge ML solutions that impact users globally. You'll be at the forefront of building intelligent recommendation systems and production-scale machine learning infrastructure, while collaborating with talented engineers across international projects.
This is a high-impact role where you'll own the entire ML lifecycle, from architecting data pipelines to deploying models at scale. Beyond building exceptional technology, you'll have the chance to shape our technical direction and grow the next generation of ML engineers through mentorship.
Responsibilities
- Design and build production-grade machine learning systems, with a focus on recommendation engines that serve millions of users
- Develop and optimize high-performance data and model pipelines using Spark/PySpark to process massive datasets efficiently
- Build Flask-based REST APIs that reliably serve models in production, ensuring low latency and high availability
- Monitor model performance in real-world conditions and implement data-driven optimizations to enhance accuracy and efficiency
- Identify and execute performance improvements across code, databases, and compute infrastructure
- Work with cross-functional teams, including product, data science, and platform engineering, to deliver integrated solutions
- Contribute to sprint planning, technical design reviews, and architectural decisions that move the team forward
- Provide hands-on guidance and technical mentorship to 1–3 engineers, fostering their growth and development
- Design and deploy Large Language Model applications leveraging vector databases (Pinecone, FAISS, pgvector) for intelligent search and retrieval
Requirements
- Advanced proficiency in Python with a deep understanding of best practices for production ML code
- Proven experience building scalable solutions with Spark/PySpark and handling large-scale data processing challenges
- Solid grasp of machine learning principles, model development, and the complete ML lifecycle from experimentation to production
- Hands-on experience building RESTful services with Flask or similar frameworks
- Working knowledge of cloud platforms (Azure, AWS, or GCP) and cloud-native architectures
- Strong software engineering practices, including Docker containerization, Git version control, and CI/CD automation
- Demonstrated ability to tackle complex technical challenges, take ownership of outcomes, and thrive in collaborative team environments
- Proficiency in spoken and written English