Senior Data Software Engineer

EPAM · Argentina, Colombia, Chile · Remote · 5 days ago

We are looking for a Senior Data Software Engineer to join our team.

This role focuses on building and supporting data infrastructure that powers AI-driven products and intelligent agent systems. You'll have the opportunity to work with cutting-edge technologies and contribute to scalable, reliable platforms in a collaborative environment.

Responsibilities

Design, build, and maintain data ingestion and processing pipelines that feed RAG systems, including handling unstructured data, images, videos, metadata, and permissions
Administer and optimize vector database infrastructure, including Amazon Kendra and the ongoing migration from it to OpenSearch
Create evaluation datasets and performance measurement frameworks for agents
Develop monitoring and observability pipelines for AI workloads, covering latency, quality, and cost dashboards
Implement data governance, privacy safeguards, and quality controls for AI training and inference data
Support A/B testing and experimentation infrastructure for assessing agent iterations
Collaborate with Backend AI engineers on data schemas and embedding strategies

Requirements

A minimum of 3 years of data engineering experience, including direct exposure to AI/ML data infrastructure
Strong Python skills for building data pipelines, ETL processes, and backend automation scripting
Hands-on production experience with vector databases, including schema design and index management for Amazon Kendra or OpenSearch
Deep understanding of search and retrieval concepts, including embedding models, chunking strategies, and retrieval optimization
Practical knowledge of AWS services such as S3, Glue, Athena, and Kinesis (or equivalents), along with Docker and distributed data environments
Experience building data quality practices, such as monitoring, validation, and lineage tracking, into pipelines as operational defaults
Background in designing AI/ML evaluation metrics and establishing systematic tracking through evaluation frameworks
English language proficiency (written and spoken) at B2+ level or higher

Nice to have

Experience with LangSmith, RAGAS, or custom evaluation frameworks
Background in multi-modal data processing covering unstructured text, images, and videos, along with associated governance
Hands-on involvement with LLM fine-tuning data preparation
Familiarity with observability tooling deeply integrated with AI calls, such as Langfuse or Arize
Experience building streaming data pipelines using technologies such as Kafka or Kinesis