Lead Data Software Engineer

EPAM · Argentina, Colombia, Chile · Remote · 5 days ago

We are seeking a Lead Data Software Engineer to join our team.

This position centers on developing and maintaining the data infrastructure that powers AI products and intelligent agent systems. You will work with state-of-the-art technologies and help shape scalable, dependable platforms in a collaborative environment.

Responsibilities

Plan, build, and support data ingestion and processing pipelines that supply RAG systems, covering the management of unstructured data, images, videos, metadata, and permissions
Manage and tune vector database infrastructure, including Amazon Kendra and an ongoing migration to OpenSearch
Build evaluation datasets and performance measurement frameworks tailored to agents
Establish monitoring and observability pipelines for AI workloads, including dashboards for latency, quality, and cost
Roll out data governance, privacy guardrails, and quality controls for AI training and inference data
Support the A/B testing and experimentation infrastructure used to evaluate agent iterations
Work jointly with Backend AI engineers on data schemas and embedding approaches

Requirements

At least 5 years of data engineering background, including direct work with AI/ML data infrastructure
At least 1 year of experience leading and managing development teams
Solid Python expertise for crafting data pipelines, ETL workflows, and backend automation scripts
Practical production experience with vector databases, covering schema design and index management for Amazon Kendra or OpenSearch
Thorough grasp of search and retrieval concepts, including embedding models, chunking techniques, and retrieval optimization
Working familiarity with AWS services like S3, Glue, Athena, and Kinesis (or equivalents), as well as Docker and distributed data environments
Experience applying data quality practices such as monitoring, validation, and lineage tracking as operational standards
Background in defining AI/ML evaluation metrics and setting up systematic tracking using evaluation frameworks
English language proficiency in writing and speaking at B2+ level or higher

Nice to have

Exposure to LangSmith, RAGAS, or custom-built evaluation frameworks
Experience with multi-modal data processing involving unstructured text, images, and videos, together with related governance
Hands-on participation in LLM fine-tuning data preparation
Familiarity with observability tools tightly integrated with AI calls, such as Langfuse or Arize
Background in constructing streaming data pipelines with technologies like Kafka or Kinesis