Lead AI Engineer (Production Agentic & RAG Systems)

EPAM · Kazakhstan, Uzbekistan, Armenia, Georgia · Remote · yesterday

We are looking for a seasoned Lead AI Engineer who architects, builds, and operates production GenAI platforms – agentic workflows, RAG pipelines, and LLM-backed services with real users and real SLAs – while leading engineers and setting the technical direction across multiple workstreams.

This is an engineering leadership role, not a research role. The bar is reliability, latency, cost, observability, and safe deployment at scale, with end-to-end ownership from architecture through on-call, and accountability for the technical quality and delivery of the team. Typical workloads include enterprise knowledge platforms, conversational analytics, agentic automation, and LLM-augmented data products.

Responsibilities

Own the end-to-end architecture of GenAI platforms across multiple services and teams, defining standards, patterns, and reference implementations
Lead the design of agent orchestration (graph/state, conditional routing, tool calling, memory, checkpointing) in LangGraph / LangChain or equivalent, and set best practices for the team
Architect production RAG end-to-end: chunking, embeddings, vector stores, hybrid retrieval, reranking, caching, and grounded synthesis – and mentor engineers in building it
Drive the design and delivery of Python / FastAPI services – async, SSE streaming, session handling, and structured error contracts – establishing service templates and conventions
Define the observability and evaluation strategy (MLflow, OpenTelemetry, or equivalent) for accuracy, cost, and regression across the platform
Own the deployment platform on Docker + Kubernetes (EKS/AKS/GKE), with CI/CD pipelines gated by tests, evals, and canary releases – setting release standards for AI systems
Lead LLM cost engineering strategy – model routing, prompt optimization, caching, token accounting, and build-vs-buy decisions at portfolio level
Establish GenAI safety & governance practices: hallucination control, prompt-injection defense, PII handling, and HITL where required
Partner with data engineering leadership on semantic layers and pipelines (PySpark / SQL where applicable), and align roadmaps across teams
Mentor and grow senior and mid-level engineers through design reviews, pairing, and technical coaching; conduct hiring and technical interviews
Represent engineering in conversations with clients, product, and executive stakeholders; translate business goals into technical strategy and delivery plans

Requirements

6+ years in software engineering, with 3+ years shipping production LLM / agentic systems (not POCs or research)
1+ years of experience leading engineers or technical workstreams
Proven track record of owning architecture for multi-service GenAI or distributed systems in production
Expert-level proficiency in Python and FastAPI (async, REST, SSE)
Deep production expertise in LangChain and LangGraph (or equivalent production experience with LlamaIndex, AutoGen, or MCP stacks)
Strong background in production RAG: embeddings, chunking, and hybrid retrieval with reranking and caching – with the ability to define standards across teams
Advanced skills in vector databases such as Pinecone, Weaviate, pgvector, OpenSearch, or Databricks Vector Search
Hands-on production experience with at least one major LLM provider – AWS Bedrock (preferred), OpenAI / Azure OpenAI, or Anthropic – including model selection, routing trade-offs, and multi-provider strategy
Strong competency in Kubernetes and Docker in real production environments (EKS/AKS/GKE), including platform-level decisions
Deep expertise in cloud engineering on AWS, including cost, security, and scalability trade-offs
Solid command of observability and tracing tools (MLflow, LangSmith, OpenTelemetry), evaluation harnesses, and latency/cost ownership at platform scale
Experience designing and owning CI/CD for AI systems (GitHub Actions, Jenkins, or equivalent) with test/eval gates
Demonstrated experience mentoring engineers, leading design reviews, and driving technical decisions across teams
Strong written and spoken English (B2+ level); able to lead design discussions, present to senior stakeholders, and influence technical direction with clients and executives

Nice to have

Databricks depth – MLflow (tracking & serving), Vector Search, Unity Catalog / Metric Views, PySpark / SQL
Experience with LLM fine-tuning – PEFT, LoRA, QLoRA – and the ability to guide build-vs-fine-tune-vs-prompt decisions
Strong understanding of MCP servers and tool integration patterns
Expertise in GenAI governance & FinOps – auditability, prompt-injection hardening, PII, and token cost in regulated environments
Background in classical ML / DL – NLP, BERT-family, time-series, and CV