We are looking for a seasoned Lead AI Engineer who architects, builds, and operates production GenAI platforms – agentic workflows, RAG pipelines, and LLM-backed services with real users and real SLAs – while leading engineers and setting the technical direction across multiple workstreams.
This is an engineering leadership role, not a research role. The bar is reliability, latency, cost, observability, and safe deployment at scale, with end-to-end ownership from architecture through on-call, and accountability for the technical quality and delivery of the team. Typical workloads include enterprise knowledge platforms, conversational analytics, agentic automation, and LLM-augmented data products.
Responsibilities
- Own the end-to-end architecture of GenAI platforms across multiple services and teams, defining standards, patterns, and reference implementations
- Lead the design of agent orchestration (graph and state modeling, conditional routing, tool calling, memory, checkpointing) in LangGraph / LangChain or equivalent, and set best practices for the team
- Architect production RAG end-to-end: chunking, embeddings, vector stores, hybrid retrieval, reranking, caching, and grounded synthesis – and mentor engineers in building it
- Drive the design and delivery of Python / FastAPI services – async, SSE streaming, session handling, and structured error contracts – establishing service templates and conventions
- Define the observability and evaluation strategy (MLflow, OpenTelemetry, or equivalent) covering accuracy, cost, and regression detection across the platform
- Own the deployment platform on Docker + Kubernetes (EKS/AKS/GKE), with CI/CD pipelines gated by tests, evals, and canary releases – setting release standards for AI systems
- Lead LLM cost engineering strategy – model routing, prompt optimization, caching, token accounting, and build-vs-buy decisions at portfolio level
- Establish GenAI safety & governance practices: hallucination control, prompt-injection defense, PII handling, and human-in-the-loop (HITL) review where required
- Partner with data engineering leadership on semantic layers and pipelines (PySpark / SQL where applicable), and align roadmaps across teams
- Mentor and grow senior and mid-level engineers through design reviews, pairing, and technical coaching; conduct hiring and technical interviews
- Represent engineering in conversations with clients, product, and executive stakeholders; translate business goals into technical strategy and delivery plans
Requirements
- 6+ years in software engineering, with 3+ years shipping production LLM / agentic systems (not POCs or research)
- 1+ years of experience leading engineers or technical workstreams
- Proven track record of owning architecture for multi-service GenAI or distributed systems in production
- Expert-level proficiency in Python and FastAPI (async, REST, SSE)
- Deep production expertise in LangChain and LangGraph (or equivalent production experience with LlamaIndex, AutoGen, or MCP-based stacks)
- Strong background in production RAG: embeddings, chunking, and hybrid retrieval with reranking and caching – with the ability to define standards across teams
- Advanced skills in vector databases such as Pinecone, Weaviate, pgvector, OpenSearch, or Databricks Vector Search
- Hands-on production experience with at least one major LLM provider – AWS Bedrock (preferred), OpenAI / Azure OpenAI, or Anthropic – including model selection, routing trade-offs, and multi-provider strategy
- Strong competency in Kubernetes and Docker in real production environments (EKS/AKS/GKE), including platform-level decisions
- Deep expertise in cloud engineering on AWS, including cost, security, and scalability trade-offs
- Solid command of observability and tracing tools (MLflow, LangSmith, OpenTelemetry) and evaluation harnesses, with ownership of latency and cost at platform scale
- Experience designing and owning CI/CD for AI systems (GitHub Actions, Jenkins, or equivalent) with test/eval gates
- Demonstrated experience mentoring engineers, leading design reviews, and driving technical decisions across teams
- Strong written and spoken English (B2+ level); able to lead design discussions, present to senior stakeholders, and influence technical direction with clients and executives
Nice to have
- Databricks depth – MLflow (tracking & serving), Vector Search, Unity Catalog / Metric Views, PySpark / SQL
- Experience with LLM fine-tuning – PEFT, LoRA, QLoRA – and the ability to guide build-vs-fine-tune-vs-prompt decisions
- Strong understanding of MCP servers and tool integration patterns
- Expertise in GenAI governance & FinOps – auditability, prompt-injection hardening, PII handling, and token-cost control in regulated environments
- Background in classical ML / DL – NLP, BERT-family models, time-series, and computer vision