We are seeking a hands-on Senior AI Engineer who designs, builds, and operates production GenAI systems – agentic workflows, RAG pipelines, and LLM-backed services with real users and real SLAs. This is an engineering role, not a research role. The bar is reliability, latency, cost, observability, and safe deployment at scale, with end-to-end ownership from architecture through on-call. Typical workloads include enterprise knowledge platforms, conversational analytics, agentic automation, and LLM-augmented data products.
Responsibilities
- Design agent orchestration (graph/state, conditional routing, tool calling, memory, checkpointing) in LangGraph / LangChain or equivalent
- Build production RAG end-to-end: chunking, embeddings, vector stores, hybrid retrieval, reranking, caching, and grounded synthesis
- Own Python / FastAPI services – async, SSE streaming, session handling, and structured error contracts
- Instrument services with tracing and evaluation harnesses (MLflow, OpenTelemetry, or equivalent) to track accuracy, cost, and regressions
- Ship on Docker + Kubernetes (EKS/AKS/GKE) via CI/CD with test, eval, and canary gates
- Drive LLM cost engineering – model routing, prompt optimization, caching, token accounting, and build-vs-buy decisions
- Apply GenAI safety & governance: hallucination control, prompt-injection defense, PII handling, and human-in-the-loop (HITL) review where required
- Partner with data engineering on semantic layers and pipelines (PySpark / SQL where applicable)
Requirements
- 5+ years of software engineering experience, including 2+ years shipping production LLM / agentic systems (not POCs or research)
- Proficiency in Python and FastAPI (async, REST, SSE)
- Production expertise in LangChain and LangGraph (or comparable production experience with LlamaIndex, AutoGen, or MCP-based stacks)
- Background in production RAG: embeddings, chunking, and hybrid retrieval with reranking and caching
- Hands-on experience with vector databases such as Pinecone, Weaviate, pgvector, OpenSearch, or Databricks Vector Search
- Production experience with at least one major LLM provider (AWS Bedrock preferred; OpenAI / Azure OpenAI or Anthropic also relevant), including model selection and routing trade-offs
- Hands-on experience running Docker and Kubernetes in production (EKS/AKS/GKE)
- Expertise in cloud engineering on AWS
- Familiarity with observability and tracing tools (MLflow, LangSmith, OpenTelemetry), evaluation harnesses, and latency/cost ownership
- Ability to build CI/CD for AI systems (GitHub Actions, Jenkins, or equivalent) with test and eval gates
- Strong written and spoken English (B2 or above); able to lead design discussions with engineering and business stakeholders independently
Nice to have
- Databricks depth – MLflow (tracking & serving), Vector Search, Unity Catalog / Metric Views, PySpark / SQL
- Experience with LLM fine-tuning – PEFT, LoRA, QLoRA
- Understanding of MCP servers and tool integration
- Experience with GenAI governance & FinOps: auditability, prompt-injection hardening, PII, and token cost in regulated environments
- Background in classical ML / DL: NLP (including BERT-family models), time series, and computer vision