Lead AI Engineer

EPAM·Mexico·Remote·2 months ago

We are seeking a Lead AI Engineer to design, build and scale cutting-edge AI applications powered by large language models. In this role, you will partner with clients to deliver tailored LLM-driven solutions, architect agentic systems and drive the adoption of emerging AI technologies across enterprise environments.

Responsibilities

Design, implement and maintain end-to-end AI applications, including chatbots, Q&A platforms, agent workflows and other LLM-driven solutions
Collaborate directly with clients to understand their needs, identify opportunities and recommend tailored AI/LLM solutions that drive business value
Architect and optimize robust data pipelines, prompt strategies and datasets to ensure effective, accurate and scalable AI models
Evaluate, monitor and refine AI system performance, ensuring outputs are accurate, secure, scalable and compliant with industry regulations and best practices
Conduct research, design experiments and perform rapid prototyping to validate technical feasibility and demonstrate the business value of AI solutions
Stay current with evolving LLM technologies, frameworks, protocols (such as MCP, A2A, ACP) and methodologies to continuously improve solution quality and client outcomes
Design and implement agentic systems with frameworks such as LangChain, LangGraph and Semantic Kernel, integrating them with vector databases and advanced memory architectures
Develop and maintain APIs and system integrations for production-grade AI applications, including enterprise system integration (CRM, ERP, databases)
Deploy AI solutions at scale, accounting for performance, cost-efficiency, maintainability, observability and security (including guardrails and prompt injection prevention)
Implement and monitor retrieval systems (keyword search, vector search, embeddings), ranking algorithms and agent evaluation frameworks
Use MLOps/AIOps practices for agentic systems and ensure robust observability and monitoring of deployed solutions
Clearly communicate complex technical concepts and AI strategies to both technical and non-technical stakeholders, and iterate on models based on user feedback

Requirements

Strong proficiency in at least one modern programming language (such as Python, Java, C# or Go); experience with web frameworks such as FastAPI is a plus
Deep understanding of the AI application development lifecycle, including production deployment, system integration and rapid UI prototyping (Streamlit, Gradio or similar)
Familiarity with major LLM platforms and APIs (OpenAI, Anthropic, Amazon Bedrock, Gemini) and related frameworks (LangChain, LangGraph, LlamaIndex, Strands Agents, etc.)
Knowledge of advanced AI integration patterns (e.g., RAG, agent orchestration, tool calling), retrieval systems (keyword/vector search, embeddings) and ranking algorithms
Experience deploying AI solutions at scale, with a focus on performance, cost-efficiency, maintainability, observability and security (including guardrails and prompt injection prevention)
Proven ability to evaluate generative AI quality with retrieval/classification scores, LLM-based evaluation, agent evaluation metrics and A/B testing
Experience with vector databases (Pinecone, Weaviate, ChromaDB, FAISS) and semantic/hybrid search
Experience designing experiments, conducting A/B tests and iterating on models based on user feedback
Experience with enterprise system integration (CRM, ERP, databases) and deployment to cloud AI platforms or on-premise solutions
Experience with observability and monitoring tools/frameworks, and application of MLOps/AIOps practices for agentic systems
Familiarity with emerging protocols (MCP, A2A, ACP) and advanced memory architectures
Proven experience in AI engineering and delivery of ML-based solutions in production environments
Strong problem-solving skills, attention to detail and ability to work independently and collaboratively
Excellent communication, collaboration and interpersonal skills, with the ability to explain complex technical concepts to non-technical stakeholders

Technologies

Proficiency in at least one modern programming language (e.g., Python, Java, C#, Go) for AI development
Web frameworks: FastAPI, Streamlit, Gradio, Flask, Spring Boot, ASP.NET or similar
Major LLM platforms and APIs: OpenAI, Anthropic, Amazon Bedrock, Gemini
Agentic frameworks: LangChain, LangGraph, Semantic Kernel, LlamaIndex, Strands Agents
Data pipeline and integration tools
Vector databases: Qdrant, FAISS, Pinecone, Weaviate, ChromaDB
Retrieval and ranking systems: keyword search, vector search, embeddings, ranking algorithms
Cloud AI platforms: Azure OpenAI, Amazon Bedrock, GCP Vertex AI
On-premise solutions: vLLM
Enterprise AI platforms: Amazon Bedrock AgentCore, Databricks Agent Bricks, Google Agentspace, Azure AI Foundry
Observability and monitoring tools/frameworks
MLOps/AIOps practices for agentic systems
Security and guardrail tools for AI applications
Protocols: MCP, A2A, ACP
Advanced memory architectures