AI Platform & Enablement Engineer

EPAM·Malaysia·Office·yesterday

You will build the foundations every other AI engineer at the company depends on: the model gateway, the evaluation harness, the observability stack, the safety guardrails, and the internal accelerators that turn what would be a six-week setup into a one-day setup.

This is the role most firms forget to invest in — and quietly pay for, in slow engagements, inconsistent quality, and engineers reinventing the same plumbing on every project. We are not making that mistake.

Responsibilities

Design, build, and operate the shared platform that our AI engineering teams use across every customer engagement. This includes: a model gateway (multi-provider routing, fallback, cost and rate-limit controls), an evaluation harness, prompt and dataset versioning, observability and tracing for AI workloads, safety and policy guardrails, and a small set of internal accelerators (templates, libraries, scaffolding)
Make the platform a force multiplier. Your work is successful when an engineer joining a new engagement can be in production with a credible AI feature in days, not weeks
Own the platform as a product. Set roadmaps, gather feedback from internal users, write good docs, run office hours, and deprecate things that have stopped pulling their weight
Set the standards for how we evaluate, monitor, and operate AI systems — and make those standards easy to follow because the platform makes the right thing the default
Partner closely with the Learning & Evaluations function and with security, legal, and data-governance teams. Make the boring-but-essential things — audit trails, data lineage, tenancy, secret handling — work the same way every time
Contribute to our hiring bar for platform engineers across the practice

Requirements

Substantial experience building and operating platforms that other engineers depend on — internal developer platforms, ML platforms, data platforms, or similar
Hands-on familiarity with the AI infrastructure problem space: model providers, vector databases, embedding models, agent frameworks, evaluation tools, prompt management, observability for LLM workloads
Strong systems engineering: distributed systems, latency/cost tradeoffs, reliable service design, infrastructure-as-code, CI/CD
A product mindset. You measure success by adoption and by the time-to-first-value of internal users — not by how clever the architecture is
Excellent written communication. You will write the docs, the design proposals, and the post-mortems that the rest of the practice depends on
Pragmatism. You build the simplest thing that will work, and you make it easy to replace later

Nice to have

Experience with one or more of: Kubernetes, Terraform, Pulumi, Temporal, observability stacks (OpenTelemetry, Datadog, Honeycomb), policy engines (OPA), secrets management
Prior experience as the founding platform engineer on a fast-growing AI team
Background in security, compliance, or data governance for AI systems
Open-source contributions, especially to AI infrastructure projects