You will build the AI systems our customers actually use. That means writing the prompts, designing the agent loops, tuning the retrieval pipelines, building the evaluations, and getting all of it into production with the latency, cost, and reliability our customers need.
This is a craft role. We are not looking for someone who has read the papers. We are looking for someone who has shipped — and who knows the specific ways AI systems fail in production, because they have fixed those failures themselves.
Responsibilities
- Design and build production-grade AI features: retrieval-augmented generation, agentic workflows, structured extraction, classification, summarisation, conversational interfaces
- Build the evaluation systems that prove whether your work is actually working — golden datasets, regression suites, online metrics, human-in-the-loop review
- Make the unglamorous decisions that determine whether a system survives contact with real users: prompt versioning, fallback behaviour, rate-limit handling, cost controls, observability
- Integrate AI capabilities into customer systems — APIs, queues, databases, identity, audit trails. Modern AI work is 30% modelling and 70% engineering, and you are comfortable in both
- Pair closely with our Solution Architects on shaping. When the design will not work, you say so early
- Contribute to the team's shared craft: code review, internal write-ups, brown-bags on what you have learned shipping the last system
Requirements
- Several years of professional software engineering, plus meaningful production experience with modern AI systems (LLMs, RAG, agents, evaluation tooling). The specific years matter less than the depth — show us what you have shipped
- Fluency in at least one of: Python, TypeScript / Node, Go
- Comfort with the modern AI engineering stack: at least one major model provider's SDK, a vector database, an evaluation framework, an orchestration library, and the observability tools you use to know whether your system is healthy
- Real experience with evaluation. You can talk for an hour about how you knew your system was getting better — not just that it felt better
- Pragmatism about model choice. You pick the smallest, cheapest, most reliable model that does the job, and you can defend the choice
- Honesty about what AI can and cannot do. You have walked away from a feature when the technology was not ready
Nice to have
- Experience fine-tuning or post-training open-weight models
- Background in distributed systems, low-latency serving, or ML infrastructure
- Open-source contributions to AI tooling
- Domain depth in one of [financial services, healthcare, public sector, industrial, retail]