We are looking for a Senior Data Software Engineer to join our team.
This role focuses on building and supporting data infrastructure that powers AI-driven products and intelligent agent systems. You'll have the opportunity to work with cutting-edge technologies and contribute to scalable, reliable platforms in a collaborative environment.
Responsibilities
- Design, build, and maintain data ingestion and processing pipelines that feed RAG systems, including handling unstructured data, images, videos, metadata, and permissions
- Administer and optimize search and vector database infrastructure, including Amazon Kendra and an ongoing migration to OpenSearch
- Create evaluation datasets and performance measurement frameworks for agents
- Develop monitoring and observability pipelines for AI workloads, including dashboards for latency, quality, and cost
- Implement data governance, privacy safeguards, and quality controls for AI training and inference data
- Support A/B testing and experimentation infrastructure for assessing agent iterations
- Collaborate with backend AI engineers on data schemas and embedding strategies
Requirements
- A minimum of 3 years of data engineering experience, including direct exposure to AI/ML data infrastructure
- Strong Python skills for building data pipelines, ETL processes, and backend automation scripting
- Hands-on production experience with search and vector databases, including schema design and index management for Amazon Kendra or OpenSearch
- Deep understanding of search and retrieval concepts, including embedding models, chunking strategies, and retrieval optimization
- Practical knowledge of AWS services such as S3, Glue, Athena, and Kinesis (or equivalents), along with Docker and distributed data environments
- Experience establishing data quality practices, such as monitoring, validation, and lineage tracking, as operational defaults
- Background in designing AI/ML evaluation metrics and establishing systematic tracking through evaluation frameworks
- English language proficiency (written and spoken) at B2 level or higher
Nice to have
- Experience with LangSmith, RAGAS, or custom evaluation frameworks
- Background in multi-modal data processing covering unstructured text, images, and videos, along with associated governance
- Hands-on experience preparing data for LLM fine-tuning
- Familiarity with observability tooling that integrates deeply with AI model calls, such as Langfuse or Arize
- Experience building streaming data pipelines using technologies such as Kafka or Kinesis