We are seeking a Lead Data Software Engineer to join our team.
This role focuses on developing and maintaining the data infrastructure that powers AI-driven products and intelligent agent systems. You'll work with state-of-the-art technologies and help shape scalable, reliable platforms in a collaborative environment.
Responsibilities
- Design, build, and maintain data ingestion and processing pipelines that feed retrieval-augmented generation (RAG) systems, handling unstructured documents, images, videos, metadata, and access permissions
- Manage and optimize search and vector database infrastructure, including Amazon Kendra and an ongoing migration to OpenSearch
- Build evaluation datasets and performance measurement frameworks for AI agents
- Establish monitoring and observability pipelines for AI workloads, including dashboards for latency, quality, and cost
- Implement data governance, privacy guardrails, and quality controls for AI training and inference data
- Support the A/B testing and experimentation infrastructure used to evaluate agent versions
- Collaborate with backend AI engineers on data schemas and embedding strategies
Requirements
- 5+ years of data engineering experience, including hands-on work with AI/ML data infrastructure
- At least 1 year of experience leading and managing development teams
- Strong Python skills for building data pipelines, ETL workflows, and backend automation
- Hands-on production experience with vector databases, including schema design and index management in Amazon Kendra or OpenSearch
- Deep understanding of search and retrieval concepts, including embedding models, chunking strategies, and retrieval optimization
- Working knowledge of AWS services such as S3, Glue, Athena, and Kinesis (or equivalents), as well as Docker and distributed data processing
- Experience establishing data quality practices, such as monitoring, validation, and lineage tracking, as operational standards
- Experience defining AI/ML evaluation metrics and implementing systematic tracking with evaluation frameworks
- Written and spoken English proficiency at B2 level or higher
Nice to have
- Experience with LangSmith, RAGAS, or custom evaluation frameworks
- Experience processing multi-modal data (unstructured text, images, and videos) and applying the associated governance
- Hands-on experience preparing data for LLM fine-tuning
- Familiarity with LLM observability tools that trace AI calls, such as Langfuse or Arize
- Experience building streaming data pipelines with technologies such as Kafka or Kinesis