Design, build, and maintain scalable data pipelines for acquiring, integrating, and managing data from diverse data generation sources and systems (e.g., lab systems, MES, clinical supply, quality systems, external partners).
Create and optimise data flows for structured and unstructured data using Python (PySpark), R, SQL, Databricks, Snowflake, and other modern engineering tools.
Develop and maintain specific data repositories, implement enterprise‑level data models, and create new models as needed.
Enable AI/ML readiness by ensuring data is well‑structured, versioned, traceable, and semantically aligned with enterprise data standards.
Data Product & Architecture Partnership
Partner with data scientists, domain experts, and digital technology teams to translate business needs into high‑quality data products and engineering requirements.
Work closely with ontology/knowledge graph teams to implement semantic models and future‑proof data architectures.
Quality, Compliance & Performance
Implement data quality and performance standards; define KPIs to measure accuracy, completeness, and consistency across the data assets.
Apply data versioning and lineage tracking for compliance, traceability, and audit readiness.
Follow software development best practices including code versioning, DevOps integration, and documentation.
Cross‑Functional Collaboration
Engage with scientific, technical, and operations stakeholders to understand requirements, design data solutions, and drive adoption.
Support multiple concurrent projects, managing priorities and delivering maximum business value across the network.
Requirements
Bachelor’s degree in Engineering, Data Science, Life Sciences, Computer Science, or a related field; advanced degree preferred.
3+ years of experience in data engineering, including data modeling and database design, preferably in a scientific, manufacturing, or healthcare environment.
Proficiency with Python, R, SQL, and cloud-based architectures (AWS services, Snowflake, Databricks, Redshift).
Expertise in ETL and data warehousing (DWH).
Experience with NoSQL and graph databases.
English language proficiency at B2 level or higher.
Strong analytical, problem‑solving, and stakeholder‑management skills, with the ability to translate discussions into actionable requirements.
Ability to drive multiple projects simultaneously, with strong organizational skills and adaptability.
Nice to have
Experience with regulated or standards‑driven data environments, such as CDISC, HL7, FHIR, OMOP, DICOM, or manufacturing/quality data standards.
Familiarity with high‑dimensional data (e.g., imaging, sensor data).
Experience connecting to or feeding MLOps and model deployment workflows.
Knowledge of manufacturing execution systems (MES), laboratory information systems, or industrial data systems.
Exposure to knowledge graph or ontology‑driven architectures.
We offer
Competitive compensation
Remote or office work
Flexible working hours
Healthcare benefits: medical insurance and paid sick leave
Continuous education, mentoring, and professional development programs