We are looking for a Senior AI Engineer with Databricks expertise to design, deploy and maintain scalable machine learning pipelines using the Databricks platform. In this role, you will deliver production-ready ML pipelines, automated training and retraining workflows, deployed models, monitoring dashboards and CI/CD pipelines for ML systems.
Responsibilities
- Design, implement and maintain end-to-end ML pipelines on Databricks
- Build workflows for data ingestion, preprocessing, feature engineering, training and inference
- Use PySpark, Spark ML and Databricks notebooks and jobs to implement these workflows
- Manage model versioning, experiment tracking and reproducibility using MLflow
- Package and deploy models for batch and real-time inference
- Monitor model performance and drift and manage retraining cycles
- Develop scalable ETL/ELT pipelines using Databricks Delta Lake
- Optimize data storage and access patterns through partitioning, Z-ordering and caching
- Integrate with data sources such as Azure Data Lake, S3, APIs and databases
- Implement CI/CD pipelines for ML workflows using Azure DevOps, GitHub Actions, Databricks Repos and the Databricks Jobs API
- Configure clusters and autoscaling for cost efficiency and apply Infrastructure as Code with Terraform, ARM templates and Bicep
- Implement logging, alerting and observability to ensure high availability and fault tolerance of ML systems
Requirements
- 3+ years of experience in machine learning engineering or related roles
- Expertise in the Databricks platform including workspaces, jobs and clusters
- Proficiency in Apache Spark, PySpark and Python with pandas and scikit-learn
- Skills in MLflow for experiment tracking, model registry and deployment
- Competency in CI/CD pipelines, Docker containerization and REST APIs for model serving
- Familiarity with version control using Git
- Background in Azure, including Azure Databricks, Azure Data Lake Storage (ADLS), Azure Container Registry (ACR) and Azure Machine Learning (AML)
- Knowledge of data preprocessing, feature engineering and model training and evaluation
- Understanding of libraries such as XGBoost, LightGBM and CatBoost
- English proficiency at B2 level or higher
Nice to have
- Familiarity with AWS including S3, EMR and SageMaker
- Skills in streaming pipelines with Spark Structured Streaming and experience with Databricks Feature Store
- Knowledge of Kubernetes
- Competency in monitoring tools such as Prometheus and Grafana
- Experience with large-scale production systems