We are seeking a Senior Data Engineer with strong expertise in Azure and PySpark to design, implement, and maintain robust data processing solutions. The role focuses on building scalable, production-grade data systems, ensuring reliability, and optimizing performance in distributed environments.
Responsibilities
- Design and optimize large-scale data pipelines using PySpark
- Build and maintain scalable ETL/ELT workflows in Azure
- Troubleshoot production issues related to performance, latency, and availability
- Work with distributed NoSQL technologies (e.g., Cosmos DB, Cassandra, DynamoDB, MongoDB, or similar)
- Optimize Spark jobs (partitioning, execution plans, resource usage)
- Implement best practices for scalability, security, and reliability
- Collaborate with cross-functional teams on data-driven solutions
- Contribute to automation, CI/CD, and operational improvements
Requirements
- 5+ years of experience as a Data Engineer or in a similar role
- Strong hands-on experience with PySpark in production
- Proven experience in data modeling, partitioning, indexing, and performance tuning in NoSQL systems
- Strong programming skills in Python
- Experience building and operating production-grade pipelines in the cloud (Azure)
- Experience with distributed NoSQL databases (e.g., Cosmos DB, Cassandra, DynamoDB, MongoDB)
- Strong understanding of distributed systems and performance optimization
- Experience with CI/CD, monitoring, troubleshooting, and production support
- Strong analytical and communication skills (English B2+)
Nice to have
- Experience with real-time/streaming data
- Exposure to Data Science workflows
- Knowledge of Big Data ecosystems
- Experience with financial data
- Familiarity with AI-assisted development or LLM tools