We are seeking a Senior Data Engineer with strong expertise in Azure and PySpark to design, implement, and maintain robust data processing solutions. The role focuses on building scalable, production-grade data systems, ensuring reliability, and optimizing performance in distributed environments.
Responsibilities
- Design and optimize large-scale data pipelines using PySpark
- Build and maintain scalable ETL/ELT workflows in Azure
- Troubleshoot production issues related to performance, latency, and availability
- Work with distributed NoSQL technologies (e.g., Cosmos DB, Cassandra, DynamoDB, MongoDB, or similar)
- Optimize Spark jobs (partitioning, execution plans, resource usage)
- Implement best practices for scalability, security, and reliability
- Collaborate with cross-functional teams on data-driven solutions
- Contribute to automation, CI/CD, and operational improvements
Requirements
- 5+ years of experience as a Data Engineer or in a similar role
- Strong hands-on experience with PySpark in production
- Proven experience in data modeling, partitioning, indexing, and performance tuning in NoSQL systems
- Strong programming skills in Python
- Experience building and operating production-grade pipelines in the cloud (Azure)
- Experience with distributed NoSQL databases (e.g., Cosmos DB, Cassandra, DynamoDB, MongoDB)
- Strong understanding of distributed systems and performance optimization
- Experience with CI/CD, monitoring, troubleshooting, and production support
- Strong analytical and communication skills (English B2+)
Nice to have
- Experience with real-time/streaming data
- Exposure to Data Science workflows
- Knowledge of Big Data ecosystems
- Experience with financial data
- Familiarity with AI-assisted development or LLM tools