Our customer is one of the world's largest technology companies, based in Silicon Valley with operations all over the world. In this project, we are working on the bleeding edge of Big Data technology to develop a high-performance data analytics platform that handles petabyte-scale datasets.
We are looking for an experienced Big Data Engineer.
Essential functions
- Develop, implement, and maintain scalable data ingestion and transformation jobs using Scala/Python.
- Implement robust Spark-based ETL/ELT pipelines to migrate data efficiently from legacy systems (HDFS/Hive) to modern cloud storage solutions.
- Apply rigorous data quality checks and validation processes throughout the migration lifecycle.
- Participate actively in code reviews, ensuring adherence to the team's best practices and writing clean, testable, and maintainable code.
- Document technical designs, pipeline logic, and standard operational procedures.
- Support troubleshooting, debugging, and bug fixing during critical migration and deployment activities.
- Contribute to AI engineering or prompt engineering efforts related to data platform usage.
Qualifications
- Distributed Processing: Expert-level proficiency with Apache Spark (batch and streaming).
- Data Lake Formats: Hands-on experience with Apache Iceberg (or similar formats like Delta Lake or Apache Hudi) for implementing ACID transactions, schema evolution, and time travel capabilities.
- Big Data Ecosystem: Deep knowledge of HDFS internals and large-scale migration strategies.
- Programming: Strong engineering skills in Scala and/or Python.
- Orchestration: Experience running Spark and/or Flink jobs on Kubernetes (e.g., using Spark-on-K8s operator).
- Cloud/Storage: Experience with distributed blob storage (e.g., AWS S3, Ceph).
- Data Pipelining: Proven ability to build ingestion, transformation, and enrichment pipelines for complex, large-scale datasets.
- Infrastructure-as-Code (IaC): Familiarity with tools like Terraform or Helm for provisioning and managing data infrastructure.
Would be a plus
- Experience with Apache Flink for high-velocity streaming data processing.
- Prior hands-on experience in major migration projects or large-scale data platform modernization initiatives.
We offer
- Opportunity to work on bleeding-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Professional development opportunities
About us
Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI,
and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical
challenges and enable positive business outcomes for enterprise companies undergoing business transformation.
A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization, and customer experience.
Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.