Lead C++ Developer

EPAM·Poland·Remote·1 month ago

We are seeking a talented Lead C++ Developer to join the highly interdisciplinary CoreML team, where you will drive the performance and optimization of both training and serving, delivering massive impact for customers. In this role, you will have exposure to the newest Tensor Processing Unit (TPU) and Graphics Processing Unit (GPU) hardware, the latest ML models, and the advanced toolchains that bridge them. Your work will directly enable AI research, production deployments and the broader open-source ecosystem, addressing complex technical issues that affect the efficiency and scalability of AI across the industry.

Responsibilities

Design and optimize high-performance kernels (using languages like Pallas, Mosaic and Triton) targeting Tensor Processing Unit (TPU) and Graphics Processing Unit (GPU) architectures for critical Machine Learning (ML) operations, redefining what's possible from massive training runs to high-speed inference
Architect infrastructure such as benchmarking suites, autotuning frameworks, performance analysis tools, regression testing and documentation, transforming how the developer community interacts with increasingly critical custom kernels in key Open-Source Software (OSS) libraries
Track the latest advancements in hardware architectures, compiler technologies and AI models to identify new opportunities for performance optimization through custom kernels
Engage with ML researchers, framework developers (Just After eXecution (JAX), PyTorch) and compiler engineers (Accelerated Linear Algebra (XLA)) to drive adoption, identify new requirements and resolve bottlenecks

Requirements

Bachelor's degree or equivalent practical experience
7+ years of overall industry experience
5 years of experience with software development in C++ or Python
3 years of experience testing, maintaining or launching software products, and 1 year of experience with software design and architecture
Expertise in performance optimization at the kernel level
English proficiency at B2 level or higher

Nice to have

Skills in optimizing TPU/GPU code, using low-level kernel languages like Pallas, Compute Unified Device Architecture (CUDA) or Triton
Knowledge of ML frameworks (JAX/PyTorch) and common operations like attention and Mixture of Experts (MoE), including model optimization and low-precision formats
Understanding of modern accelerators (e.g., data movement, pipelining, heterogeneous compute and scale-out)
Understanding of compiler principles (optimization, code generation) and toolchains such as MLIR and OpenXLA
Demonstrated experience building developer infrastructure, including Open-Source Software (OSS) libraries, flexible high-performance APIs and easy-to-consume documentation to empower the community