We are seeking a Senior C/C++ Engineer to join our team focused on accelerator kernel development for machine learning (ML) and high-performance computing (HPC) workloads.
This role involves working close to the hardware, developing and optimizing low-level kernels that drive next-generation AI acceleration platforms and ensure maximum performance, efficiency and scalability for tensor-based computations.
Responsibilities
- Design, develop and maintain kernel-level software components for accelerator-based ML and HPC applications
- Optimize low-level kernels and kernel libraries with emphasis on tensor computation, tensor data movement and memory efficiency
- Implement and refine tensor compute and data movement kernels for enhanced execution performance
- Perform deep low-level optimizations to maximize hardware utilization and software efficiency
- Analyze performance bottlenecks using profiling tools and propose effective optimization strategies
- Collaborate with machine learning engineers and data scientists to integrate optimized kernels into ML frameworks and pipelines
- Ensure code quality through unit testing, debugging and performance validation
- Maintain stability, reliability and scalability of kernel-level code in production environments
Requirements
- 5+ years of professional experience with C and C++
- Proven experience in kernel development or low-level systems programming
- Deep expertise in low-level performance optimization and understanding of tensor operations
- Strong skills in analyzing and debugging complex, performance-critical code
- Hands-on experience with performance profiling and optimization tools
- Excellent problem-solving and analytical skills
- Ability to work effectively in a highly technical, performance-driven environment
Nice to have
- Familiarity with machine learning frameworks and ML concepts
- Knowledge of operating system internals
- Experience with GPU programming models such as CUDA or OpenCL
- Background in accelerator architectures or custom compute hardware