We are seeking a highly skilled, hands-on Senior Software Developer with expertise in C++, AI, and ML.
This is a fully remote position: you can work from anywhere in Poland, whether from home or from one of our well-equipped offices in Gdansk, Katowice, Krakow, Lodz, Warsaw, or Wroclaw.
Responsibilities
- Develop and optimize low-level workloads and kernels to improve the performance of machine learning software
- Implement optimized kernels for tensor computation and tensor data movement
- Design, develop, and maintain kernel-level software components for the client’s machine learning and HPC applications
- Perform in-depth analysis and optimization of low-level code, with a focus on the performance and efficiency of tensor operations
- Collaborate with machine learning engineers to integrate optimized kernels and routines into frameworks and pipelines
- Identify performance bottlenecks through profiling and apply strategies to resolve inefficiencies
- Write unit tests and carry out comprehensive testing and debugging to ensure the stability and reliability of kernel-level code
- Communicate effectively and apply strong problem-solving skills when analyzing and debugging complex software issues
- Use industry-standard tools for performance profiling and optimization
Requirements
- 3+ years of experience in relevant roles involving low-level programming and optimization
- Proficiency in C/C++ and low-level programming with a focus on kernel optimization
- Background in kernel development, including the implementation of efficient kernels and libraries for machine learning and HPC
- Expertise in low-level optimization techniques, specifically for tensor computation and data movement
- Experience with performance profiling, debugging tools, and strategies for software optimization
- Proven skills in kernel-level software testing, debugging, and development, ensuring efficiency and stability
- Understanding of machine learning pipelines and of how software engineers and data scientists collaborate on them
- Strong problem-solving skills and the ability to resolve bottlenecks in performance-critical environments
- English at B2 level or higher, with strong technical communication skills
Nice to have
- Familiarity with machine learning frameworks and an understanding of related concepts
- Knowledge of operating system internals
- Understanding of and experience with GPU programming, such as CUDA or OpenCL