Project description
We are looking for a Performance Optimization Engineer with strong C/C++ skills and experience in multicore systems, who can analyze bottlenecks and optimize applications for HPC, data center, and DSP workloads.
Responsibilities
- Optimize and develop the CPU performance stack (applications, libraries) for AMD and ARM processors.
- Leverage AI tools to research and implement future-proof solutions.
- Continuously learn and grow along with the evolving x86 and ARM CPU architectures and application landscape.
- Solve problems across multiple software layers (user space, kernel, applications, libraries) and hardware.
- Analyze and resolve performance and scalability bottlenecks in code running on multi-core, multi-node deployments.
- Lead collaborative efforts across multiple teams.
Skills
Must have
- 6-10 years of experience
- Experience in C/C++ software development and debugging on multicore systems.
- Experience in performance analysis for data center, HPC (High Performance Computing), MPI (Message Passing Interface), codec, and DSP applications.
- Experience in architecture-specific optimizations for x86/ARM (or other architectures).
- Experience in identifying performance bottlenecks and designing and implementing optimizations to relieve them.
- Knowledge of one or more CPU debugging and profiling tools for Linux/Windows/Mac.
- Understanding of the cache subsystem, instruction set architecture, and pipeline (for any CPU).
- Very strong data structure and algorithmic skills.
Nice to have
- Experience with Intel MKL, linear algebra, FFT, and x86/ARM assembly programming.