Senior Python developer

Grid Dynamics·Poland·Office·2 days ago

We are looking for an experienced Software Engineer in Performance Infrastructure to join our team. This position is part of a high-impact project for a world-renowned global tech leader and AI innovator. You will be focused on optimizing large-scale machine learning workloads for next-generation hardware within a High-Performance Computing (HPC) and Compiler Infrastructure environment. Your core mission will be to manage the end-to-end health, precision, and reliability of performance benchmarking pipelines, working at the intersection of automated infrastructure and performance engineering.

Essential functions

Responsibilities:

Performance Analysis & Validation: Evaluate results from automated benchmarking suites to detect and analyze shifts in performance and related metrics.
Root-Cause Analysis: Perform deep-dive root-cause analysis on bisection results to identify specific code changes responsible for performance regressions.
Infrastructure Automation: Develop and maintain Python-based tooling for benchmark automation, hardware configuration management, and automated data recovery.
System Debugging: Troubleshoot failures within the benchmarking pipeline, including script errors, environment misconfigurations, and resource allocation issues in distributed clusters.
Data Pipelines & Dashboards: Maintain and enhance data pipelines and visualization tools to ensure high-fidelity performance metrics are consistently available for engineering teams.
Technical Documentation: Develop and maintain engineering playbooks and best practices to improve consistency in performance testing and incident investigation.

Qualifications

Minimum requirements:

Strong proficiency in Python for systems automation, data processing, and integration.
Hands-on experience with SQL for querying large datasets and managing performance metrics.
Deep knowledge of Linux/Unix environments, shell scripting (Bash), and command-line development.
Exceptional analytical and problem-solving skills with the ability to debug complex system-level issues.
Clear written communication skills for documenting technical investigations and collaborating across globally distributed teams.

Would be a plus

Practical experience with distributed build and test systems (e.g., Bazel / CMake).
Strong familiarity with CI/CD pipelines and automated regression testing.
Basic understanding of hardware accelerators (GPUs) or machine learning frameworks (e.g., JAX, PyTorch, TensorFlow).
Background in Performance Engineering or SRE (Site Reliability Engineering).

We offer

Opportunity to work on bleeding-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule
Benefits package - medical insurance, sports
Corporate social events
Professional development opportunities
Well-equipped office

About us

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.