Senior AI Infrastructure Developer

Grid Dynamics · Moldova · Office · today

Our client is the world’s largest broadline food distributor, specializing in food and non-food products for restaurants, healthcare, educational facilities, lodging, and more. The company serves more than 600,000 clients in 90+ countries and operates approximately 330 distribution facilities worldwide.
As part of the AI Native Dev Adoption POD, you will merge the skills of a Senior Infrastructure Architect with an AI Ops Specialist. Your mission is to build, configure, and scale smart, self-healing systems that automate triage, rightsizing, and system remediation.

Essential functions

Build Intelligent Agents: Design and implement specialized AI Ops agents, including Root Cause Analysis (RCA) triage agents, Autonomous Remediation agents, FinOps rightsizing systems, and APIC onboarding loops.
Orchestrate Cluster Automation: Configure alert ingestion services, cross-domain signal ingestion, dynamic runbook configuration, and signal-plan-prove loops per cluster.
Lead Application Deployment: Drive per-cluster adoption by onboarding legacy enterprise applications into the autonomous operations loop, moving each application from standard telemetry to human-in-the-loop (HITL) operation and ultimately to fully bounded autonomous remediation.
Collaborate and Document: Co-build runbook curation strategies directly with the client’s operations engineering teams. Establish, validate, and document clear boundaries for safe system autonomy.
Maturity Evolution: Work seamlessly inside a high-performing dedicated POD alongside a Platform Architect, Product Owner, AI Harness Engineer, and RAG Data Engineer to elevate the organization's overall AI SDLC practices.

Qualifications

Senior-Level SRE/Architecture Expertise: Proven engineering background at a Senior or Lead level with strong architectural design capabilities in complex enterprise environments.
Advanced Observability Stack: Deep, hands-on experience with modern cloud observability providers and monitoring ecosystems (Datadog, Splunk, Grafana, Loki, or Prometheus).
ITSM Integration: Solid familiarity with enterprise incident management frameworks and automated communication systems (PagerDuty, ServiceNow, Slack, Teams).
Automation & Signal Exposure: Exposure to, or hands-on experience with, alert ingestion engines, automated response tooling, and pattern matching over large volumes of log data.
Language Proficiency: Strong professional English communication skills (both spoken and written) to work closely with global distributed teams.

We offer

Opportunity to work on bleeding-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule & a hybrid working model
Benefits package: medical insurance, sports
Corporate social events
Professional development opportunities
Well-equipped office

About us

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.
