DevOps Engineer (AI Inference)
- Full-time
Company Description
This position is available only under an employment (labor) agreement.
The world’s digital experiences run on something invisible: the infrastructure and software that keep them fast, reliable, and secure. At Gcore, you’ll help design and deliver that foundation for an AI-driven world.
We’re a global provider of infrastructure and software solutions for AI, cloud, network, and security, powering everything from real-time communication and streaming to enterprise AI and secure web applications. With 210+ edge locations, 50+ cloud regions, and thousands of GPUs, your work here can reach users and businesses across the globe.
You’ll collaborate with leading technology partners such as Intel, NVIDIA, Dell, and Equinix, and work on platforms that power digital products used around the world. Our vision is simple: to connect the world to AI, anywhere, anytime.
Want to work on technology that goes beyond a single product or industry? Join a global team of 550+ professionals building infrastructure and software that supports the entire digital ecosystem.
We are looking for a talented DevOps Engineer to join our AI Inference Operations Team.
Job Description
As a DevOps Engineer, you will be responsible for designing, deploying, and maintaining infrastructure and services that enable scalable and secure AI inference workloads on-premises.
What You Will Do
- Design, develop, and maintain infrastructure for AI inference workloads, including GPU scheduling, model deployment pipelines, and data access patterns in on-prem environments
- Build and manage monitoring and observability tools for AI inference platforms, including dashboards, alerts, and runbooks for model health and system performance
- Collaborate with ML engineers and platform teams to design system architecture for AI workloads, integrate inference runtimes, and test performance at scale
Qualifications
What We're Looking For
- Strong understanding of Kubernetes architecture, including CNI, CSI, operators, ingress/gateway, and control plane components.
- Hands-on experience operating and troubleshooting production Kubernetes clusters.
- Strong Linux and networking troubleshooting skills, covering DNS, routing, firewalling, TLS, MTU, connectivity, and performance issues.
- Ability to develop automation and operational tooling using Python, Go, or Bash.
- Experience with Terraform, Ansible, or similar IaC/configuration management tools.
- Experience with VictoriaMetrics/Grafana or similar monitoring, alerting, and troubleshooting tools.
- Strong experience with Git-based workflows and CI/CD pipelines.
Preferred Qualifications
- Familiarity with Cluster API or similar Kubernetes cluster lifecycle management technologies.
- Hands-on operation or administration of Slurm clusters.
- Knowledge of Argo CD, GitOps workflows, Helm, or Helmfile.
- Background working with managed platforms, PaaS, or cloud services.
- Exposure to bare-metal, GPU, or HPC (high-performance computing) environments.
Nice to Have
- Familiarity with the NVIDIA GPU stack, RDMA/InfiniBand, or high-performance networking.
- Knowledge of OpenStack or similar cloud infrastructure platforms.
- Hands-on experience developing Kubernetes operators or controllers.
Additional Information
Benefits
At Gcore, we want you to do your best work and enjoy the journey. Our benefits are designed to support your growth, well-being, and life beyond work:
- Competitive compensation
- Flexible working hours and hybrid or remote options, depending on your role
- Work from anywhere in the world for up to 45 days per year
- Private medical insurance for you and your family*
- Extra paid vacation and sick leave days*
- Support for life’s important moments and celebrations
- Language courses to help you connect and grow
- Modern, welcoming offices with snacks, drinks, and entertainment*
- Team sports and social activities*
*Benefits may vary depending on your location.
Equal Opportunity Employer
We provide equal opportunity to all applicants without regard to race, color, religion, sex, sexual orientation, age, gender identity, gender expression, national origin, disability, or any other legally protected characteristic.