SRE & DevOps Engineer

  • Full-time
  • Work Model: Hybrid

Company Description

Metro Global Solution Center (MGSC) is the internal solution partner for METRO, a €31 billion international wholesaler with operations in more than 30 countries. The store network comprises a total of 623 stores in 21 countries, of which 522 offer out-of-store delivery (OOS), and 94 dedicated depots. In 12 countries, METRO operates only its delivery business, through its delivery companies (Food Service Distribution, FSD).

HoReCa and Traders are METRO's core customer groups. The HoReCa segment includes hotels, restaurants, and catering companies, as well as bars, cafés, and canteen operators. The Traders segment includes small grocery stores and kiosks. Most customers across all groups are small and medium-sized enterprises or sole traders. METRO helps them manage their business challenges more effectively.

MGSC has locations in Pune (India), Düsseldorf (Germany), and Szczecin (Poland). We provide HR, Finance, IT, and Business Operations support to 31 countries, speak 24+ languages, and process over 18,000 transactions a day. We are setting tomorrow’s standards for customer focus, digital solutions, and sustainable business models. For over 10 years, we have been providing services and solutions from our Pune and Szczecin locations. This has given us extensive experience in how best to serve our internal customers with high quality and passion. We believe that we can add value, drive efficiency, and satisfy our customers.

Job Description

We are looking for…

  • An experienced SRE & DevOps Engineer with deep expertise in cloud infrastructure, automation, and observability
  • A hands-on engineer who ensures reliability, performance, and scalability of systems
  • A proactive problem solver with a strong focus on operational excellence and continuous improvement
  • A collaborator who bridges development and operations through modern DevOps and SRE practices
  • An effective communicator who thrives in cross-functional teams and drives best practices

This role matters to us…

The Senior SRE & DevOps Engineer plays a vital role in ensuring the resilience, scalability, and reliability of our systems. By applying modern SRE principles, automation, and incident management practices, you will enable faster, more reliable delivery of business value while safeguarding system stability and customer trust.

Key Responsibilities

  • Design, implement, and maintain scalable, secure, and cloud-native infrastructure
  • Set up and maintain observability solutions, including monitoring, alerting, logging, and tracing (e.g., Prometheus, Grafana, ELK, Datadog)
  • Continuously improve CI/CD pipelines and automate deployment workflows to increase delivery efficiency
  • Lead structured incident response, root cause analysis, and drive a culture of post-mortem learning
  • Collaborate closely with developers, QA, and architects to ensure seamless integration and performance optimization
  • Apply SRE principles (SLIs, SLOs, SLAs, error budgets) to guide operational decisions and system reliability
  • Champion Infrastructure-as-Code practices using Terraform, Helm, or Ansible
  • Ensure security, compliance, and reliability are embedded into operations
  • Mentor team members and foster a culture of operational excellence and continuous improvement

Qualifications

Education

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent practical experience

Work Experience

  • 6 to 8 years of proven experience in Site Reliability Engineering, DevOps, or Cloud Engineering roles
  • Hands-on expertise with Kubernetes (preferably GKE), Docker, and service mesh technologies like Istio
  • Strong background in CI/CD practices and tools (GitHub Actions, Jenkins X, ArgoCD, or similar)
  • Experience with observability solutions (Prometheus, Grafana, ELK, Jaeger, Datadog, GCP Dashboards)
  • Proficiency with at least one major cloud platform (GCP, AWS, Azure)
  • Scripting or programming experience (Python, Go, Bash, or similar)
  • Practical knowledge of Infrastructure-as-Code tools like Terraform, Helm, or Ansible
  • Hands-on experience managing incidents, troubleshooting, and performing root cause analysis
  • Familiarity with SRE practices (SLIs, SLOs, SLAs, error budgets)

Other Requirements

  • Strong communication and collaboration skills across cross-functional teams
  • Ability to balance short-term operational needs with long-term scalability and system health
  • Analytical and proactive mindset with a focus on continuous improvement
  • Fluency in English (written and spoken)

Nice-to-Have

  • Experience with security best practices in distributed systems (OAuth2, mTLS, RBAC)
  • Knowledge of cost optimization and cloud governance practices
  • Familiarity with Camunda/CIB7 environments
  • Contributions to open-source DevOps/SRE communities