Lead Site Reliability Engineer

  • Full-time

Company Description

At KMS Technology, we are dedicated to delivering cutting-edge solutions and services that empower businesses to achieve their goals. Our team is composed of highly skilled professionals who are passionate about technology and innovation. We provide a dynamic and collaborative work environment where you can grow your career and make a significant impact.


Job Description

We are seeking a Lead Site Reliability Engineer to spearhead the reliability, scalability, and performance of our AI-powered property intelligence platform. The platform operates at the intersection of geospatial AI and insurance technology, and you will be responsible for the mission-critical Azure ecosystem that supports its high-throughput Java microservices.

As a Lead, you will bridge the gap between complex AI model inference and enterprise-grade stability. You will own the "Production Excellence" mandate, mentoring a team of engineers and collaborating with Senior Delivery Directors to ensure our global infrastructure stays ahead of our rapid growth.

Key Responsibilities

Strategic Infrastructure & Azure Leadership

  • Cloud Architecture: Lead the design of highly available, multi-region architectures on Azure using AKS (Azure Kubernetes Service), Azure Functions, and Service Bus.

  • IaC Governance: Establish and enforce standards for Infrastructure as Code using Terraform or Bicep, ensuring 100% automated provisioning across all environments.

  • Java Performance Engineering: Partner with backend squads to optimize JVM performance, tune garbage collection, and manage memory for high-concurrency insurance processing.

Reliability & AI Operations (AIOps)

  • Error Budgeting: Define, negotiate, and manage SLIs, SLOs, and SLAs with product stakeholders, balancing the velocity of AI feature releases against system stability.

  • Advanced Observability: Architect end-to-end monitoring and distributed tracing using Azure Monitor, Application Insights, and ELK/Grafana.

  • Incident Commander: Act as the final escalation point for high-priority incidents, leading complex Root Cause Analysis (RCA) and driving long-term remediation.

Security & Industry Compliance

  • Data Sovereignty: Ensure the platform adheres to insurance-specific data residency requirements and security and compliance frameworks (SOC 2, HIPAA, or ISO 27001).

  • Automated Governance: Implement Azure Policy and automated security scanning within CI/CD pipelines to ensure a "Secure by Design" infrastructure.


Qualifications

Technical Leadership:

  • 7+ years in SRE, DevOps, or Cloud Engineering, with at least 2 years in a Lead or Principal capacity.

  • Azure Mastery: Expert-level knowledge of the Azure Well-Architected Framework, particularly networking (VNet/ExpressRoute) and compute.

  • Java Ecosystem: Deep proficiency in the Java/Spring Boot stack from an operational perspective (JVM profiling, thread dump analysis).

  • Container Orchestration: Mastery of Kubernetes (AKS), including ingress controllers, service mesh (Istio), and cluster security.

Professional Competencies:

  • Strategic Mindset: Ability to translate technical debt and reliability risks into a data-driven business case for leadership.

  • Automation Advocate: Proven track record of eliminating toil through automation tooling written in Python, Go, or Java.

  • Mentorship: Passion for leveling up the engineering organization through workshops, documentation, and pair programming.

  • AI-First Integration: Experience leveraging AI for predictive scaling and automated log summarization to reduce Mean Time to Recovery (MTTR).

Additional Information

Perks you enjoy at KMS Mexico

  • Statutory benefits under Mexican law.
  • 15 days of PTO in year zero; from the first year onwards, 3 days per year.
  • 5 days of bereavement leave for the death of an immediate family member (negotiable).
  • Major Medical Expenses Insurance with coverage for immediate dependents (spouse and children).
  • Annual performance bonus (≈10% of annualized salary).
  • Annual salary adjustment.
  • Employee Referral Bonus.
  • Paid certifications and courses.
  • Coursera License.
  • 5% Savings Fund.
  • 5% Grocery Vouchers.