Senior Manager, Site Reliability

  • Full-time
  • Workplace Type: Hybrid
  • Career Track & Grade: MR4/10
  • Department: Engineering

Company Description

LinkedIn is the world’s largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We’re also committed to providing transformational opportunities for our own employees by investing in their growth. We aspire to create a culture that’s built on trust, care, inclusion, and fun – where everyone can succeed.

Join us to transform the way the world works.

Job Description

This role will be based in Bangalore, India.

At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.

At LinkedIn, the Productivity Engineering Site Reliability Engineering (SRE) team plays a critical role in ensuring our enterprise business applications are reliable, scalable, secure, and highly automated.

We are seeking a Senior Manager, Site Reliability Engineering to lead a high-performing team of SREs, software engineers, enterprise engineers, and test automation engineers responsible for system health, observability, and operational excellence across both development and production environments.

In this role, you will partner closely with Development and Test Automation teams from early design through production, driving improvements in reliability, performance, and scalability across complex application ecosystems. You will also collaborate with cross-functional infrastructure teams to scale and modernize financial systems infrastructure.

You will lead strategic initiatives across application, database, and middleware platforms, including performance optimization and the transformation of systems from on-premises environments to modern multi-cloud architectures.

This is a key leadership opportunity for someone passionate about building high-performing teams, driving automation at scale, and delivering resilient, efficient platforms that power mission-critical business operations.

Responsibilities:

  • Build, lead, and scale a high-performing SRE organization, including hiring, mentoring, and organizational development

  • Act as a role model and coach with a strong bias for action, engineering craftsmanship, and operational excellence

  • Partner with senior leadership to define and drive the long-term technology vision, strategy, and roadmap aligned with business priorities

  • Establish and foster a culture of ownership, accountability, continuous improvement, and high operational standards

  • Collaborate closely with cross-functional partners across development, infrastructure, testing, and business teams to drive impactful roadmaps

  • Influence and align senior stakeholders across engineering, infrastructure, and business domains

  • Own availability, reliability, performance, and scalability of enterprise business applications and financial systems

  • Define and implement SRE best practices, including SLOs, SLAs, error budgets, incident management, and operational frameworks

  • Lead end-to-end incident response, root cause analysis, and long-term remediation strategies to improve system resilience

  • Drive operational maturity through metrics, observability, automation, and continuous improvement initiatives

  • Oversee application, database, and middleware platform performance, reliability, and capacity planning

  • Lead modernization efforts, including migration from legacy environments to modern infrastructure

  • Evaluate and implement new technologies and architectural patterns to improve scalability, resilience, and efficiency

  • Define and drive observability strategy across monitoring, logging, tracing, and alerting systems

  • Champion an automation-first mindset to eliminate manual processes and improve operational efficiency

  • Drive development of internal tools and self-service platforms to enhance engineering productivity and reduce operational overhead

  • Improve deployment, release, and operational workflows through engineering-led automation and standardization

  • Own infrastructure cost management, capacity planning, and financial forecasting for financial systems

  • Optimize infrastructure and licensing investments (e.g., Oracle ecosystem) aligned with business and financial goals

Qualifications

Basic Qualifications:

  • BA/BS degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience

  • 12+ years of experience in Site Reliability Engineering, Production Engineering, or related disciplines

  • 4+ years of experience leading and scaling high-performing engineering teams

  • Experience in Oracle ERP (EBS/Fusion) and Oracle Database technologies

  • Hands-on experience with Oracle 19c database administration, including high availability and disaster recovery (Oracle RAC, Grid Infrastructure, Data Guard)

  • Experience operating in SOX-compliant environments with a focus on controls, audit, and governance

Preferred Qualifications:

  • 4+ years of hands-on experience troubleshooting complex issues across Unix/Linux, networking, and Windows environments

  • Proven experience working within an SRE organization, managing and operating large-scale production systems and applications

  • Strong understanding and practical application of SRE principles, including designing systems around SLAs, SLOs, and SLIs and building resilient, highly available platforms

  • 2+ years of programming experience in Python, Shell scripting, or similar languages for automation and tooling

  • Experience with configuration management and automation tools such as Ansible and Chef

  • Experience with infrastructure-as-code and cloud orchestration tools such as Terraform

  • Familiarity with observability and telemetry platforms such as Oracle Enterprise Manager, Azure Log Analytics, or similar

  • Experience with containerization and orchestration technologies such as Docker and Kubernetes

  • Experience with Microsoft SQL Server and IIS is a plus

  • Strong understanding of distributed systems fundamentals, including data structures, relational and non-relational databases, networking, Linux internals, filesystems, storage systems, and web architectures

  • Experience working with a broad range of open-source technologies and cloud services

  • Solid understanding of Agile development methodologies and modern software development practices

  • Strong interpersonal and communication skills, with the ability to collaborate effectively across diverse, cross-functional teams

Suggested Skills:

  • Technical Leadership

  • People Management

  • Stakeholder Management

Additional Information

You Will Benefit from Our Culture

We strongly believe in the well-being of our employees and their families. That is why we offer generous health and wellness programs and time away for employees of all levels.

India Disability Policy 

LinkedIn is an equal employment opportunity employer offering opportunities to all job seekers, including individuals with disabilities. For more information on our equal opportunity policy, please visit https://legal.linkedin.com/content/dam/legal/Policy_India_EqualOppPWD_9-12-2023.pdf

Global Data Privacy Notice for Job Candidates

The following document explains how LinkedIn handles the personal data of employees and job applicants: https://legal.linkedin.com/candidate-portal
