Senior Manager, Site Reliability
- Full-time
- Workplace Type: Hybrid
- Career Track & Grade: MR4/10
- Department: Engineering
Company Description
LinkedIn is the world’s largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We’re also committed to providing transformational opportunities for our own employees by investing in their growth. We aspire to create a culture that’s built on trust, care, inclusion, and fun – where everyone can succeed.
Join us to transform the way the world works.
Job Description
This role will be based in Bangalore, India.
At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.
At LinkedIn, the Productivity Engineering Site Reliability Engineering (SRE) team plays a critical role in ensuring our enterprise business applications are reliable, scalable, secure, and highly automated.
We are seeking a Senior Manager, Site Reliability Engineering to lead a high-performing team of SREs, software engineers, enterprise engineers, and test automation engineers responsible for system health, observability, and operational excellence across both development and production environments.
In this role, you will partner closely with Development and Test Automation teams from early design through production, driving improvements in reliability, performance, and scalability across complex application ecosystems. You will also collaborate with cross-functional infrastructure teams to scale and modernize financial systems infrastructure.
You will lead strategic initiatives across application, database, and middleware platforms, including performance optimization and the transformation of systems from on-premises environments to modern multi-cloud architectures.
This is a key leadership opportunity for someone passionate about building high-performing teams, driving automation at scale, and delivering resilient, efficient platforms that power mission-critical business operations.
Responsibilities:
Build, lead, and scale a high-performing SRE organization, including hiring, mentoring, and organizational development
Act as a role model and coach with a strong bias for action, engineering craftsmanship, and operational excellence
Partner with senior leadership to define and drive the long-term technology vision, strategy, and roadmap aligned with business priorities
Establish and foster a culture of ownership, accountability, continuous improvement, and high operational standards
Collaborate closely with cross-functional partners across development, infrastructure, testing, and business teams to drive impactful roadmaps
Influence and align senior stakeholders across engineering, infrastructure, and business domains
Own availability, reliability, performance, and scalability of enterprise business applications and financial systems
Define and implement SRE best practices including SLOs, SLAs, error budgets, incident management, and operational frameworks
Lead end-to-end incident response, root cause analysis, and long-term remediation strategies to improve system resilience
Drive operational maturity through metrics, observability, automation, and continuous improvement initiatives
Oversee application, database, and middleware platform performance, reliability, and capacity planning
Lead modernization efforts, including migration from legacy environments to modern infrastructure
Evaluate and implement new technologies and architectural patterns to improve scalability, resilience, and efficiency
Define and drive observability strategy across monitoring, logging, tracing, and alerting systems
Champion an automation-first mindset to eliminate manual processes and improve operational efficiency
Drive development of internal tools and self-service platforms to enhance engineering productivity and reduce operational overhead
Improve deployment, release, and operational workflows through engineering-led automation and standardization
Own infrastructure cost management, capacity planning, and financial forecasting for Financial Systems
Optimize infrastructure and licensing investments (e.g., Oracle ecosystem) aligned with business and financial goals
Qualifications
Basic Qualifications:
BA/BS degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
12+ years of experience in Site Reliability Engineering, Production Engineering, or related disciplines
4+ years of experience leading and scaling high-performing engineering teams
Experience with Oracle ERP (EBS/Fusion) and Oracle Database technologies
Hands-on experience with Oracle 19c database administration, including high availability and disaster recovery (Oracle RAC, Grid Infrastructure, Data Guard)
Experience operating in SOX-compliant environments with a focus on controls, audit, and governance
Preferred Qualifications:
4+ years of hands-on experience troubleshooting complex issues across Unix/Linux, networking, and Windows environments
Proven experience working within an SRE organization, managing and operating large-scale production systems and applications
Strong understanding and practical application of SRE principles, including designing systems aligned with SLA/SLO/SLI targets and building resilient, highly available platforms
2+ years of programming experience in Python, Shell scripting, or similar languages for automation and tooling
Experience with configuration management and automation tools such as Ansible and Chef
Experience with infrastructure-as-code and cloud orchestration tools such as Terraform
Familiarity with observability and telemetry platforms such as Oracle Enterprise Manager, Azure Log Analytics, or similar
Experience with containerization and orchestration technologies such as Docker and Kubernetes
Experience with Microsoft SQL Server and IIS is a plus
Strong understanding of distributed systems fundamentals, including data structures, relational and non-relational databases, networking, Linux internals, filesystems, storage systems, and web architectures
Experience working with a broad range of open-source technologies and cloud services
Solid understanding of Agile development methodologies and modern software development practices
Strong interpersonal and communication skills, with the ability to collaborate effectively across diverse, cross-functional teams
Suggested Skills:
Technical Leadership
People Management
Stakeholder Management
Additional Information
You Will Benefit from Our Culture
We strongly believe in the well-being of our employees and their families. That is why we offer generous health and wellness programs and time away for employees of all levels.
India Disability Policy
LinkedIn is an equal employment opportunity employer offering opportunities to all job seekers, including individuals with disabilities. For more information on our equal opportunity policy, please visit https://legal.linkedin.com/content/dam/legal/Policy_India_EqualOppPWD_9-12-2023.pdf
Global Data Privacy Notice for Job Candidates
Please follow this link to access the document that provides transparency around the way in which LinkedIn handles personal data of employees and job applicants: https://legal.linkedin.com/candidate-portal.