Staff Systems Engineer, Site Reliability

Full-time

Job Family Group: Technology and Operations

Company Description

Visa is a world leader in digital payments, facilitating more than 215 billion payments transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable and secure payments network, enabling individuals, businesses and economies to thrive.

When you join Visa, you join a culture of purpose and belonging – where your growth is priority, your identity is embraced, and the work you do matters. We believe that economies that include everyone everywhere, uplift everyone everywhere. Your work will have a direct impact on billions of people around the world – helping unlock financial access to enable the future of money movement.

Join Visa: A Network Working for Everyone.

Job Description

The SRE (Site Reliability Engineering) team is responsible for the availability, reliability, performance, and monitoring of applications, for emergency response, and for reducing manual work by applying SRE principles and practices. The team works directly with Development, Operations, Product, and other teams to deploy new features and changes and to maintain infrastructure, operations, CI/CD, and IaC, so that SLOs and SLAs are protected. We use a variety of DevOps automation tools, including Ansible, Docker, Kubernetes, Terraform, and Jenkins. The Senior SRE is capable of implementing observability, SLOs, SLIs, SLAs, and disaster recovery and backup plans.

Responsibilities:

Design, engineer, and implement large-scale distributed systems that process high volumes of observability tracing data from container and non-container-based applications, focusing on latency, scalability, resiliency, self-service, and fault tolerance.
Design, develop, and implement open-source-based software components, libraries, and auto-instrumentation code to enable complete observability across application traces, metrics, and logs.
Ensure the availability and reliability of distributed systems.
Help the Tier 1 team resolve client infrastructure and system issues, escalations, alerts, tickets, and queries.
Act as a bridge between development, operations, and other teams to build and maintain resilient systems.
Conduct, coordinate and oversee post-incident Root Cause Analysis / Reviews.
Build and maintain documentation for all assigned projects.
Leverage DevOps, Agile methodology, ITIL disciplines (Event, Incident, Problem, and Change Management) and standards in day-to-day work.
Adopt and propose automation of repetitive tasks to reduce/eliminate toil.
Implement and troubleshoot with observability tools such as Prometheus and Grafana.
Plan and implement disaster recovery and backup plans for the platform.
Proactively work on efficiency and capacity planning.
Proactively identify problems, areas for improvement, and performance bottlenecks.
Liaise and work closely with Tier 1 on-call support, Development, and Operations teams.
Drive availability and reliability by defining and implementing SLIs, SLOs, error budgets, observability, disaster recovery, and backup to detect and mitigate issues.
Work independently and mentor junior developers.

This is a hybrid position. Hybrid employees can alternate time between both remote and office. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.

Qualifications

Basic Qualifications:
• 5 or more years of relevant work experience with a Bachelor's degree, or at least 3 years of work experience with an Advanced degree (e.g. Master's, MBA, JD, MD), or 0 years of work experience with a PhD.
• Hands-on experience monitoring software and infrastructure with tools such as Prometheus, Grafana, Splunk, and ELK.
• Comfortable working in the terminal, with the ability to script and debug in Shell/PowerShell.
• Experience with Linux and Windows administration.
• Working knowledge of relational and non-relational databases, including creating and running queries (e.g. MSSQL and MongoDB).
• Working knowledge of web/middleware servers like Tomcat and IIS.
• Complete understanding of the software development life cycle and agile software development methodologies.

Preferred Qualifications:
• 6 or more years of work experience with a Bachelor's degree, or 4 or more years of relevant experience with an Advanced degree (e.g. Master's, MBA, JD, MD), or up to 3 years of relevant experience with a PhD.
• Hands-on experience with container-related technologies such as Docker and Kubernetes.
• Git experience highly desired, along with experience in other SCM (Source Control Management) tools.
• Experience with repository management systems such as Artifactory and Nexus.
• Hands-on experience implementing CI/CD (e.g. Jenkins).
• Experience in automating release and build processes.
• Experience with Jira and Confluence.

Additional Information

Work Hours: Vary depending on the needs of the department.

Travel Requirements: This position requires travel 5-10% of the time.

Mental/Physical Requirements: This position will be performed in an office setting. The position will require the incumbent to sit and stand at a desk, communicate in person and by telephone, and frequently operate standard office equipment, such as telephones and computers.

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.

Visa will consider for employment qualified applicants with criminal histories in a manner consistent with applicable local law, including the requirements of Article 49 of the San Francisco Police Code.

U.S. APPLICANTS ONLY: The estimated salary range for a new hire into this position is 117,700.00 to 153,000.00 USD, which may include potential sales incentive payments (if applicable). Salary may vary depending on job-related factors which may include knowledge, skills, experience, and location. In addition, this position may be eligible for bonus and equity. Visa has a comprehensive benefits package for which this position may be eligible that includes Medical, Dental, Vision, 401(k), FSA/HSA, Life Insurance, Paid Time Off, and Wellness Program.