Sr. Director - Site Reliability Engineering

  • Full-time
  • Job Family Group: Technology and Operations

Company Description

Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure payments network, enabling individuals, businesses, and economies to thrive while driven by a common purpose – to uplift everyone, everywhere by being the best way to pay and be paid.

Make an impact with a purpose-driven industry leader. Join us today and experience Life at Visa.

Job Description

Visa’s Technology Organization is a community of problem solvers and innovators reshaping the future of commerce. We operate the world’s most sophisticated processing networks capable of handling more than 65k secure transactions a second across 80M merchants, 15k Financial Institutions, and billions of everyday people. While working with us you’ll get to work on complex distributed systems and solve massive scale problems centered on new payment flows, business and data solutions, cyber security, and B2C platforms.

 

The Opportunity:

As a Technology Leader in the Product Reliability Engineering (PRE) division at Visa, you will be part of our Global PRE team to help design, enhance, and build our highly available, highly secure, scalable, and resilient infrastructure in an agile environment. You will work with colleagues who will support and challenge you daily. You will take leadership roles working on multiple projects to ensure the reliability and performance of our services, RESTful APIs, container-based distributed systems, and cloud services.

You will spearhead the enhancement of our operational practices, focusing on efficiency and excellence, and lead our migration to cloud infrastructure. You will drive innovation that makes our infrastructure stand apart from our competitors and delight the customer with accelerated time-to-market delivery.

You will also champion the adoption of Generative AI (Gen AI) to drive operational efficiencies, automate routine tasks, and enhance predictive maintenance capabilities, thereby ensuring that our infrastructure is always ahead of potential issues and optimized for performance.

You are a trusted technology advisor and thought leader working with architects, development leads, product managers, security architects, and other partners across the organization, ensuring that what we build is secure, scalable, performant, and reliable. You have become an SME in every technology and piece of software you have worked with, gaining a deep understanding of how it works, and you are able to apply that knowledge in the most optimized form.

 

The Work itself:

  • Lead and manage global teams of PRE engineers, supporting Risk and Identity Solutions products and services.

  • Apply knowledge of production support processes such as incident/change/problem management, call triaging, and escalation procedures to swiftly resolve live production issues.

  • Support critical applications under the Risk and Identity Solutions portfolio for value-added services, ensuring stability through proactive maintenance, automation, and root cause analysis and remediation.

  • Ensure Visa's applications and systems are operational 24x7.

  • Lead adoption and migration to cloud infrastructure, ensuring seamless transition and minimal downtime.

  • Spearhead the enhancement of operational practices focusing on efficiency, security, and excellence.

  • Champion the adoption of Generative AI to drive operational efficiencies, automate routine tasks, and enhance predictive maintenance capabilities.

  • Work closely with architects, development leads, test engineering leads, TPMs, product managers, security architects, and other partners to ensure secure, scalable, performant, and reliable solutions.

  • Provide mentorship and leadership to the SRE team, fostering a culture of collaboration and continuous improvement.

  • Solve complex technical problems and optimize existing systems.

  • Ensure all infrastructure complies with the highest security standards and regulations.

  • Manage large-scale infrastructure projects, prioritizing tasks and ensuring timely delivery.

  • Focus on continuous integration, continuous delivery, and continuous improvement to accelerate time-to-market delivery and enhance overall system quality.

  • Candidate must have 10+ years of experience leading projects and key technical initiatives, and 12+ years of experience supporting, configuring, and troubleshooting software applications, systems, databases, and associated devices. Self-motivation and excellent interpersonal communication skills are essential.

By fulfilling these responsibilities, the leader will ensure the PRE team delivers a robust, secure, and efficient infrastructure that supports the value-added services of Risk and Identity Solutions, meeting the evolving needs of the organization.

 

The Opportunity:

A leader for this role should bring a comprehensive set of skills, including deep expertise in Site Reliability Engineering (SRE) principles and practices, cloud platforms (AWS, Azure, Google Cloud), and security protocols. They should have experience in cloud migration, automation tools, and applying Generative AI for operational efficiencies. Familiarity with containerization technologies (Docker, Kubernetes), observability tools, and big data technologies like Hadoop, as well as distributed caching systems, is essential.

Strong leadership skills are crucial, including strategic thinking, team mentorship, and fostering a culture of collaboration and continuous improvement. Excellent communication skills are necessary to effectively collaborate with cross-functional teams and convey complex technical concepts to non-technical stakeholders. The leader should also possess problem-solving abilities, drive innovation, manage large-scale projects, and prioritize tasks efficiently. Additionally, the leader should have worked in fast-paced 24x7 environments, demonstrating adaptability, empathy, and confident decision-making to address the evolving needs of the team and the organization. Experience in swiftly resolving live production issues is essential to maintain system reliability and performance.

 

This is a hybrid position. Hybrid employees can alternate between working remotely and working in the office. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.

 

Qualifications

Basic Qualifications:-

  • 20+ years of relevant work experience and a Bachelor’s degree, OR 18+ years of relevant work experience


Preferred Qualifications:-

  • 20 or more years of experience with a Bachelor’s Degree, or 18 years of experience with an Advanced Degree (e.g. Master’s, MBA, JD, or MD), or 14+ years of experience with a PhD
  • 5 or more years of experience leading projects and key technical initiatives.
  • 10 or more years of experience with Java, J2EE, and/or .NET Core/.NET applications.
  • 10 or more years of experience in relational database usage with commercial-grade databases.
  • 5 or more years of experience managing applications on the Cloud (AWS, GCP, Azure)
  • 5 or more years leading and building Site Reliability teams, demonstrating strong leadership skills, strategic vision, and a track record of successfully managing and scaling SRE functions
  • Prior experience working in 24x7 environments.
  • Working knowledge of GenAI capabilities and use cases.
  • Prior experience building tools to automate production support activities that improve the efficiency and productivity of the service desk and other operations groups.
  • Must possess exceptional analytical, problem-solving, and oral and written communication skills.
  • Deep understanding of SOA principles and Web Services technologies: REST & SOAP.
  • Demonstrated proficiency in troubleshooting, root-cause analysis, application design, and implementing major components for large projects.
  • Knowledge of monitoring tools, alert escalation, and customer/vendor management.
  • Strong work ethic and self-starter, with the ability to work in a fast-paced, team-oriented environment.

Additional Information

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.
