Sr. Site Reliability Engineer
- Full-time
- Job Family Group: Technology and Operations
Company Description
Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure payments network, enabling individuals, businesses, and economies to thrive while driven by a common purpose – to uplift everyone, everywhere by being the best way to pay and be paid.
Make an impact with a purpose-driven industry leader. Join us today and experience Life at Visa.
Job Description
What we’re looking for
We are looking for an experienced Sr. SRE/DevOps engineer with proven experience on Google Cloud Platform (GCP) and Kubernetes engines, as well as in building high-volume, high-performance, and highly available payment services, to help us build functional systems that improve customer experience. You will be responsible for managing people on the Platform team, architecting the cloud infrastructure, automating and modernizing the CI/CD flows, deploying new products and their updates, identifying production issues, and implementing integrations that meet our customers' needs.
Your performance will be measured by improvements in deployment processes and times, a reduction in server alarms, and more efficient use of resources. The goal is for our app traffic to scale without an increase in error reports, while we maintain or improve our error rate, response times, and uptime. We need you to have strong scripting skills.
The Role
The Sr. SRE/DevOps will focus on:
- Leveraging Google Cloud Platform and Kubernetes to make our platforms available to our clients.
- Improving and maintaining the infrastructure behind our continuous integration and delivery pipelines.
- Working closely with business areas to understand how the products are built, designed, and operated, and their importance to the company.
- Discovering, analyzing, and troubleshooting anomalous application behavior, and deploying monitoring and infrastructure tools that expose metrics and alerts.
- Designing and implementing cloud-native solutions and cloudware to support the platforms and applications running on top of them.
- Designing and developing automation to support continuous delivery and continuous integration processes.
- Building and setting up new development and CI/CD tools and infrastructure.
- Understanding the needs of stakeholders and conveying this to developers.
- Working on ways to automate and improve development and release processes.
- Testing and examining code written by others and analyzing results.
- Ensuring that systems are safe and secure against cybersecurity threats.
- Identifying technical problems and developing software updates and fixes.
- Planning out projects and being involved in project management decisions.
Duties
- Set up new sites and applications via configuration management
- Maintain, upgrade, and patch tracking and documentation software
- Support the development lifecycle of platform architectural design, deployment and debugging
- Automate release deployments across development, QA and production stacks using a combination of scripting languages and other automation toolkits
- Deploy updates and fixes.
- Provide Level 2 and 3 technical support.
- Build tools to reduce occurrences of errors and improve customer experience.
- Develop automation solutions to improve cloudware support.
- Perform root cause analysis for production errors.
- Investigate and resolve technical issues.
- Develop scripts to automate visualization.
- Design procedures for system troubleshooting and maintenance
Qualifications
- Basic Qualifications
- 2 or more years of work experience with a Bachelor’s Degree or an Advanced Degree (e.g. Masters, MBA, JD, MD, or PhD)
- Preferred Qualifications
- 3 or more years of work experience with a Bachelor’s Degree or more than 2 years of work experience with an Advanced Degree (e.g. Masters, MBA, JD, MD)
- Strong knowledge in GCP architecture, security, and services.
- Knowledge in AWS and Azure architecture, security, and services (Desired).
- Experience in automation of code deployment across multiple platforms.
- Strong knowledge in Linux and Linux environments.
- Engineering experience in building production infrastructure using code and repeatable designs.
- Experience developing automated infrastructure provisioning tools (IaC, e.g., Pulumi, Terraform).
- Experience with Docker containers and orchestration platforms such as ECS, EKS, GKE, and/or Swarm.
- Knowledge of ITIL, SOC 2 and PCI processes and experience evolving to Agile development lifecycles.
- Knowledge of networking, load balancing, and high availability.
- Ability to articulate complex architectures to non-technical audiences.
- Ability to document solutions and train operational teams on supportability.
- Experience with GCP Spanner, MySQL, and PostgreSQL.
- Experience with Database Administration, schema design, query analysis, performance, tuning, and optimization.
- Knowledge of security practices, networking protocols, firewalls, PCI compliance, etc.
- 5+ years of system architecture experience required, with a demonstrated ability to read code and understand coding logic to assist in troubleshooting.
- 5+ years of experience supporting mission-critical production applications.
Skills
- Proficient with git and git workflows.
- Good knowledge of Go and/or Python.
- Working knowledge of databases and SQL.
- Problem-solving attitude.
- Collaborative team spirit
Additional Information
Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.