Principal Site Reliability Engineer (CDSS Advanced URL Filtering)
- Full-time
- Department: DevOps
- Job Country: United States of America
Company Description
Our Mission
At Palo Alto Networks® everything starts and ends with our mission:
Being the cybersecurity partner of choice, protecting our digital way of life.
Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for innovators who are as committed to shaping the future of cybersecurity as we are.
Who We Are
We take our mission of protecting the digital way of life seriously. We are relentless in protecting our customers and we believe that the unique ideas of every member of our team contribute to our collective success. Our values were crowdsourced by employees and are brought to life through each of us every day, from disruptive innovation and collaboration to execution, and from showing up for each other with integrity to creating an environment where we all feel included.
As a member of our team, you will be shaping the future of cybersecurity. We work fast, value ongoing learning, and we respect each employee as a unique individual. Knowing we all have different needs, our development and personal wellbeing programs are designed to give you choice in how you are supported. This includes our FLEXBenefits wellbeing spending account with over 1,000 eligible items selected by employees, our mental and financial health resources, and our personalized learning opportunities - just to name a few!
At Palo Alto Networks, we believe in the power of collaboration and value in-person interactions. This is why our employees generally work full time from our office with flexibility offered where needed. This setup fosters casual conversations, problem-solving, and trusted relationships. Our goal is to create an environment where we all win with precision.
Job Description
Your Career
Palo Alto Networks operates a vast hybrid infrastructure and is among the largest GCP customers. As a Site Reliability Engineer on the CDSS Advanced URL Filtering team, you will play a key role in shaping the reliability and scalability of our systems. This position offers the opportunity to work on cutting-edge technologies, tackle complex challenges, and contribute to the success of innovative solutions that protect our customers.
This role is located at our dynamic Santa Clara, California headquarters campus. This is not a remote position.
Your Impact
Optimize infrastructure costs by monitoring resource utilization, rightsizing instances, and reducing waste to improve cost-efficiency.
Define and manage service-level objectives (SLOs) and related metrics to ensure service reliability and align with business goals.
Design and maintain secure cloud infrastructure that prioritizes reliability, scalability, and efficiency.
Develop expertise in new technologies to enhance infrastructure and operations.
Collaborate with cross-functional teams to ensure applications are production-ready and highly available.
Automate deployments, monitoring, and alerting to streamline operations and improve reliability.
Diagnose and resolve critical issues, driving optimization and continuous improvement.
Participate in on-call rotations to support seamless service operations.
Contribute to design reviews to enhance system performance and scalability.
Qualifications
Your Experience
Creative thinker and collaborative team player with strong communication skills and a drive to make a meaningful impact.
Cloud and Infrastructure: Expertise in provisioning and managing cloud infrastructure on public or private cloud platforms (GCP, AWS, or Azure preferred), with strong proficiency in tools like Kubernetes, Terraform, and Ansible.
Database Operation: Proficiency in managing and optimizing SQL and NoSQL databases, including operational tasks such as provisioning, scaling, monitoring, backups, and troubleshooting. Experience with platforms like BigQuery, MongoDB, Cloud SQL, Firestore, Bigtable, and MySQL is preferred.
System Reliability: Deep understanding of distributed systems, high-availability architecture, and strategies for scaling and optimizing system performance.
Service-Level Management: Proven experience defining and managing SLAs, SLOs, and SLIs to ensure service reliability and business alignment.
Cost Optimization: Expertise in monitoring and optimizing cloud infrastructure costs, including resource allocation and implementing efficient practices.
Load Balancing and Networking: Hands-on experience with Envoy or similar load balancing technologies, along with strong Linux system administration and advanced network troubleshooting skills.
Automation and Development: Advanced skills in programming and automation using Python, Golang, or shell scripting to streamline operations and enhance system reliability.
Production Deployment and Best Practices: Proven experience managing production deployments, ensuring system stability, and enforcing DevOps best practices.
Monitoring and CI/CD: Familiarity with CI/CD pipelines (GitLab CI preferred) and expertise in designing robust monitoring and alerting systems.
Collaboration and Communication: Exceptional ability to work with cross-functional teams, communicate effectively, and provide technical leadership.
Mindset and Motivation: Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive. Passionate about infrastructure and monitoring as code.
Education and Experience: BS/MS in Computer Science, Computer Engineering, or a related field, with 8+ years of hands-on industry experience in Site Reliability Engineering or a similar role managing and improving complex systems at scale.
Additional Information
The Team
Our engineering team is at the core of our products – connected directly to the mission of preventing cyberattacks. We are constantly innovating – challenging the way we, and the industry, think about cybersecurity. Our engineers don’t shy away from building products to solve problems no one has pursued before.
We define the industry instead of waiting for directions. We need individuals who feel comfortable in ambiguity, excited by the prospect of a challenge, and empowered by the unknown risks facing our everyday lives that are only enabled by a secure digital environment.
Compensation Disclosure
The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be between $147,000 - $210,000/YR. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.
Our Commitment
We’re problem solvers who take risks and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating together.
We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at [email protected].
Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.
All your information will be kept confidential according to EEO guidelines.