Site Reliability Engineer, Data Infrastructure
- San Francisco, CA
Optimizely is the world’s leading experience optimization platform, providing website and mobile A/B testing and personalization for top brands. The platform’s ease of use and speed of deployment empower organizations to conceive of and run experiments that help them make better data-inspired decisions. Optimizely meets the diverse needs of thousands of customers worldwide looking to deliver connected experiences to their audiences across channels. To date, those customers have created and delivered more than 700 billion optimized visitor experiences.
At Optimizely, the Distributed Systems platform is the foundation driving our Experimentation and Personalization products, providing the information that our customers need to understand their experiment results. The experiment data we gather allows us to provide counting, analytics, targeting, and recommendations in our products. We process many billions of events a day, and we’ve built a sophisticated data pipeline that can power a variety of queries for personalized experiences at massive scale.
We are looking for a Site Reliability Engineer who will work as an integral part of the Data Infrastructure team to drive operational excellence in our stateful infrastructure. You would have a rare opportunity to own, operate, and scale critical production services on a cutting-edge technology stack.
This is a unique role that requires knowledge and expertise across multiple layers of the stack. You’ll bring an engineering mindset to the non-traditional challenges of scaling and operating mission-critical services built on technologies like Apache Samza and HBase.
- You are a hybrid Systems and Software Engineer who loves to build systems that automate repetitive tasks and workflows.
- You have a strong passion for solving operational problems in complex systems through the application of validated engineering practices.
- You believe that automation is a key component in keeping a large-scale system humming.
What you will be doing
- Work closely with Data Infrastructure and DevOps engineering teams to:
- Create a Reliability Engineering roadmap to ensure that our complex, large-scale systems and services are highly available, performant, monitored, automated, and designed to scale.
- Influence new Data services and features early on so they are designed with scale, operability, and performance in mind.
- Deliver launch plans for major features and build the necessary infrastructure (staging environments, monitoring, alerting, etc.) that will support the launch, along with operational runbooks.
- Drive the team through “Disaster Recovery Tests” where we will manually turn down pieces of infrastructure and services to test Optimizely’s overall resiliency to failures.
- Build tools and smart alerting that can discover failures/issues quickly, with the goal of automating response to non-exceptional service conditions.
- Plan for the growth of our services: forecast demand and build out capacity ahead of need.
What you bring
- You have strong DevOps experience with Unix/Linux systems, including solid troubleshooting and problem-solving skills.
- You know your way around Unix/Linux command-line tools.
- You have strong interpersonal communication skills and work well in a diverse, team-focused environment with other Engineers, Product Managers, etc.
- You have production experience with data-centric applications in a web-scale environment.
- You’ve worked with Amazon Web Services (AWS).
- Nice to have: You have hands-on experience with open data platforms like Apache Hadoop, HBase, or Spark.
- Nice to have: You have experience with Java Virtual Machine (JVM) environments.
Benefits
- Commuter and transportation benefits
- Catered in-office lunch and dinner on weekdays
- Full medical insurance with very low co-pay and deductible. HMO, PPO, and HSA options available
- Full dental coverage including orthodontics
- Full vision coverage including contacts
- Dependents 100% covered for medical, dental, and vision
- Wellness Grant
- Unlimited vacation policy and seventeen weeks of paid parental leave
- 401(k) benefit
- Working with a great team and having a huge impact!
All your information will be kept confidential according to EEO guidelines.