Site Reliability Engineer, Distributed Systems

  • Austin, TX, USA
  • Full-time
  • Department: Engineering

Company Description

Optimizely is the world's leader in customer experience optimization, allowing businesses to dramatically drive up the value of their digital products, commerce, and campaigns through its best-in-class experimentation software platform. By replacing digital guesswork with evidence-based results, Optimizely enables product and marketing professionals to accelerate innovation, lower the risk of new features, and drive up the return on investment from digital by up to 10X. More than 26 of the Fortune 100 companies choose Optimizely to power their global digital experiences. Optimizely’s impressive customer list includes eBay, FOX, IBM, The New York Times, and many more global enterprises.

Job Description

Optimizely’s Site Reliability Engineers work on improving the availability, scalability, performance and reliability of our production data platform.  Our distributed event processing and compute platform powers the results and analytics for all of our Experimentation and Personalization products. This platform processes billions of events a day and is relied on by many Fortune 100 global businesses.

We value observability, monitoring, actionable alerting based on SLOs, blameless postmortems and efficient incident response. We work in both the application and systems worlds, instrumenting key parts of core architecture while supporting developers as they do the same.

What you’ll do

You will be part of a small team of SREs in Optimizely’s Austin, TX office. As a member of the Data Infrastructure team, you will directly impact the reliability and performance of all of Optimizely’s products.

You will deep dive into gnarly operational issues within software deployments, operating systems, network I/O, and Linux processes.

You will also work on projects to move away from operational toil and towards improved fault tolerance, automation, and SLO-driven priorities.

Your responsibilities include but are not limited to:

  • Work closely with distributed systems engineers developing new scalable features and services within our data platform

  • Build and scale new infrastructure to meet demand

  • Document system design and procedures

  • Participate in production on-call rotation

  • Contribute to improvements of infrastructure and application monitoring and alerting

  • Develop and improve disaster recovery procedures and automation

  • Work closely with security engineers to develop and improve network ACLs and tests

  • Engage in service capacity planning and demand forecasting

Qualifications

Who You Are

  • 5+ years running 24/7/365 production environments

  • 5+ years of building infrastructure with automation

  • Experience with Cloud Infrastructure (ideally AWS)

  • Deep understanding of Linux

  • You’ve worked with messaging, data processing, and data storage systems like Kafka, HBase, Spark, Cassandra or similar.

  • Solid grasp of a modern programming language: Java, Python, Ruby, Go, Rust or similar.

  • Proficiency with configuration management and orchestration tools like Puppet, Chef, Ansible or Terraform.

  • Solid understanding of fundamental networking technologies and concepts.

  • Knowledge of best practices related to security, performance, and disaster recovery.

  • Experience with infrastructure monitoring, network design, and high-availability systems

  • Strong interpersonal communication skills and ability to work well in a diverse, team-focused environment with other Engineers, Product Managers, etc.

  • Minimum BA/BS degree in Computer Science, Engineering, or a related field

Additional Information

Perks:

  • Commuter and transportation benefits
  • Catered in-office lunch on weekdays
  • Full medical insurance with very low co-pay and deductible. HMO, PPO, and HSA options available
  • Full dental coverage including orthodontics
  • Full vision coverage including contacts
  • Dependents 100% covered for medical, dental, and vision
  • Wellness Grant
  • Unlimited vacation policy and seventeen weeks of paid parental leave
  • 401k benefit
  • Working with a great team and having a huge impact!

At Optimizely, we embody inclusion and embrace diversity.  Optimizely is an equal opportunity employer and makes employment decisions on the basis of merit.  Optimizely prohibits discrimination based on race, color, religion, sex, sexual identity, gender identity, marital status, veteran status, nationality, citizenship, age, disability, medical condition, pregnancy, or any other unlawful consideration. All your information will be kept confidential according to EEO guidelines.
