Staff Site Reliability Engineer
- Full-time
Company Description
Twitter is what’s happening and what people are talking about right now. For us, life's not about a job, it's about purpose. We believe real change starts with conversation. Here, your voice matters. Come as you are and together we'll do what's right (not what's easy) to serve the public conversation.
Job Description
Who We Are:
Twitter has developed, and continually improves, a large-scale storage platform. SREs ensure the availability of the environment, with a watchful eye on security, capacity, and performance. This group writes software to improve service reliability and manage platform growth. Our tools and services reduce operational overhead and maximize performance.
What You’ll Do:
As a Site Reliability Engineer (SRE) on Twitter’s Storage Infrastructure team, you will work to improve the reliability and performance of the next generation of distributed systems and containerized deployments. This team ensures the availability of in-memory data services (including Redis and Memcached) and of the caching of content from foundation storage platforms. You will partner with product engineering teams to design, build, operate, and automate distributed storage services at the heart of Twitter’s infrastructure, which is used by millions of people.
We are looking for software engineers who are passionate about reliability, performance, and efficiency, and who have experience building tools, services, and automation to manage and improve production services.
This team has some exciting challenges approaching. We need to adopt IPv6 across our services, develop a unified caching solution, transition to Kubernetes, and reimagine elasticity. Opportunities exist for team members to influence how Twitter leverages future caching infrastructure. You will work directly with most Twitter engineering teams to improve their interactions with caching services.
Responsibilities:
Build tooling to improve operations automation. This includes automatic failure detection and remediation, application deployment, OS/kernel deployment, capacity planning, and fleet management.
Diagnose and troubleshoot complex distributed systems handling millions of queries per second and petabytes of data, and develop solutions that have a significant impact at our scale.
Collaborate with software engineers to sustain and optimize service availability, reliability, and performance.
Collaborate with the company's diverse hardware, software, and networking teams to design next-generation distributed storage platforms.
Troubleshoot issues across the entire stack: hardware, software, application, and network.
Produce results for large-scale projects and lead active collaboration across multiple teams.
Scope work for multiple engineers, often across multiple teams.
Sustain data privacy and service security compliance.
Participate in a 24x7 on-call rotation.
Qualifications
B.S.+ in Computer Science or related field (or equivalent experience)
5+ years of managing services in a distributed, internet-scale *nix environment.
Ability to program scalable and reliable services in at least one programming language (Python, Go, Java, C). Can set standards for code quality.
Demonstrable knowledge of Linux operating system internals and TCP/IP networking; containerization a plus.
Familiarity with systems management tools (Puppet, Chef, Ansible, etc.).
Ability to prioritize tasks and work independently.
Track record of practical problem solving, with excellent communication and documentation skills.
Additional Information
A few other things we value:
Challenge: We solve some of the industry’s most complex problems. Come to be challenged, learn, and thrive as an engineer.
Diversity: We value diverse backgrounds, ideas, and experiences, all of which make our teams and our organization better.
Work-Life Balance: We honor team members’ work-life balance.
All your information will be kept confidential according to EEO guidelines.
We are committed to an inclusive and diverse Twitter. Twitter is an equal-opportunity employer. We do not discriminate based on race, ethnicity, color, ancestry, national origin, religion, sex, sexual orientation, gender identity, age, disability, veteran status, genetic information, marital status, or any other legally protected status.