Sr. Engineer, SRE and Confluence
- Mountain View, CA
At Atlassian, we're changing the way we manage our hosted services. Site Reliability Engineering is the approach we are taking to deliver on our World Class SaaS promise to our customers.
As a Sr. Engineer in the Site Reliability team for Confluence, you’ll build solutions to enhance the availability, performance and stability of Atlassian's Confluence Cloud offering, and automate away repetitive work. You will work very closely with the product architects and developers and be expected to contribute to the product development efforts. You'll also respond to alerts, pings, pages, queries and problems that you can investigate and really sink your teeth into. You'll be working on non-production and production environments, monitoring, data collection and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives and platform automation. The best person for this role is someone who has a collaborative spirit - in our world, it’s not about being a hero and having all the answers, it’s about sometimes saying "I don't know" and working on finding solutions rather than starting with an assumption. The team needs someone who can collaborate with other teams, ask questions, learn from others and turn chaos into order.
This role would be a great fit for someone with creative and innovative problem-solving skills. You will not only identify problems, but also develop and implement solutions that operate at scale - seeing your own technology efforts directly improve the reliability of our products. Our teams are empowered and expected to improve our products to truly deliver a reliable experience to customers. To realise this goal, you will own development efforts in every sprint, from planning to delivery, and collaborate with different teams to review code.
One thing we promise: you’ll never be bored.
More about you
On your first day, we'll expect you to have:
- Software development experience with Java
- Experience in automation - we don't want you doing repetitive work! Python is ideal.
- Deep understanding of Linux systems.
- Serious troubleshooting skills across different levels of the stack.
- Hands-on experience with AWS.
- Deep expertise in monitoring distributed systems and application architectures.
- Experience diagnosing and resolving problems in high-throughput web applications and network services.
- Solid communication skills with team members near and far.
It's great, but not required, if you have:
- Experience with microservices architectures and container management tools such as Docker.
- Experience building, automating, and maintaining infrastructure in Amazon Web Services.
- Experience monitoring cloud services with Datadog.
- Understanding of incident management processes.
- Awareness and insight into industry trends (technology, methods and tooling).
- Experience working with Atlassian products such as Jira and Confluence.
- Experience managing and troubleshooting a continuous integration pipeline.
- Experience leading teams of engineers in service outage situations.
More about our team
Atlassian Site Reliability Engineering is a rapidly growing group within the organization. We are in the process of building our teams, tools and systems as part of Atlassian's mission to build the best SaaS services in the world. This is a truly exciting team to join - we are already involved, or are planning to be involved, with every technical team across Atlassian.
We enable Atlassian to go fast by providing real-time feedback on production systems. We work side by side with the product family and platform developers to maintain and improve services and performance.
We live the company values with a strong customer focus and possess a healthy sense of urgency. We are a heavily data-driven team, utilising a variety of data collection, enrichment, analytics and visualisations to learn about our complex systems.
We also live the 'Play, as a team' value by having a strong focus on sharing learning experiences from the front line with the development teams. So, the options for people in the team are vast. If you like mastering a domain and going deep, we need you. If you can juggle three tasks and coordinate multiple people in the heat of an incident, we need you. If you love the benefits of process and methodical improvement, you will love it here. If you want to keep your head down, headphones on and bash out code to support the team, we have a spot for you too.
More about our benefits
Our offices are open, highly collaborative and yes, fun! To support you at work (and play) we offer some fantastic perks: ample time off to relax and recharge, five paid volunteer days a year for your favorite cause, plenty of food and beverages, ergonomic workstations with sit/stand desks, unique ShipIt days, a company paid trip after five years, generous employer-paid insurance coverage (medical, dental, and vision) for you and your family, 401k matching and more.
More about Atlassian
Software is changing the world, and we’re at the center of it all. With a customer list that reads like a who's who in tech, and a highly disruptive business model, we’re advancing the art of team collaboration with products like Jira, Confluence, Bitbucket, Trello, and now Stride. Driven by honest values, an amazing culture, and consistent revenue growth, we’re out to unleash the potential of every team. From Amsterdam and Austin to Sydney and San Francisco, we’re looking for people who are powered by passion and eager to do the best work of their lives in a highly autonomous yet collaborative, no B.S. environment.
We believe that the unique contributions of all Atlassians are the driver of our success. To make sure that our products and culture continue to incorporate everyone's perspectives and experience, we never discriminate on the basis of race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status.
All your information will be kept confidential according to EEO guidelines.