Site Reliability Engineer

  • Chengdu, Sichuan, China
  • Full-time

Company Description

Ubisoft is composed of over 12,000 talented people located in 28 countries across the globe. With around 85% of its staff devoted to game development, Ubisoft has the 2nd largest in-house creative team in the world.

The company’s 27 different creative studios work hand-in-hand each day to deliver rich and innovative gaming experiences that reflect the creativity and diversity of their teams. This cross-studio collaboration model means every team member has the opportunity to participate in challenging projects based on iconic brands including Assassin's Creed®, Tom Clancy's Ghost Recon®, Tom Clancy's Splinter Cell®, Rabbids®, Rayman®, Watch Dogs®, Far Cry®, The Crew®, Just Dance® and more.

With an expansive global distribution network, Ubisoft is also a company that stays in close contact with its local fans.

Job Description

The Site Reliability Engineer (SRE) is responsible for operations and development tasks such as level 4 support and the implementation of highly scalable game infrastructure. The SRE acts as the infrastructure services integrator that enables production teams to build games using the principles of cloud-native development, DevOps and continuous delivery. The SRE has a solid development background with knowledge of infrastructure and automation.


The main and routine tasks of this position are to:

·        Design and/or implement highly scalable cloud and bare-metal server and network infrastructure

·        Share responsibility and ownership of game functions and services with developers who create them

·        Scale systems sustainably through mechanisms such as automation, and evolve systems by pushing for changes that improve reliability and velocity.

·        Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.

·        Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.

·        Maintain services once they are live by measuring and monitoring availability, latency and overall system health.

·        Practice sustainable incident response and blameless postmortems.

·        Debug and optimize code, and automate routine tasks (“toil”)

·        Consult on the game's software and data architecture to ensure maximum infrastructure scalability

·        Ensure the reliability and consistency of game data

·        Work with developers to build adequate monitoring, and monitor system events to ensure system health, maximum availability and service quality

·        Assist in evaluating new requirements, technical design and standards

·        Reduce the cost of failure for changes

·        Define prescriptive ways to measure reliability



"Here’s what you do when someone breaks something or finds something very difficult to debug: You say thank you. Thank you for finding this edge case. Thank you for highlighting this overcomplicated part of our system. Thank you for pointing out this gap in our docs. And then you go make it so nobody can break it the same way again."


A baccalaureate degree or equivalent experience in Computer Information Systems, Computer Science, Mathematics or a related field.

Relevant experience:

2+ years of experience in software development, or 5+ years of automation-focused system administration with hybrid hosting solutions.

Experience in one or more of the following: C, C++, Java, Python, Go, Perl or Ruby.


·        Self-driven and slightly paranoid about system stability

·        Able to teach fundamental principles to other engineers and experts

·        Skilled at developing techniques and methodologies to resolve unprecedented problems or situations

·        Ability to make complex information accessible to non-technical people


·        In-depth knowledge of Linux system internals and operating system design

·        In-depth understanding of public cloud providers and the OpenStack platform

·        Proficiency with container orchestration systems such as Kubernetes

·        Proficiency with relational database systems such as MySQL

·        Proficiency with document storage systems such as MongoDB

·        Experience with infrastructure orchestration using Terraform

·        In-depth understanding of configuration management systems such as SaltStack, Chef, Puppet or Ansible is an asset

Additional Information

We offer salaries to motivate you, bonuses for your performance, medical services to keep you safe and sound, meal tickets you can use wherever you want, and free access to relaxation and fitness rooms.

But most of all, we guarantee you’ll enjoy our atmosphere and working environment.

Ubisoft is committed to creating an inclusive work environment that reflects the diversity of our player community. We are an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to their race, ethnicity, religion, gender, sexual orientation, age or disability status.
