Site Reliability Engineer - Cloud

  • Orchard Rd, Singapore
  • Full-time

Company Description

A Series B global product development company.

Job Description

Position: Site Reliability Engineer - Cloud

 


We are looking for an experienced DevOps / Site Reliability Engineer for a Series B funded company in Singapore who will be instrumental in taking its products to the next level.

 

In this role, you will be working on bleeding-edge hybrid cloud/on-premises infrastructure handling billions of events and terabytes of data a day.

You will be responsible for working closely with various engineering teams to design, build and maintain a globally distributed infrastructure footprint.

As part of this role, you will be responsible for researching new technologies, managing a large fleet of active services and their underlying servers, automating the deployment, monitoring and scaling of components, and optimizing the infrastructure for cost and performance.

Day-to-day responsibilities

 

·        Ensure the operational integrity of the global infrastructure

·        Design repeatable continuous integration and delivery systems

·        Test and measure new methods, applications and frameworks

·        Analyse and leverage various AWS-native functionality

·        Support and build out an on-premises data center footprint

·        Support other teams and diagnose issues related to our infrastructure

·        Participate in a 24/7 on-call rotation

 

Candidate’s Profile:

·        Expert-level administrator of Linux-based systems

·        Experience managing distributed data platforms (Kafka, Spark, Cassandra, etc.); Aerospike experience is a plus

·        Experience with production deployments of Kubernetes clusters

·        Experience automating the provisioning and management of hybrid-cloud infrastructure (AWS, GCP and on-premises) at scale

·        Knowledge of monitoring platforms (Prometheus, Grafana, Graphite)

·        Experience with distributed storage systems such as Ceph or GlusterFS

·        Experience in virtualisation with KVM, oVirt and OpenStack

·        Hands-on experience with infrastructure-as-code and configuration management tools such as Terraform and Ansible

·        Expertise in Bash and Python scripting

·        Network troubleshooting experience (TCP, DNS, IPv6 and tcpdump)

·        Experience with continuous delivery systems (Jenkins, GitLab, Bitbucket, Docker)

·        Experience managing hundreds to thousands of servers globally

·        Enjoy automating tasks, rather than repeating them

·        Capable of estimating costs of various approaches, and finding simple and inexpensive solutions to complex problems

·        Strong verbal and written communication skills

·        Ability to adapt to a rapidly changing environment

·        Comfortable collaborating with and supporting a diverse team of engineers

·        Ability to troubleshoot problems in complex systems

·        Flexible working hours and ability to participate in 24/7 on-call support with other team members

 

Note: 3 to 8 years of relevant experience is required. Only Singapore citizens and holders of a valid Singapore PR may apply (no sponsorship is available, as per the Government rule).
