Site Reliability Engineer

Job Description

Your deliverables as a Site Reliability Engineer will include, but are not limited to, the following:

Work with containers and container orchestration systems such as Kubernetes
Plan capacity by determining the resources your service needs to be scalable, efficient, and reliable
Identify and troubleshoot any availability and performance issues at multiple layers of deployment, from hardware to operating environment, network, and application
Collaborate with other engineers to implement operational solutions while defining and adhering to industry best practices
Participate in a weekly on-call rotation

Qualifications

3 to 5+ years of experience in cloud operations
Proficiency with Infrastructure as Code technologies such as Terraform, CloudFormation, or ARM templates
Experience developing and deploying resources with a cloud provider (e.g., Azure, AWS, Cloudflare, GCP)
Understanding of networking concepts (load balancing, TCP/IP, HTTP, gRPC, DNS) and troubleshooting tools (Wireshark, command-line utilities, BPF)
Experience with version control systems (GitHub, GitLab, Bitbucket)
Comfortable with scripting and programming languages such as Python, Bash, and Go
Familiarity with container technologies like Docker and Kubernetes
Knowledge of cloud-native architecture, cloud tooling, and current trends and practices
Relevant Linux, Kubernetes, and cloud certifications are a plus