Infrastructure Engineer

  • Salt Lake City, UT
  • Full-time

Company Description

WildWorks creates interactive entertainment for kids that’s fun, substantive, and beautiful. Founded in 2003 by game industry veterans (originally as “Smart Bomb Interactive”), WildWorks builds and publishes blockbuster mobile, social, and online games, including the #1 online playground for kids in the U.S., Animal Jam. Through a collaborative association with the National Geographic Society, Animal Jam incorporates rich factual content about science and animals, and provides free learning resources for teachers and parents through its “AJ Academy” portal. WildWorks is headquartered in Salt Lake City, Utah, with a satellite office in Amsterdam, Netherlands. The WildWorks team of 120 includes the best multimedia artists, engineers, designers, safety experts, and community support staff now working in interactive media.

If you love making things millions of kids will enjoy and you want to work on a revolutionary gaming project, this is an opportunity to join a highly collaborative team and create the future of Animal Jam, already the #1 MMO playground for kids. Animal Jam is developed at our studio in downtown Salt Lake City, Utah.

Job Description

WildWorks is looking for a Site Reliability Engineer to maintain and build upon the existing server infrastructure and continuous delivery pipelines for our applications. You'll work with cutting-edge technologies like Kubernetes, Docker, and AWS, and will be responsible for working with developers to quickly understand their needs and build the infrastructure to deploy and scale their applications. You'll also own the automated build/deployment platform and become the subject-matter expert on supporting and building continuous delivery pipelines.

This position requires a deep understanding of how all layers of an internet-based service work together. You'll need a good understanding of networking, security, DNS, Linux, databases, proxies, and protocols. We use a broad range of technologies, so resourcefulness is a huge asset. You'll serve on an on-call rotation and will be responsible for diagnosing the root cause of issues and developing solutions that either eliminate the root cause or automate recovery. You'll also work with developers to identify useful, actionable metrics, then set up monitoring and alerting based on those metrics.

RESPONSIBILITIES

  • Provide support, guidance, and training to developers for maintenance and deployment of Kubernetes manifests, Dockerfiles, and Jenkinsfiles.
  • Plan and execute maintenance, upgrades, and migrations in Dev, Stage, and Prod in a way that avoids downtime and service interruptions.
  • Identify and remedy single points of failure and security risks. Continuously improve self-service tools and processes to reduce cycle times for developers and automate repetitive and wasteful operations.
  • Maintain and improve shared Docker base images, deployment scripts, and service templates.
  • Manage databases, caching servers, message queues, and centralized logging, including Riak, AWS Aurora, MySQL, MongoDB, Elasticsearch, RabbitMQ, and Redis.
  • Maintain tools and components of DevOps platform including Kubernetes, GitLab, Jenkins, and Fluentd. Interface with external CDN, logging, monitoring, and security vendors. Update infrastructure code in Terraform and system images in Packer when needed.
  • Identify and reduce waste and increase cost efficiency of infrastructure. Provide input on build vs. buy decisions and negotiate contracts with external vendors.
  • Configure and tune CDN distributions, servers, databases, proxies, message queues, and cache servers. Set up replication and failover for datastores that are single points of failure.

Qualifications

  • Working knowledge of Linux command-line tools. Docker and Kubernetes are a plus.
  • General programming experience in any scripting language.
  • Experience working in a fast-moving, high-traffic, high-uptime internet service environment.
  • Knowledge of AWS services, best practices, and capabilities. Familiarity with EC2 networking, instance type selection, and configuration.
  • Strong communication skills and a sense of ownership.
  • Prior experience working with microservices, cloud-native applications, and distributed systems is a big plus.

Additional Information

All your information will be kept confidential according to EEO guidelines.

This is a full-time, on-site position in our Salt Lake City studio.

No agents or third-party submissions, please.

Only candidates submitted through our career link will be considered.