Senior Site Reliability Engineer (SRE)

  • Contract

Company Description

Company Overview:

We are collaborating with a large enterprise client seeking an experienced Senior Site Reliability Engineer for a contract position. The ideal candidate will focus on ensuring system reliability, scalability, and performance. The role is remote, with working hours that overlap U.S. time zones.

Job Description

Job Title: Senior Site Reliability Engineer (SRE) - Contract Role
Location: Remote (must be available to overlap with U.S. time zones)

Key Responsibilities:

  • Identify and resolve complex bugs by working directly in the codebase and following runbooks.
  • Write and maintain code to enhance system reliability, scalability, and performance.
  • Restart services and implement changes to the codebase as required.
  • Investigate complex system issues and develop effective resolutions.
  • Design and build fault-tolerant, scalable systems for high availability and performance.
  • Apply reliability methods such as Design for Reliability (DFR) and Failure Mode and Effects Analysis (FMEA), and track metrics such as Mean Time Between Failures (MTBF).
  • Develop and maintain reliability standards and documentation.

Qualifications

Required Skills and Experience:

  • 5-7 years of experience in Site Reliability Engineering or a related field.
  • Proven experience in designing and implementing fault-tolerant, scalable systems at an enterprise level.
  • Deep understanding of reliability methods and metrics, including DFR, FMEA, and MTBF.
  • Proficiency with tools such as Datadog, PagerDuty, Marvin, and Backstage, as well as pipeline deployment processes and rollback procedures.
  • Strong coding skills in one or more programming languages commonly used in SRE.
  • Exceptional analytical skills to investigate complex issues and devise effective solutions.
  • Willingness to learn new products and tools provided by the company.
  • Excellent communication skills and ability to work effectively within a distributed team environment.
  • Must be able to work remotely with significant overlap with U.S. working hours.


Preferred Qualifications:

  • Experience with runbooks and operational excellence methodologies.
  • Familiarity with large-scale enterprise systems and environments.
  • Relevant certifications in reliability engineering, cloud platforms, or related technologies.