Site Reliability Engineer-Automation/DevOps

  • Contract

Company Description

At IBA InfoTech, we find hidden talent across the globe. We connect high-caliber candidates with leading companies in contract, contract-to-hire, and direct-hire positions across various industries: Oil & Gas, Energy, Telecommunications, Transportation, Business & Finance, Retail, Hospitality, and Insurance.

Job Description

Role:                     SRE Automation Engineer

Location:             Remote

Rate:                   $65/hr C2C

Duration:             6+ Month Contract

 

Bring your passion for cloud technologies and experience operating SaaS to influence how our development teams release quality products. While you manage the availability of our services to meet our service level targets and drive fleet-wide operations, you will also play a critical role in ensuring that teams meet operational readiness requirements as new services and features are onboarded.

 

Responsibilities:

Conduct fleet-wide operations against VMware software running in customer environments using orchestration and operational tools.

Maintain availability, reliability, and performance of services based on SLA/SLI/SLO definitions.

Participate in a multi-geo on-call SRE rotation to field service incidents, operating in partnership with support and DevOps teams.

Conduct post-mortems of incidents to address any operational or service gaps.

Participate in live site meetings to review key service issues and to drive new learnings.

Expand alerting definitions to catch previously undetected issues.

Improve existing runbooks, extend the auto-remediation ruleset, and implement remediation scripts.

Provide continual enhancements to tooling and automation to reduce operational toil.

Employ best practices to define and refine key processes and workflows to accelerate delivery of software without sacrificing quality and reliability.

Identify and drive operational improvements across SRE and DevOps teams.

 

Requirements:

Minimum of 5 years of engineering experience

SRE/DevOps experience involving CI/CD, reliability, scale, monitoring, and live site culture

Solid understanding of cloud-based architectures and concepts

Proficiency developing for or operating within Linux environments

Scripting experience, preferably with Python

Software development experience in Java

Experience with containers and container orchestrators (Docker, Kubernetes)

Good knowledge of TCP/IP networks including routing, firewalls, DNS, DHCP, VPN, etc.

Strong problem-solving, debugging, and troubleshooting skills

Excellent communication and interpersonal skills

Driven, self-motivated individual

 

Additional Information

All your information will be kept confidential according to EEO guidelines.