Site Reliability Engineer (JR1028237) (Contract)
- 125 High St, Boston, MA
- Employees can work remotely
Are you looking for a position with a growing company? Broadridge is hiring! We’re seeking a Site Reliability Engineer to join our team and work within our Technology Organization. As a member of a small cross-functional team, you will own a particular infrastructure challenge: designing and documenting systems, including writing and reviewing code, to automate away problems. You will undertake measured, methodical troubleshooting of complicated systems under pressure and take part in an on-call rotation alongside the engineers who build our production backend.
Technology Management: Participate in defining microservices infrastructure from inception and design through deployment, operation, and continuous refinement. Support services before they go live through activities such as system design, deployment automation, capacity planning, and launch reviews.
Service Management: Align with and directly serve internal business units and external clients. Interact directly with business unit executives and project management to communicate operational status, key project status, and the business value of services produced. Demonstrate a record of success in a matrix organization.
SLA Management: Ensure SLAs and operational efficiency targets are met for products managed by the third-party data center provider and internal support teams, using the resources and leadership within these organizations assigned to support these products. Proactively address issues to prevent problems from recurring, without creating undue layers of process or complexity.
Risk Management: Maintain a strong IT control environment that is responsive to risks across all aspects of technology.
Strategy: Work with cross-functional teams to ensure infrastructure needs are prioritized and aligned with both enterprise and business plans. Build and maintain alignment with the business technology architecture and overall strategic goals and objectives.
Disaster Recovery and BCP: Contribute as required to the design and execution of the enterprise-wide recovery and business continuity plan. Plan and execute DR test events to meet internal and external obligations.
- Minimum 5-7 years of experience in a role supporting cloud-based solutions or as an SRE
- Bachelor's degree in Computer Science or equivalent practical experience
- Experience in one or more of the following: Python, Go, Perl, or shell scripting
- Experience with Unix/Linux operating systems internals and administration
- Experience with GitLab and Jenkins
- Experience with data analysis techniques, which can include:
  - Reliability modeling and prediction
  - Fault Tree Analysis
  - Root Cause Analysis and Root Cause Failure Analysis
  - Failure Reporting, Analysis, and Corrective Action Systems
- Extensive experience with cloud platforms, Kubernetes, and Docker
- Strong interpersonal skills for working effectively with both technical and non-technical teams