Head of Engineering and Planning

Full-time

Job Type: Permanent

Company Description

Our name isn’t the only thing that’s unique about Leidos Australia. We’re a complex systems integration company building world-class solutions across government and defence that ensure peace of mind for the entire nation. Supported by global backing from our US network, we’re trusted by our customers to deliver the most innovative answers to their most complex challenges. Seriously interesting work that benefits and safeguards every Australian. That’s where you come in.

Job Description

Join a multi-year platform modernization effort as the Head of Engineering and Planning. As part of the CTO team, your team focuses on toil reduction, process improvement, configuration management automation, and gathering metrics for the company’s core service platform, among many other duties.

This is a technically based management position focusing on the development, leadership and oversight of engineers. The ideal candidate will possess a blend of technical and non-technical skills, with the ability to gather requirements, develop solutions to problems, build teams to implement those solutions, and teach others.

As the Head of Engineering and Planning, you will:

Lead the team of engineers running Capacity, Availability, ILS and Disaster Recovery
Work with leadership to transform the current team structure to remove single point sensitivity, and plan a transition to SRE-based operating models
Assess opportunities to transition services and capabilities to the Cloud and execute as appropriate
Proactively plan for capacity, reliability, availability and sustainability of services.
Be creative in developing new services by optimising existing processes and service product lines.
Work with commercial teams and operations teams to develop a service roadmap.
Develop proposals and business cases for change with the rest of the CTO team.
Design and implement best practices and processes for delivering an industry-leading service that provides reliability and resiliency for mission-critical customer use cases.
Manage the end-to-end practices and processes that enable us to fail and recover fast, and to continuously learn and adjust.
Promote and evangelize the engineering discipline of Site Reliability within Leidos and customer(s).
Implement, advocate, and teach SRE/DevOps best practices in a collaborative DevOps context to customer and internal teams.
Assess and measure SRE initiatives and usage, feeding that data back to help teams with prioritization and planning.
Work with Leidos global teams to bring in practices to our existing account(s) in Australia.
Be recognized as a thought-leader in identifying and sharing best practices on SRE with internal and external teams.

Qualifications

As a fit for this role, you have:

Degree in Computer Science, Engineering or Science
A passion for solving the hard problems of running large-scale cloud services at the highest levels of reliability and resiliency.
A vision for applying software engineering skills and experiences to automate all aspects of the operation delivery and management process from build/test/deploy, monitoring and alerting, to automatic failover and capacity management.
Experience as a Site Reliability Engineer and a Site Reliability Engineering Manager in a cloud/SaaS-based environment where reliability and resiliency are critical factors in business continuity.
A proven track record of building SRE teams that have taken existing systems and improved them to the next level or two of reliability and resiliency.
Thorough experience and knowledge of industry-leading best practices, patterns, and toolsets for Site Reliability Engineering.
Experience automating deployments of services/applications into a hybrid cloud environment (Azure / OpenStack / AWS).
Familiarity and experience using industry-standard toolsets for provisioning and configuration management automation and operational monitoring.
Previous (preferably current) experience in one or more programming languages used in system automation and management.
Ability to make rational, data-driven decisions under pressure while maintaining a calm and confidence-inspiring demeanour.
Excellent project management and communication skills.
Experience providing leadership and direction to SRE teams that are responsible for break-fix, uptime and reliability for core services.
Experience working closely with other teams and contractors on toil reduction efforts, i.e. automation of repetitive dev and ops tasks.
Strong experience in VMware blueprinting architecture would be an advantage.
Experience using modern tools to streamline configuration management, e.g. Terraform, Packer, Ansible, Puppet, Chef, Dockerfiles.

Additional Information

This role requires the successful applicant to be an Australian citizen and to be able to obtain a security clearance to Baseline level.

Applications close at 11pm on 3rd June 2019.