Associate Principal Engineer - Cloud and Infrastructure Operations Lead
- Full-time
- Service Region: Others
Company Description
We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale across all devices and digital mediums, and our people are based all over the world (19,000+ experts across 33 countries, to be exact). Our work culture is dynamic and non-hierarchical. We are looking for great new colleagues. That is where you come in!
Job Description
Leadership & Team Management:
- Lead and manage the Cloud & Hosting operations team, providing guidance, mentoring, and support.
- Ensure team members are well-equipped and trained to handle the evolving cloud environment.
- Foster a culture of continuous improvement, accountability, and teamwork.
Operational Oversight:
- Oversee the daily operations of cloud hosting environments, ensuring high availability, reliability, and performance.
- Manage and monitor cloud infrastructure, troubleshoot incidents, and lead root cause analysis for any downtime.
- Ensure service-level agreements (SLAs) are met and that SLA performance is reported to key stakeholders.
Cloud Infrastructure Management:
- Oversee the deployment, configuration, and maintenance of cloud environments (e.g., AWS, Azure, GCP) to support SaaS platforms and hosting services.
- Implement and maintain automation processes to ensure smooth and efficient operations.
- Collaborate with engineering and IT teams to optimize system performance, scalability, and security.
Security & Compliance:
- Ensure that all cloud and hosting solutions are compliant with industry standards and regulatory requirements.
- Manage security risks and ensure proper encryption, backups, disaster recovery plans, and vulnerability management.
Cost Management & Optimization:
- Monitor cloud usage and expenses, ensuring cost-effective use of resources.
- Implement cloud cost-optimization strategies to reduce operational costs while maintaining performance.
Stakeholder Communication:
- Act as the primary point of contact for the operations team, collaborating with software development, product, and client services teams.
- Communicate operational performance, incident status, and improvement plans to upper management and stakeholders.
Process Improvement:
- Develop and refine operational processes, aiming for efficiency, scalability, and reliability.
- Lead continuous improvement initiatives to enhance system performance, reduce downtime, and streamline operations.
Qualifications
Must-have skills: Cloud Platforms, DevOps Principles, Datacenter Operations, Virtualization Technologies, ITIL Practices.