Senior Infrastructure Engineer / Observability Lead
- Full-time
- Department: Others
- Job Category: IT Infrastructure
Company Description
NCS is a leading technology services firm that operates across the Asia Pacific region in over 20 cities, providing consulting, digital services, technology solutions, and more. We believe in harnessing the power of technology to achieve extraordinary things, creating lasting value and impact for our communities, partners, and people. Our diverse workforce of 15,000 has delivered large-scale, mission-critical, and multi-platform projects for governments and enterprises in Singapore and the APAC region.
Job Description
The Observability Lead is responsible for defining, building, and scaling our observability strategy across systems, applications, and infrastructure. This role will drive visibility into system health, performance, and reliability, enabling engineering teams to detect, diagnose, and resolve issues proactively.
The Observability Lead will lead the design and implementation of logging, metrics, tracing, and alerting frameworks, while fostering a culture of data-driven reliability and operational excellence. Strong technical, analytical, and communication skills are essential to ensure effective delivery of infrastructure services in a complex enterprise environment.
Responsibilities
- Define and own the observability roadmap and establish governance, standards, and best practices for logs, metrics, traces, and user experience monitoring
- Lead and mentor a team of engineers (if applicable)
- Design, implement, and maintain scalable monitoring and observability platforms
- Evaluate and integrate tools for monitoring, alerting, and incident management
- Configure deployments across cloud, on-prem, and containerized environments
- Ensure high availability and performance of observability systems
- Leverage solution capabilities such as AI-powered anomaly detection, auto-discovery, and root cause analysis
- Implement full-stack monitoring including infrastructure, applications, logs, synthetic monitoring, and real user monitoring (RUM)
- Manage access control, data privacy, and system integration
- Design intelligent alerting strategies to reduce noise and improve signal quality
- Integrate the observability solution with incident management tools (e.g., ServiceNow, JIRA)
- Improve incident detection, triage, and root cause analysis processes
- Use insights to drive performance optimization and capacity planning
- Support load testing and proactive performance engineering initiatives
- Build executive and engineering dashboards to provide actionable insights
- Promote adoption of observability practices across development teams
- Drive shift-left practices for monitoring and debugging
- Enable actionable insights through dashboards, analytics, and visualization
- Optimize signal-to-noise ratio in alerts and monitoring systems
- Ensure efficient data storage, retention, and cost management
- Enable self-service observability for teams
Qualifications
The ideal candidate should possess:
- 7+ years of experience in Monitoring, Observability, SRE, or DevOps roles
- Strong understanding of:
  - Distributed systems and microservices architectures
  - Monitoring concepts: metrics, logs, traces, events
  - Cloud-native environments and container orchestration
- Hands-on experience with Dynatrace or Grafana (preferred)
- Experience with cloud platforms (AWS, Azure, or GCP)
- Familiarity with CI/CD pipelines and DevOps practices
- Strong scripting/programming skills (Python, Bash, or similar)
Preferred Attributes
- Strong troubleshooting, analytical, and problem-solving skills
- Excellent written and spoken English communication skills
- Strong teamwork and collaboration abilities across cross-functional teams
- Proactive and adaptable to new technologies and operational requirements
- Customer-oriented mindset with attention to service quality and delivery
- Experience with Kubernetes and container observability
Professional and/or Technical Certifications
- Observability tools certification (e.g. Dynatrace, Grafana)
Additional Information
We are driven by our AEIOU beliefs—Adventure, Excellence, Integrity, Ownership, and Unity—and we seek individuals who embody these values in both their professional and personal lives. We are committed to our Impact: Valuing our clients, Growing our people, and Creating our future.
Together, we make the extraordinary happen.
Learn more about us at ncs.co and visit our LinkedIn career site.
Scam Alert
We are aware of fraudulent job offers and impersonations of NCS recruiters. Phishing emails using convincing-looking but fake addresses are also commonly used to trick you into thinking that they come from official NCS sources.
Please note that all official communications from NCS Group will only be sent from verified corporate email addresses. Always check that the sender’s email address ends with the genuine NCS domain, @ncs.com.sg, and beware of extra letters, symbols, or misspellings. When in doubt, verify the sender’s identity by contacting us at [email protected].