Staff Engineer - Database Reliability Engineering

  • Full-time

Company Description

Organizations everywhere struggle under the crushing costs and complexities of “solutions” that promise to simplify their lives. To create a better experience for their customers and employees. To help them grow. Software is a choice that can make or break a business. Create better or worse experiences. Propel or throttle growth. Business software has become a blocker instead of a way to get work done.

There’s another option. Freshworks. With a fresh vision for how the world works.

Freshworks Inc. builds uncomplicated service software that delivers exceptional employee and customer experiences. Our people-first approach to AI eliminates friction, helping businesses reduce complexity, lower cost-to-serve, and deliver faster, more human support through enterprise-grade yet easy-to-use CX and IT solutions. Nearly 75,000 companies, including Bridgestone, New Balance, Nucor, S&P Global, and Sony Music, trust Freshworks to power their Employee Experience (EX) and Customer Experience (CX) operations.

Fresh vision. Real impact. Come build it with us.

Job Description

    Roles & Responsibilities

    End-to-End Reliability & Operations

    • Take full ownership of availability, latency, scalability, and durability across all services and databases.
    • Define and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for critical systems.
    • Lead incident response protocols, conduct blameless Root Cause Analyses (RCAs), and drive systemic fixes to improve Mean Time To Recovery (MTTR) and Mean Time To Detect (MTTD).
    • Build production readiness frameworks and establish best practices for capacity planning, deployments, rollbacks, and change management.

    Database Reliability & Architecture

    • Ensure the end-to-end reliability of relational databases, NoSQL databases, caching layers, and streaming platforms.
    • Design highly available, multi-region architectures, implementing robust cross-region replication and failover mechanisms.
    • Formulate and implement comprehensive backup, restore, and disaster recovery (DR) strategies.
    • Lead system design reviews with a strict focus on fault tolerance, scalability bottlenecks, data partitioning, and sharding.

    Platform Automation & Tooling

    • Build and evolve internal platforms for database provisioning, lifecycle management, and service deployment.
    • Champion Infrastructure as Code (IaC) and GitOps practices to reduce operational toil through automation and self-healing systems.
    • Define golden signals (latency, traffic, errors, saturation) and build comprehensive observability and tooling across the application, infrastructure, and database layers.
    • Develop reusable frameworks for failover automation, chaos testing, and reliability validation.

    Performance, Cost & Security

    • Optimize system performance and drive cost efficiency across cloud infrastructure (compute, network, storage) and database usage (IOPS, replication, backups).
    • Ensure systems comply with rigorous security and governance standards by implementing access controls, encryption (at rest and in transit), and audit logging.

    The Impact You Can Create

    As a Staff Engineer (IC4), you will act as a technical leader across the infrastructure, platform, and data layers. By blending Site Reliability Engineering (SRE) and Database Reliability Engineering (DBRE), you will:

    • Drive the organization-wide reliability strategy and solve highly ambiguous, high-impact engineering problems.
    • Influence system architecture across multiple teams, guiding product teams on resilient architecture patterns.
    • Raise the overall engineering standards through mentorship, design leadership, and by operating with high ownership and autonomy.

    Skills

    • Cloud & Architecture: Strong expertise in distributed systems, multi-region architectures, Disaster Recovery (DR), and cloud platforms (AWS preferred).
    • Databases & Streaming: Deep knowledge of Relational DBs (MySQL, PostgreSQL, Aurora), NoSQL (DynamoDB, Cassandra), Caching (Redis), and event-driven streaming systems (Kafka).
    • Programming: Proficiency in Python, Go, or Java.
    • Systems & Observability: Strong understanding of Linux internals, networking, and storage systems, alongside hands-on experience with observability stacks like Prometheus, Grafana, and Datadog.

    Qualifications

    • Experience: 10+ years of professional experience in SRE, DBRE, Infrastructure, or Platform Engineering.
    • Technical Mastery: Proven hands-on experience managing high-scale production systems, reliability engineering, and complex incident management.
    • Bonus / Preferred: Previous experience building Database-as-a-Service (DBaaS) offerings or robust internal platform engineering systems.

    Success Measures

    Your impact in this role will be measured by the following outcomes:

    • Delivering a measurable improvement in overall system uptime and reliability.
    • Driving a demonstrable reduction in incident frequency and Mean Time To Recovery (MTTR).
    • Increasing system automation, resulting in significantly reduced operational toil.
    • Achieving improved database performance alongside measurable cost efficiency gains.
    • Successfully executing and deploying multi-region and Disaster Recovery (DR) initiatives.

    Additional Information

    At Freshworks, we have fostered an environment that enables everyone to find their true potential, purpose, and passion, welcoming colleagues of all backgrounds, genders, sexual orientations, religions, and ethnicities. We are committed to providing equal opportunity and believe that diversity in the workplace creates a more vibrant, richer environment that boosts the goals of our employees, communities, and business.
