Staff DevOps Engineer

  • Full-time

Company Description

About A Place for Mom:

A Place for Mom is the leading online resource connecting families searching for senior care with a team of expert advisors providing insight-driven, personalized solutions. As the nation’s largest senior care advisory service, A Place for Mom helps hundreds of thousands of families every year navigate the complexities of finding the right senior care solution for their loved ones across home care, independent living, memory care, assisted living, and more. A Place for Mom was established in 2000 as a family business, and its employees are deeply committed to the company mission of enabling caregivers to make the best senior care decisions. A Place for Mom fosters, cultivates, and preserves a culture of diversity, equity, and inclusion.

Our employees live the company values every day:

  • Mission Over Me: We find purpose in helping caregivers and their senior loved ones while approaching our work with empathy.
  • Do Hard Things: We are energized by solving challenging problems and see them as opportunities to grow.
  • Drive Outcomes as a Team: We each own the outcome but can only achieve it as a team.
  • Win The Right Way: We see organizational integrity as the foundation for how we operate.
  • Embrace Change: We innovate and constantly evolve.

Job Description

We are seeking a highly skilled and experienced Staff DevOps Engineer to join our team. This role will focus on Site Reliability Engineering (SRE), enhancing our developer platform, and ensuring robust security practices. The ideal candidate will have a strong background in SRE principles, platform engineering, and security, with a proven ability to drive improvements in system reliability, performance, and security.

Key Responsibilities:

  • Site Reliability Engineering (SRE):
    • Implement and manage SRE practices to ensure high availability, reliability, and performance of our systems and services.
    • Develop and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
    • Monitor and analyze system performance, identifying and addressing reliability issues proactively.
    • Automate operational tasks to reduce manual intervention and improve system efficiency.
  • Developer Platform:
    • Enhance and maintain the developer platform to support efficient and scalable software development.
    • Collaborate with engineering teams to improve CI/CD pipelines, streamline development workflows, and optimize deployment processes.
    • Ensure that development tools and environments are up to date, reliable, and scalable.
  • Security:
    • Implement and manage security practices to protect our systems and data from threats.
    • Conduct regular security assessments and vulnerability scans to identify and mitigate risks.
    • Collaborate with security teams to enforce security policies and ensure compliance by default.
  • Collaboration and Leadership:
    • Lead with a product mindset, building tools that developers find intuitive and easy to use.
    • Work closely with cross-functional teams to support and improve system reliability, performance, and security.
    • Mentor and provide technical guidance to junior team members.
    • Stay updated with industry trends and best practices, applying them to improve our systems and processes.

Qualifications

  • Technical Skills:
    • Proven experience with SRE principles and practices.
    • Strong proficiency in DevOps tools and technologies, including CI/CD pipelines (GitHub Actions), containerization (Docker, Kubernetes), and infrastructure as code (Terraform).
    • Experience with monitoring and logging tools across application (Datadog, New Relic), infrastructure (Grafana, Prometheus), log management (Splunk, ELK), and cloud (AWS CloudWatch).
    • Strong understanding of security practices and tools, including vulnerability scanning and threat detection.
  • Experience:
    • 5+ years of experience in DevOps, SRE, or a related field.
    • Demonstrated experience in managing and optimizing large-scale systems and platforms.
    • Experience with cloud platforms (e.g., AWS, Azure, GCP) and their security features.
  • Soft Skills:
    • Excellent problem-solving and analytical skills.
    • Strong communication and collaboration abilities.
    • Ability to work independently and as part of a team.
  • Education:
    • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.

Additional Information

Compensation

  • Base Salary: $140,000 to $150,000 + 20% Bonus
  • Benefits:
    • 401(k) plus match
    • Dental insurance
    • Health insurance
    • Vision insurance
    • Paid time off

All your information will be kept confidential according to EEO guidelines.


#LI-REMOTE
