Senior Site Reliability Engineer
- Full-time
Company Description
What started in 2007 with a pizza and a plan has grown into a fast-moving SaaS company empowering manufacturers, distributors, and wholesalers to thrive in complex B2B commerce.
Our mission is simple: help businesses build stronger relationships through seamless digital commerce.
At Sana Commerce, you’ll join a team that’s bold, growth-oriented, and customer-obsessed, where every engineer has real ownership and impact.
About the role
We’re looking for a Senior Site Reliability Engineer (SRE) to enhance the reliability, performance, and scalability of our global e-commerce platform.
You’ll play a critical role in building resilient systems, automating infrastructure, and driving observability across Azure and Kubernetes environments.
This is a hands-on engineering role where you’ll blend reliability strategy with real-world execution, keeping today’s systems healthy while shaping the ones we’ll depend on tomorrow.
Job Description
What you’ll do
- Lead incident response and postmortems: drive investigations, document learnings, and implement permanent fixes to prevent recurrence.
- Manage and optimize Azure Kubernetes environments: own cluster configurations, performance, cost control, and security best practices.
- Build observability systems: develop dashboards, alerts, and metrics using Dynatrace, Honeycomb, Elasticsearch, Grafana/Kibana, and Azure Monitor (KQL).
- Automate for resilience: write reliable scripts in PowerShell, Bash, Python, or C#, embedding logging, rollback, and version control.
- Implement Infrastructure-as-Code: design and maintain Terraform, Bicep, or ARM templates to standardize and automate deployments.
- Optimize system performance: identify bottlenecks through deep monitoring, dump analysis, and right-sizing of cloud resources.
- Collaborate across engineering teams: integrate reliability principles into CI/CD pipelines and the broader SDLC.
- Participate in on-call rotations: lead during critical incidents and ensure lasting fixes and operational excellence.
Qualifications
What you’ll bring
- 5+ years in SRE, DevOps, or Cloud Infrastructure roles with experience in large-scale, distributed systems.
- Strong Azure and Kubernetes expertise (production-level).
- Proven ability in observability engineering using Dynatrace, Honeycomb, Elastic, Grafana/Kibana, or Azure Monitor.
- Skilled in PowerShell, Bash, Python, or C#, with an automation-first mindset.
- Proficient in Infrastructure-as-Code (Terraform, Bicep, ARM).
- Solid grasp of TCP/IP, networking fundamentals, and performance tuning.
- Strong communicator able to translate complex technical findings into clear, actionable insights.
- Certifications preferred:
  - Microsoft Certified: Azure Administrator Associate
  - Certified Kubernetes Administrator (CKA)
Why you’ll love working here
- Impact from day one – Join a scale-up where your ideas shape how global businesses operate online.
- Continuous learning – Access structured onboarding (rated 9.1/10 by previous hires), mentorship, and a feedback culture.
- Hybrid flexibility – Work from our Alexandria office 3 days per week and from home 2 days.
- Career growth – Expand your technical and leadership scope in a company built for long-term success.
Our values
At Sana Commerce, our values drive everything we do:
- Champions of Our League – We deliver lasting success, balancing quick wins and long-term value.
- Supercharge Our Customers – We’re revolutionizing B2B commerce together, helping our customers to lead and succeed.
- Determined to Grow – We embrace challenges, growing and raising the bar for ourselves and our industry.
- Bold Together – We dare to be bold because we have each other’s back.
Ready to build reliability that scales?
Apply now and help shape the foundation of our next-generation SaaS platform.