Staff Production Engineer

  • Full-time
  • Recruitment type: Permanent

Job Description

Join the team redefining how the world experiences design.

Hey, g'day, kia ora, 你好, hallo, vítejte!

Thanks for stopping by. We know job hunting can be a little time-consuming and you're probably keen to find out what's on offer, so we'll get straight to the point.

Where and how you can work

Collingwood is home to our Melbourne campus - a vibrant, creative hub for connection and impactful work. While Sydney is home to our HQ, Melbourne brings its own unique vibe, with local artwork, lush greenery, and thoughtfully designed spaces to help you collaborate, focus, and feel part of a welcoming community.

This role is based in Melbourne, and we’re looking for someone who calls it home. Our hybrid way of working gives you the flexibility to work remotely, and to come together on campus for meaningful in-person collaboration and connection when it matters most.

What you'd be doing in this role

The Production Engineering team sits at the intersection of software engineering and the hardest reliability problems in Canva's infrastructure. We write software that changes how production behaves at 240M MAUs and growing.

The strategic bet is a different model entirely. Canva's own take on what production reliability looks like, built for how we work. Senior software engineers embedded long-term in the areas that carry the most technical risk, working shoulder to shoulder with product teams, close enough to the roadmap to shape how it lands in production before the problems compound. Not operationalising systems. Not running alerts. Writing software that changes how production behaves.

The engineers who do this work well have gone deep in systems most people only operate. They can walk into a codebase they didn't write, understand what's actually happening at scale, win the technical respect of the team they're embedded with, and then improve the software to make it more reliable, more efficient, and more resilient.

At the moment, this role is focused on:

  • Owning an engagement area: Taking long-term accountability for one of Canva's highest-risk technical domains (sharding core data stores, resource utilisation, distributed systems challenges at scale), embedded alongside the team that owns it.

  • Writing production software: The work is code, not process. Instrumenting, refactoring, rebuilding the pieces that cause problems at scale. You're a software engineer first; the reliability outcome at scale is what you're optimising for.

  • Collaboration: Pairing with, mentoring and learning from fellow production engineers.

  • Customer First: Striving for fewer incidents, faster recovery, lower severity, latency that bends in the right direction. Taking pride in moving the metrics that positively impact the quality of the customer experience.

  • Platform contributions: Where you see a pattern that needs a shared capability, you bring it back, not to own it indefinitely, but to seed the platform work that scales beyond your engagement.

  • Compounding at the system layer: One engineer who truly understands a system can change how every other engineer builds on top of it. That's the leverage in this role and why the archetype matters more than the domain.

  • What success looks like: As a secondee, developing a trusted relationship with your team and guiding them towards shipping at velocity, with more confidence and less toil.

You’re probably a match

We'd love to hear from you if you fit one or more of these. You don't need to meet all of them, but the more the better, and if you join the team, we're invested in helping you grow.

Experience

  • Production at scale: Owned reliability work within large-scale distributed systems. When things broke, you wrote the fix, not the ticket.

  • Embedded delivery: Previously worked as an engineer embedded in or partnering closely with a product or feature team, not siloed in a platform org that throws tools over the fence.

  • JVM or systems depth: You've built real things in Java, Go, Rust, C++, or a comparable systems language at production scale; commercial depth, not academic familiarity. We're language-flexible for the right engineer, but you need to show up and win the technical duel in the first meeting.

  • Distributed systems in practice: Navigated sharding, replication, failure modes and consistency tradeoffs in real systems.

  • Debugging large codebases: Ability to parachute into an unfamiliar codebase, orient quickly, find where the problem actually lives, and fix it.

  • Influence without authority: A proven record of making systems better through wisdom and trust.

Technical knowledge

  • Networking depth: You know the network stack and what traffic looks like at scale.

  • Linux internals: Enough kernel-level understanding to reason about what's actually happening when a system misbehaves: process scheduling, memory, I/O, network stack.

  • Distributed systems patterns: Consistent hashing, leader election, consensus, backpressure, circuit breakers.

  • Observability tooling: You've instrumented systems for real, built the tracing, the dashboards, the alerting that actually tells you what's wrong. You understand the difference between cause-based and symptom-based alerting and know what a good SLO looks like.

  • Containerisation and orchestration: Kubernetes at production scale; you understand what happens at the scheduler level.

  • Performance analysis: You've profiled JVM applications or systems-level processes, found the thing nobody was looking at, and fixed it in a way that lasted.

  • Cloud infrastructure: AWS services at meaningful depth, so you understand how they behave under load and at the edges.

  • Incident response in practice: You've been on-call in a serious production environment and have opinions about what good incident management actually looks like.

Nice to have

  • Enterprise SaaS background: You've done this specific kind of work at an org that's done it well. You know what "production engineering" means when it's not just a job title.

  • JVM internals: You've tuned GC and profiled threads in production.

  • Multi-region or sharding experience: You've been involved in a data store migration or multi-region architecture where getting it wrong was not an option.

  • eBPF or kernel instrumentation: Production experience is especially valuable.

About the Group and Team

Join the Production Engineering Group at Canva, where our mission is to make every system that powers Canva fast, reliable, and ready for the next scale. Infra owns the infrastructure layer that every other team builds on: compute, storage, networking, developer experience, and reliability.

The Reliability Platform subgroup is where Canva thinks seriously about the technical risk that comes with operating at hundreds of millions of users. It's a group with broad scope, from the tooling that helps teams run incidents well, to the engineering work that stops incidents from happening in the first place.

Production Engineering sits within Reliability Platform. It is a small team of senior software engineers embedded in Canva's highest-risk technical areas, working alongside the product and infrastructure teams who own those systems for long-term engagements. When it works, other teams ship more confidently and the incidents that do happen resolve faster and hurt less.

What’s in it for you?

Achieving our crazy big goals motivates us to work hard — and we do — but you'll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too. We also offer a range of benefits to set you up for every success in and outside of work.

Here's a taste of what's on offer:

  • Equity packages — we want our success to be yours too

  • Inclusive parental leave policy that supports all parents & carers

  • An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more

  • Flexible leave options that empower you to be a force for good, take time to recharge and support you personally

Other stuff to know?

We see AI as a powerful amplifier of creativity and technology at Canva. We're evolving how we assess AI skills in our Technology hiring experience — you'll tackle interactive, real-time challenges that reflect the kind of work we do. In some interviews, you may also be asked to solve a problem using an AI tool to show how you approach challenges with tech by your side.

We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture.

When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process. We celebrate all types of skills and backgrounds at Canva, so even if you don't feel like your skills quite match what's listed above — we still want to hear from you!

Please note that interviews are conducted virtually.
