Production Engineer

Full-time

Company Description

At Snarkify, our mission is to scale zero-knowledge proofs (ZKPs) for a trustless future. We provide robust infrastructure and user-friendly tools that enable developers to build, deploy, and scale ZKP applications. Our founders bring expertise from organizations such as the Ethereum Foundation, Facebook, Amazon, and OKX. By applying folding schemes, proof aggregation, and GPU acceleration, we're pushing the scalability of proof systems to new levels, working toward a future with stronger security, privacy, and decentralization.

Job Description

Snarkify is looking for a highly skilled and motivated Production Engineer (Site Reliability Engineer / DevOps Engineer) to join our team. In this role, you will be instrumental in ensuring the stability, scalability, and performance of our Zero-Knowledge Proof (ZKP) prover network. You will work closely with our engineering teams to build and maintain the infrastructure and tools needed to keep our decentralized systems running smoothly and securely.

Responsibilities:

Understand the architecture and deployment requirements of modern Layer 2 rollup stacks, and take charge of maintaining and enhancing our in-house zkRollup infrastructure to ensure optimal performance and reliability.
Set up and maintain highly available Kubernetes (K8s) clusters across multiple environments, ensuring scalability, resilience, and security for our prover network.
Develop, improve, and manage deployment pipelines for third-party Docker images, ensuring seamless integration and consistent deployment across different clusters.
Collaborate with external customers to understand their system architecture and deployment requirements, designing and implementing tailored plans for third-party prover deployments.
Design and implement robust monitoring, logging, and alerting systems to ensure the health and reliability of our decentralized network and hosted services, using tools such as Prometheus, Grafana, and the ELK Stack.
Manage cloud services and resources across platforms such as AWS, GCP, and private clusters, optimizing for performance, cost-efficiency, and security in a multi-cloud environment.
Build and maintain CI/CD pipelines to support continuous integration and delivery for a diverse codebase, including Rust, C++, Python, and other technologies, enabling rapid and reliable software releases.

Qualifications

Skills and Qualifications:

Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
3+ years of experience in site reliability engineering, DevOps, or a similar role, preferably at a leading internet company such as AWS, Google, or Meta.
An open mind and prior experience with blockchain-related projects are preferred.
Expertise in containerization and orchestration technologies, including Docker and Kubernetes.
Strong experience with cloud platforms (AWS, GCP, Azure) and cloud-native tools and services (e.g., EC2, ECS, Lambda, S3, RDS).
Knowledge of monitoring and observability tools such as Prometheus, Grafana, and the ELK Stack.
Proficiency with CI/CD tools such as GitHub Actions, CircleCI, or Jenkins.
Strong problem-solving skills, attention to detail, and a proactive approach to system reliability and scalability.
Excellent communication and teamwork skills, with the ability to work effectively in a remote, fast-paced, and dynamic environment.

Additional Information

Benefits

Competitive base salary with founding member equity.
The opportunity to build the next-generation ZK computing platform.
Immersion in a team of top-notch global blockchain engineers.
A flexible and innovative remote work environment.
Room for continuous growth and development in the ZK field.
