Computer Science Intern (Introduce chaos engineering)

Intern

Company Description

Join a Fintech Pioneer! Digital technology is deeply rooted in our DNA. Since our company's creation in 1996 by two EPFL engineers, we have used it to democratise banking and allow our clients to make independent decisions on our web platforms. We rely on more than 250 engineers (software engineers, front-end developers, software architects, DevOps engineers and data architects) to improve our solutions. Cryptocurrencies, Artificial Intelligence, Robo-advisory, Virtual Reality... There is no limit to what we can do to revolutionise the financial world!

We are currently looking for a new Computer Science Trainee (6 months) who is enrolled in a Master's programme in Computer Science/Communication Systems and is looking for a Master's project.

Job Description

The project’s goal is to research and build a prototype in the domain of chaos engineering using our Kubernetes clusters, our simulator test bench and Netflix’s Chaos Monkey suite.

Performance and continuity of service are key to success at Swissquote. We have more than 1,400 applications running simultaneously, but we are unable to predict the behavior of our system in case of specific failures. Advances in large-scale, distributed software systems are changing the game. We are quick to adopt practices that increase flexibility of development and velocity of deployment. An urgent question follows on the heels of these benefits: how much confidence can we have in the complex systems that we put into production?

To specifically address the uncertainty of distributed systems at scale, Chaos Engineering can be thought of as the facilitation of experiments to uncover systemic weaknesses.

These experiments follow four steps:

Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior
Hypothesize that this steady state will continue in both the control group and the experimental group
Introduce variables that reflect real-world events such as servers that crash, hard drives that malfunction and network connections that are severed
Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group

If a weakness is uncovered, we now have a target for improvement before that behavior manifests in the system at large.
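
The four steps above can be illustrated with a minimal Java sketch. It runs entirely against simulated replica groups and does not call Kubernetes, Jenkins or Chaos Monkey. The names ChaosExperimentSketch, SimulatedGroup, Result and run, as well as the success-rate metric and the tolerance value, are assumptions made for illustration only and are not part of any existing Swissquote or Netflix API.

```java
import java.util.function.DoubleSupplier;

/** Illustrative sketch of one chaos experiment run against simulated replica groups. */
final class ChaosExperimentSketch {

    /** Hypothetical stand-in for a group of service replicas. */
    static final class SimulatedGroup {
        private final int totalReplicas;
        private final boolean failover;
        private int healthyReplicas;

        SimulatedGroup(int totalReplicas, boolean failover) {
            if (totalReplicas <= 0) {
                throw new IllegalArgumentException("totalReplicas must be positive");
            }
            this.totalReplicas = totalReplicas;
            this.failover = failover;
            this.healthyReplicas = totalReplicas;
        }

        /** Simulated fault: one replica crashes. */
        void killReplica() {
            if (healthyReplicas > 0) {
                healthyReplicas--;
            }
        }

        /** Steady-state metric: fraction of requests served successfully. */
        double successRate() {
            if (failover) {
                // Traffic is rerouted, so any surviving replica serves every request.
                return healthyReplicas > 0 ? 1.0 : 0.0;
            }
            // No rerouting: requests sent to a crashed replica fail.
            return (double) healthyReplicas / totalReplicas;
        }
    }

    /** Outcome of one experiment: both measurements and whether the hypothesis survived. */
    static final class Result {
        final double controlMetric;
        final double experimentalMetric;
        final boolean steadyStateHeld;

        Result(double controlMetric, double experimentalMetric, boolean steadyStateHeld) {
            this.controlMetric = controlMetric;
            this.experimentalMetric = experimentalMetric;
            this.steadyStateHeld = steadyStateHeld;
        }
    }

    /**
     * Step 1: the two suppliers define the measurable steady state.
     * Step 2: the hypothesis is that both groups stay within the given tolerance.
     */
    static Result run(DoubleSupplier controlMetric,
                      DoubleSupplier experimentalMetric,
                      Runnable injectFault,
                      double tolerance) {
        // Step 3: introduce the failure in the experimental group only.
        injectFault.run();
        double control = controlMetric.getAsDouble();
        double experimental = experimentalMetric.getAsDouble();
        // Step 4: try to disprove the hypothesis by comparing the two groups.
        boolean held = Math.abs(control - experimental) <= tolerance;
        return new Result(control, experimental, held);
    }

    public static void main(String[] args) {
        SimulatedGroup control = new SimulatedGroup(3, false);
        SimulatedGroup experimental = new SimulatedGroup(3, false);
        Result result = run(control::successRate, experimental::successRate,
                experimental::killReplica, 0.05);
        System.out.println("steady state held: " + result.steadyStateHeld);
    }
}
```

In this sketch, a group without failover loses a third of its success rate when one of three replicas crashes, so the hypothesis is disproved and a weakness is exposed. Constructing the groups with failover enabled leaves the metric unchanged, and the steady state holds.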

Your role:

Define, implement and deploy a tool that makes it possible to test the resilience of a system against infrastructure, network and application failures
Integrate this new tool into our current continuous deployment tool (Jenkins) and procedures

Qualifications

Currently enrolled in a Master's programme in Computer Science/Communication Systems
Very good knowledge of Java

Additional Information

SQ1
