MLOps and DevOps Engineer (TE-CRG-GLO-2024-126-LD)

  • Contract

Company Description

At CERN, the European Organization for Nuclear Research, physicists and engineers are probing the fundamental structure of the universe. Using the world's largest and most complex scientific instruments, they study the basic constituents of matter: fundamental particles that are made to collide together at close to the speed of light. The process gives physicists clues about how particles interact, and provides insights into the fundamental laws of nature. Find out more at http://home.cern.

Job Description

Introduction

Are you an MLOps and DevOps Engineer with a passion for deploying and maintaining robust, scalable, and secure distributed infrastructure? Would you like to help engineers improve the performance of accelerators and help clinicians in the diagnosis, prevention, and management of diseases? Then join CERN and take part in transferring to society the federated learning platform used in the accelerator technology field.

You will support the UMBRELLA project, funded by an H2020 grant (Horizon 2020 is the EU's research and innovation funding programme). UMBRELLA takes a holistic approach to advancing, reshaping, and benchmarking the overall stroke care pathway and to setting new and improved standards of care in terms of primary and secondary prevention, rapid access to treatment, early and accurate diagnosis, stratification, management and real-time monitoring, identification of therapeutic targets, rehabilitation, and recurrent stroke and related cardiovascular events.

You will join:

  • the Data and Image Analysis team in the Technology Department, which develops and implements machine learning algorithms for the analysis of particle accelerator infrastructure data, as well as centralized and decentralized federated learning and distributed computing solutions that provide robust, efficient, and privacy-preserving models.
  • the UMBRELLA project, an H2020-funded European project working to improve the overall stroke care pathway and to set new and improved standards of care, from prevention and diagnosis through to rehabilitation.
  • an international collaboration to develop, deploy, and operate a federated learning infrastructure hosted by CERN that enables multiple clinical and research sites to build trustworthy AI-based models collaboratively.

Functions

As an MLOps and DevOps Engineer, you will work within an international collaboration of experts in machine learning, accelerator technology, and the medical field.

You will operate, improve, and secure a large-scale, distributed federated learning infrastructure, ensuring high performance, availability, and security.

Your main activities will consist of:

  • Collecting stakeholders' technical requirements and translating them into project and research activities.
  • Designing, implementing, and maintaining a distributed federated learning infrastructure. This includes remote deployment and management across decentralized sites, ensuring seamless operation, availability, performance, and security across all system components.
  • Designing, implementing, and maintaining MLOps infrastructure for efficient model tracking and serving in federated learning environments.
  • Ensuring the security of the federated learning infrastructure through SecOps practices.
  • Defining, implementing, and maintaining best practices, standards, and processes to ensure the quality, reliability, and maintainability of the federated learning infrastructure.
  • Sharing domain and technical expertise, providing technical mentorship and training to peers and students.
  • Publishing and reporting results and achievements in scientific journals.

Qualifications

Master's degree, PhD, or equivalent relevant experience in Machine Learning, Computing Science, Engineering, or a related field.

Experience:

The successful candidate should have demonstrated experience in the following fields:

  • Deploying and maintaining MLOps and DevOps infrastructures and pipelines.
  • Configuration and operational support of microservice infrastructure to ensure high performance, availability, and security levels.
  • SecOps practices to maintain and enhance software, network, and model security, including experience with security tools (e.g., GitLab security, Kubescape, Semgrep).
  • Knowledge and experience with software life-cycle tools and procedures, including experience with containerization and container orchestration (e.g., Docker, Podman, Kubernetes) and with DevOps, GitOps, and Infrastructure as Code (IaC) tools (e.g., Argo CD, Flux, Jenkins).
  • Proven experience in cloud platforms (e.g., AWS, Azure, Google Cloud, OpenStack).
  • Coding in one or more languages and knowledge of data structures, algorithms, and software design.

Additional experience in any of the following fields would be an asset:

  • Proficiency in federated learning techniques.
  • Working experience as a full-stack developer.
  • Knowledge of SQL, C and Java.
  • Integrating machine learning models with medical devices and healthcare systems, and understanding regulatory requirements (ISO 13485/14971, FDA guidelines, GDPR, EU AI Act, HIPAA).
  • Experience with network security, including certificates, encryption, access control, authorization, and authentication.
  • Proven experience leading and managing technical teams, preferably in a research or engineering environment.

Technical competencies:

  • Knowledge and application of software life-cycle tools and procedures: proficiency with Git and GitLab.
  • Administration of computing systems: upgrades, application of security patches, system and data migrations, backup and recovery.
  • Knowledge of programming techniques and languages: Python programming and deep learning frameworks (e.g., TensorFlow, Keras, PyTorch, JAX).
  • Testing, diagnosis, and optimization of software.

Behavioural competencies:

  • Solving Problems:
    • Identifying, defining and assessing problems, taking action to address them;
  • Achieving Results:
    • Delivering high quality work on time and fulfilling expectations;
  • Working in Teams:
    • Building and maintaining constructive and effective work relationships;
  • Communicating Effectively:
    • Delivering presentations in a structured and clear way; adjusting style and content to the audience; responding calmly and confidently to questions;
    • Ensuring that information, procedures and decisions are appropriately documented;
  • Learning and Sharing Knowledge:
    • Keeping up-to-date with developments in own field of expertise and readily absorbing new information;
    • Sharing knowledge and expertise freely and willingly with others; coaching others to ensure knowledge transfer.

Language skills:

Spoken and written English: ability to draw up technical specifications and/or scientific reports and to make oral presentations.

Ability to understand and speak French, or willingness to learn it and improve.

Additional Information

Eligibility and closing date:

Diversity has been an integral part of CERN's mission since its foundation and is an established value of the Organization. Employing a diverse workforce is central to our success. We welcome applications from all Member States and Associate Member States.

This vacancy will be filled as soon as possible, and applications should normally reach us no later than 22.09.2024 at 23:45.

Employment Conditions

Contract type: Limited duration contract (4 years). Subject to certain conditions, holders of limited-duration contracts may apply for an indefinite position.

These functions require:

  • Work during nights, Sundays and official holidays, when required by the needs of the Organization.
  • Stand-by duty, when required by the needs of the Organization.

Job grade: 6-7

Job reference: TE-CRG-GLO-2024-126-LD

Benchmark Job Title: Computing Engineer
