Big Data Engineer, Spark/Scala

  • Full-time

Company Description

IQVIA™ is The Human Data Science Company™, focused on using data and science to help healthcare clients find better solutions for their patients. Formed through the merger of IMS Health and Quintiles, IQVIA offers a broad range of solutions that harness advances in healthcare information, technology, analytics and human ingenuity to drive healthcare forward.

The Business Unit: Real-World & Analytics Solutions (RWAS) Technology


“Big Data” is changing the way that the healthcare world operates and, now more than ever, the key to better patient outcomes is better use of technology, seamlessly integrated information and analytics. Our Predictive Analytics team within the Real-World & Analytics Solutions (RWAS) Technology division is a fast-growing group of collaborative, enthusiastic, and entrepreneurial individuals. In our quest to harness the value of Real World Evidence (RWE), we are at the centre of IQVIA’s pursuit of machine learning and cutting-edge statistical methods to advance healthcare. Our efforts improve retrospective clinical studies, the detection of under-diagnosed rare diseases, personalized treatment response profiles, disease progression predictions, and clinical decision-support tools.
We know that meaningful results require not only the right approach but also the right people. We invite you to reimagine healthcare with us. You will have the opportunity to play an important part in helping our clients drive healthcare forward and ultimately improve human health outcomes.

Job Description

As a Big Data Engineer, you will be part of a team of highly talented Engineers and Data Scientists. Your main focus will be writing highly performant and scalable code that runs on top of our Big Data platform (Spark/Hive/Impala/Hadoop). You will also work closely with the Data Science team to support them in the ETL process (including cohort-building efforts).


A typical day might include:

  • Working in a cross-functional team – alongside talented Engineers and Data Scientists
  • Building scalable, high-performance code
  • Mentoring less experienced colleagues within the team
  • Implementing ETL and feature-extraction pipelines
  • Monitoring cluster (Spark/Hadoop) performance
  • Working in an Agile Environment
  • Refactoring and moving our current libraries and scripts to Scala/Java
  • Enforcing coding standards and best practices
  • Working in a geographically dispersed team
  • Working in an environment with a significant number of unknowns – both technically and functionally.

Qualifications

Experience:

  • BSc or MSc in Computer Science or a related field
  • Strong analytical and problem-solving skills, with a personal interest in subjects such as math/statistics, machine learning and AI
  • Solid knowledge of data structures and algorithms
  • Experience working in an Agile environment using Test Driven Development (TDD) and Continuous Integration (CI)
  • Experience refactoring code with scale and production in mind.

Tech Skills:

  • Proficient in Scala, Java and SQL
  • Strong experience with Apache Spark, Hive/Impala and HDFS
  • Familiar with Python, Unix/Linux, Git, Jenkins, JUnit and ScalaTest
  • Experience with integration of data from multiple data sources
  • Experience with NoSQL databases, such as HBase, Cassandra, MongoDB
  • Experience with any of the following Hadoop distributions: Cloudera, MapR or Hortonworks.

Additional Information

  • Other functional languages such as Haskell and Clojure
  • Big Data ML toolkits such as Mahout, SparkML and H2O
  • Apache Kafka, Apache Ignite and Druid
  • Container technologies such as Docker
  • Cloud platform technologies such as DCOS/Marathon/Apache Mesos, Kubernetes and Apache Brooklyn.