Big Data Engineer, Spark/Scala

Full-time

Company Description

IQVIA™ is The Human Data Science Company™, focused on using data and science to help healthcare clients find better solutions for their patients. Formed through the merger of IMS Health and Quintiles, IQVIA offers a broad range of solutions that harness advances in healthcare information, technology, analytics and human ingenuity to drive healthcare forward.

The Business Unit: Real-World & Analytics Solutions (RWAS) Technology

“Big Data” is changing the way that the healthcare world operates and, now more than ever, the key to better patient outcomes is better use of technology, seamlessly integrated information and analytics. Our Predictive Analytics team within the Real-World & Analytics Solutions (RWAS) Technology division is a fast-growing group of collaborative, enthusiastic, and entrepreneurial individuals. In our quest to harness the value of Real World Evidence (RWE), we are at the centre of IQVIA’s pursuit of machine learning and cutting-edge statistical methods to advance healthcare. Our efforts improve retrospective clinical studies, the identification of under-diagnosed rare diseases, personalized treatment response profiles, disease progression predictions, and clinical decision-support tools.
We know that meaningful results require not only the right approach but also the right people. We invite you to reimagine healthcare with us. You will have the opportunity to play an important part in helping our clients drive healthcare forward and ultimately improve human health outcomes.

Job Description

As a Big Data Engineer, you will be part of a team of highly talented Engineers and Data Scientists, with a main focus on writing highly performant and scalable code that runs on top of our Big Data platform (Spark/Hive/Impala/Hadoop). As such, you will work closely with the Data Science team to support them in the ETL process (including cohort-building efforts).

A typical day might include:

Working in a cross-functional team – alongside talented Engineers and Data Scientists
Building scalable, high-performance code
Mentoring less experienced colleagues within the team
Implementing ETL and feature extraction pipelines
Monitoring cluster (Spark/Hadoop) performance
Working in an Agile Environment
Refactoring and moving our current libraries and scripts to Scala/Java
Enforcing coding standards and best practices
Working in a geographically dispersed team
Working in an environment with a significant number of unknowns – both technically and functionally.

Qualifications

Experience

BSc or MSc in Computer Science or related field
Strong analytical and problem-solving skills, with a personal interest in subjects such as math/statistics, machine learning and AI
Solid knowledge of data structures and algorithms
Experience working in an Agile environment using Test Driven Development (TDD) and Continuous Integration (CI)
Experience refactoring code with scale and production in mind.

Tech Skills

Proficient in Scala, Java and SQL
Strong experience with Apache Spark, Hive/Impala and HDFS
Familiar with Python, Unix/Linux, Git, Jenkins, JUnit and ScalaTest
Experience with integration of data from multiple data sources
Experience with NoSQL databases, such as HBase, Cassandra, MongoDB
Experience with any of the following Hadoop distributions: Cloudera, MapR or Hortonworks.

Additional Information

Other functional languages such as Haskell and Clojure
Big Data ML toolkits such as Mahout, SparkML and H2O
Apache Kafka, Apache Ignite and Druid
Container technologies such as Docker
Cloud platform technologies such as DC/OS/Marathon/Apache Mesos, Kubernetes and Apache Brooklyn.