Data Wrangler

  • London, UK
  • Employees can work remotely
  • Full-time
  • Department: Data
  • Office: London

Company Description

Genomics England successfully led the world-leading 100,000 Genomes Project, which compared and analysed individuals’ genetic codes to help diagnose, treat and prevent illness.

We're now accelerating our impact, working with the NHS to further develop and embed genomic healthcare and research in Britain.  Our next chapter involves working with patients, doctors, scientists, government and industry to improve genomic testing, and help researchers access the health data and technology they need to make new medical discoveries and create more effective, targeted medicines for everybody.

Job Description

Responsible for automating data transformation (i.e. ETL and pipeline architecture) in line with strong engineering practices, version control and high-quality standards, with a focus on the performance and movement of large data volumes using appropriate tooling.

Also responsible for generating key statistics and derivations to extract information from data and create new derived datasets to support researchers, removing repeat activity and increasing quality. This includes understanding the core data catalogues to help others derive meaning and insight.

Everyday responsibilities include:

Accountabilities for whole Data team 

  • Know and understand the meaning behind our virtues of Empathy, Integrity, Focus, Connection, Speed, Curiosity and Impact, and embody them in all aspects of your role.

  • Work closely with the Genomics England Science, Informatics, Bioinformatics and Service Delivery teams.  Provide the multi-disciplinary team with expert guidance on data management considerations during protocol development, data model definition and development of the final analysis plans 

  • Understanding of Agile working practices and delivery of large-scale, complex database solutions within an agile, sprint-based methodology, delivering demonstrable value to customers within short timeframes

  • Ability to learn and iterate from the above 

  • Data cleansing, linking and de-identification routines to ensure all eligible data collected are delivered to the research environment each quarter  

  • Support the clinical roles (alongside the data team) in delivering a coherent, accurate, sustainable data resource that supports:

      • An efficient, high-quality, high-volume pipeline

      • An integrated, extensible and coherent data centre for research

      • The transformation of healthcare IT infrastructures for embedding genomic medicine into clinical care

  • Develop a suite of metrics and analysis to support the Genomics England programme of work, monitoring data flows and quality and enabling performance management

  • Ensuring all code is optimised so data is readily accessible by all business users 

  • Designing, writing and deploying effective business intelligence solutions 

  • Providing data-driven insights through analytics

  • Master data management development and delivery


Accountabilities for Data Wrangler

  • Understanding and articulating business database requirements  

  • Data cleansing, linking and de-identification routines to ensure all eligible data collected are delivered to the research environment each quarter  

  • Generate statistics and derivations to extract information from data to support researchers, removing repeat activity and increasing quality 

  • The development of test datasets to enable automated regression testing of all data flows 

  • Efficient and timely delivery of:

      • Comprehensive, coherent clinical and secondary data resources to facilitate research, on a routine basis

      • Monthly data flows from secondary data sources

  • Supporting the delivery of the Genomic Medicine Service 

  • Writing packages for ETL purposes

  • Monitoring overall database performance and managing ongoing development requests; designing and implementing changes to database structure to meet business requirements

  • An understanding of Software Development Lifecycles and principles of code development and best practices 


Skills and Experience for Success

We anticipate the ideal candidate will have:

Essential skills for whole Data team 

  • Experience synthesising large volumes of complex data, ideally health-related, in a research, clinical or government setting, into meaningful conclusions, and presenting recommendations to a wide range of audiences

  • Familiarity with a programmatic environment for data manipulation, such as R, Python or SQL, to perform common tasks including merging, appending, reshaping, identifying duplicates and recoding values

  • Familiarity with cloud platforms such as AWS, ETL solutions such as Trifacta and visualisation software such as Tableau

  • Ability to programmatically identify missing, inconsistent or out-of-range values

  • Ability to produce summary visualisations for reporting purposes 

  • Comfortable performing in a fast-paced, dynamic and ambiguous environment, with proven ability to lead multiple high priorities with aggressive timelines

  • Comfortable working in a collaborative and pragmatic way as part of a dedicated and closely communicating team. Used to welcoming feedback from end users on the products supported and keen to refine and improve quality and usability 

  • Able to communicate technical information clearly and to translate between diverse groups of technical and non-technical individuals

  • Excellent organisational skills with a commitment to reproducible work, documentation, and record-keeping 

  • Commitment to responsible and respectful handling of personally identifiable data

  • Able to successfully manage competing priorities and demanding timelines across the workplace   

  • Experience of data models, for example developing XML schema definitions, JSON or HL7 FHIR


Essential skills for Data Wrangler

  • Experience of data modelling and developing data sets and the necessary components for data transfers into those models 

  • Experience developing data extract, transform and load programmes to create a coherent high-quality comprehensive curated data resource 

  • Experience developing scheduled data flows using an integration engine, including message management and troubleshooting, and complex data integration sequences involving multiple updates to a single data resource

  • Familiar with software version control systems 

  • Experience of developing a single system with a team of developers, with delivery of development through application and system testing to production 

  • Solid applied knowledge of the principles of relational databases, database management and administration, and of software, particularly Pentaho, PostgreSQL, R, Shiny, Python, Java and Tableau


Degree educated or equivalent


    Additional Information

    Originally conceived as a project, Genomics England has transformed to meet the long-term opportunities created by our scientific breakthroughs in understanding the human genome. Being part of this journey is a reward in itself; however, we're pleased to offer our colleagues a great benefits package including:

    • competitive salary
    • 30 days holiday
    • generous pension scheme
    • individual learning budgets for every colleague
    • a raft of other benefits

    Talk to our Talent Team and find out how a career with Genomics England will benefit you.
