Data Engineer

  • Full-time

Company Description

Founded in 2016, VoxelCloud is a Los Angeles-based global leader in artificial intelligence (AI) analysis of medical images. Backed by Sequoia and Tencent, we help healthcare providers make better, earlier diagnoses and other clinical decisions. http://www.voxelcloud.ai

Job Description

The R&D team at VoxelCloud (Westwood, Los Angeles, CA) manages and maintains the large-scale medical and healthcare data at the core of all our R&D activities. Reporting to the CTO and Product Lead, Data Engineers participate in the acquisition and manipulation of massive datasets in multi-modal formats (medical images, consumer images, audio, text (EMR), etc.) on cloud storage. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up. The Data Engineer will support our software developers, data analysts, and data scientists on product and research initiatives, and will ensure that an optimal data delivery architecture is maintained consistently across ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products. The right candidate will be excited by the prospect of optimizing or even re-designing our company’s data architecture to support our next generation of products and data initiatives.

Responsibilities:

  • Create and maintain optimal data pipelines to support machine learning research and development
  • Identify, design, and implement internal process improvements: automating data QA, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS/AliCloud big data technologies.
  • Build analytics tools that utilize the data pipeline to provide actionable insights into product utilization and operational efficiency.
  • Keep our data separated and secure across national boundaries, both locally and in cloud storage.

Qualifications


  • Proficiency in at least one object-oriented or functional scripting language: Python, Java, C++, Scala, etc.
  • Strong SQL knowledge, including query authoring and experience with relational databases such as Postgres.
  • Experience building and optimizing big data pipelines, architectures, and data sets.
  • Solid understanding of information retrieval, statistics, and machine learning; experience with CNNs/RNNs, computer vision, and NLP is a plus.
  • 1+ years of experience with big data and related technologies (e.g., DFS) preferred; experience with high-performance, scalable distributed systems.
  • Experience with AWS cloud services (EC2, EMR, RDS, Redshift) preferred.
  • Skilled at automating tasks, but willing to get hands-on for quality control.
  • Detail-oriented, well organized, and self-motivated, with a continuous drive to learn, explore, and take on challenges; a team player with good communication skills.
  • Experience supporting and working with cross-functional teams in a dynamic environment.
  • MS or BA/BS degree in computer science, statistics, or a related field.

Additional Information

All your information will be kept confidential according to EEO guidelines.