Big Data Engineer

  • Contract

Job Description

Position Details:

Job Title: Big Data Engineer

Location: Tampa, FL

Duration: 12+ Months Contract to hire

 

Job Responsibilities:

Principal Responsibilities

·        Design interfaces to the data warehouses/data stores and machine learning/Big Data applications using open source tools such as Scala, Java, Python, Perl, and shell scripting.

·        Design and create data pipelines that maintain a stable dataflow to the machine learning models, in both batch and near-real-time modes.

·        Interface with Engineering/Operations/System Admin/Data Scientist teams to ensure data pipelines and processes fit within the production framework.

·        Ensure that tools and environments adhere to strict security protocols.

·        Deploy machine learning models and serve their outputs through RESTful APIs.

·        Understand business needs in close collaboration with subject matter experts (SMEs) and Data Scientists to perform efficient feature engineering for machine learning models.

·        Maintain code and libraries in the code repository.

·        Work with the system administration team to proactively resolve issues and install tools and libraries on the AWS platform.

·        Research and propose the architecture and solutions most appropriate for the problems at hand.

·        Maintain and improve tools to assist Analytics in ETL, retrospective testing, efficiency, repeatability, and R&D.

·        Lead by example regarding software best practices, including code style and architecture, documentation, source control, and testing.

·        Support the Chief Data Scientist/Data Scientists/Big Data Engineers in creating novel approaches to solve challenging problems using Machine Learning, Big Data, and Cloud technologies.

·        Handle ad hoc requests to create reports for end users.

 

Required Skills

·        Strong skills with Apache Spark (Spark SQL) and Scala, with 2+ years of experience.

·        Understanding of AWS Big Data components and tools.

·        Strong Java skills, with experience in web services and web development, are required.

·        Hands-on experience with model deployment.

·        Hands-on experience in application deployment on Docker and/or Kubernetes or a similar technology.

·        Linux scripting is a plus.

·        Fundamental understanding of AWS cloud components.

·        2+ years of experience in ingesting, cleansing/processing, storing, and querying large datasets

·        2+ years of experience in engineering large-scale data solutions with Java/Tomcat/SQL/Linux

·        Experience working in a data-intensive role, including the extraction (DB/web/API/etc.), transformation, and loading (ETL) of data

·        Exposure to structured and/or unstructured data

·        Experience with data cleansing/preparation on Hadoop/Apache Spark Ecosystem – MapReduce/Hive/HBase/Spark SQL

·        Experience with distributed streaming tools like Apache Kafka.

·        Experience with multiple file formats (Parquet, Avro, ORC)

·        Knowledge of the Agile development cycle.

·        Efficient coding skills to improve the performance and reduce the cost of jobs running on the AWS platform.

·        Experience in building stable, scalable, and high-speed live streams of data and serving web platforms

·        Enthusiastic self-starter with the ability to work in a team environment.

·        Graduate (MS) or Undergraduate degree in Computer Science/Engineering/relevant field

 

Nice to have:

·        Strong software development experience

·        Machine Learning model deployment experience

·        Ability to write custom Map/Reduce programs to clean/prepare complex data

·        Familiarity with streaming data processing, including experience with distributed real-time computation systems such as Apache Storm or Apache Spark Streaming.

Additional Information

All your information will be kept confidential according to EEO guidelines.