Data Architect

  • Contract

Job Description

 

  • Designing, developing, and implementing data service architecture that ingests real-time data streams, parses JSON payloads, and loads them into multiple data persistence stores to provide real-time and offline analytics.
  • Preparing and analyzing large amounts of data using Hadoop ecosystem tools and components.
  • Building fact tables to facilitate quicker and easier data access.
  • Building indices in Elasticsearch to support real-time dashboards in Kibana, and building predictive models to support AI/ML.
  • Hands-on programming with Big Data technologies and tools such as Apache Spark, Apache Hive, Apache Pig, Apache Sqoop, Apache Storm, and Apache Kafka.
  • Ingesting structured, semi-structured, and unstructured data from multiple data sources into the Hadoop distributed environment.
  • Designing, developing, and delivering data sets from operational systems and files, and ingesting them into operational data stores (ODSs), data marts, and files.
  • Troubleshooting and developing on Hadoop technologies including HDFS, Kafka, Hive, Pig, Flume, HBase, Spark, and Impala, and performing Hadoop ETL development via tools such as ODI for Big Data and APIs that extract data from source systems.
  • Developing and deploying distributed Big Data applications on the AWS Cloud using open-source frameworks such as Apache Spark, Apex, Flink, NiFi, Storm, and Kafka.
  • Translating, loading, and presenting disparate data sets in multiple formats and from multiple sources, including JSON, Avro, text files, Kafka topics, and log data, ranging from structured through semi-structured to unstructured.
  • Programming in Java, Scala, and Python, and working with open-source RDBMS and NoSQL databases and cloud-based data warehousing services such as Redshift.
  • Working on Hadoop modules such as YARN and MapReduce, and related Apache projects such as Hive, HBase, Pig, and Cassandra.
  • Building solutions involving large data sets using SQL methodologies and data integration tools such as ODI against any database.
  • Leveraging DevOps practices such as continuous integration, continuous deployment, test automation, build automation, and test-driven development to enable rapid delivery of working code, using tools such as Jenkins, Maven, Nexus, Chef, Terraform, Ruby, Git, and Docker.
  • Performing unit, component, and integration testing of software components, including the design, implementation, evaluation, and execution of unit and assembly test scripts.
  • Managing and troubleshooting the CI/CD environment: administering the CI/CD tools, helping users set up workspaces, performing routine troubleshooting, training users in CI/CD technology and processes, and establishing automation where appropriate.
  • Managing access-level controls, Jenkins CI/CD pipelines, and CI/CD infrastructure, and setting up CI/CD technology and processes.
  • Owning CI/CD tools and processes, including Bamboo, Jenkins, and GitHub.

 

Qualifications

  • Bachelor’s Degree or Master's Degree.
  • 7 years of work experience in data warehousing and analytics.
  • 5 years of ETL design, development and implementation experience.
  • Hands-on experience in Scala, Python, open-source RDBMS and NoSQL databases, and cloud-based data warehousing services such as Redshift; Hadoop experience with YARN, MapReduce, and related Apache projects such as HDFS, Hive, HBase, Pig, Cassandra, Kafka, Flume, Spark, and Impala.
  • 2+ years of development experience in Java, Python, the Hadoop stack, and cloud computing (AWS).
  • 4+ years of UNIX/Linux experience and scripting.
  • Experience in Agile and Waterfall methodologies.
  • Experience with AWS architectures, development, and deployment. AWS Certified Solutions Architect Professional desired. AWS Certified Solutions Architect Associate required.
  • 4+ years' experience with relational database systems and SQL (PostgreSQL or Redshift).
  • Experience managing applications in AWS and familiarity with core services including EC2, S3, and RDS.
  • Experience using Scala on Spark, especially for building ETL pipelines and complex query models.
  • Experience on the Hadoop platform, including Big Data tools, and in developing shell scripts and running cron/Oozie/Spark jobs on the Hadoop platform.
  • Experience with Kafka, HBase, and Hive, as well as Elasticsearch and Kibana.
  • Experience accessing and modeling NoSQL data, especially with Cassandra.
  • Experience with APIs, JSON, OLTP, and real-time data processing.
  • Working experience on Linux and cloud platforms, knowledge of Big Data tools, and experience building reports in Tableau.
  • Knowledge of the business intelligence and analytics industry and its best practices.
  • Experience using data wrangling, data engineering, and feature engineering software.
  • Experience in interpreting data models to build user-friendly visualizations and dashboards.
  • Experience in statistical techniques and quantitative methodologies used in decision-making applications.

Additional Information

All your information will be kept confidential according to EEO guidelines.