Data Architect

  • Irving, TX, USA
  • Contract

Job Description

  • Analyzes and transforms business requirements into automated solutions per development standards. Researches and resolves data issues identified by business users, IT operations, or partners.
  • Designing, developing, and implementing data service architecture that ingests real-time data streams, parses JSON, and loads it into multiple data persistence stores to provide real-time and offline analytics.
  • Preparing and analyzing large amounts of data using Hadoop ecosystem tools and components.
  • Develop technical solutions which may involve a variety of data development tools, preferably Informatica IDQ.
  • Design, develop, test and support SQL stored procedures, views, Informatica workflows, Control-M flows or script objects.
  • Configuring, optimizing and maintaining data governance applications using Informatica Data Quality (IDQ), including IDQ data governance software configuration, integration, and testing with a Data Lake on a Hadoop environment.
  • Building and testing data quality rules in IDQ that measure the quality of the data assets.
  • Tuning and optimizing the execution of data quality measurements to minimize operational impacts on the production Data Lake.
  • Perform programming in Java/Scala/Python with open source RDBMS and NoSQL databases and cloud-based data warehousing services such as Redshift.
  • Hands-on programming and coding with Big Data technologies and tools like Apache Spark, Apache Hive, Apache Pig, Apache Sqoop, Apache Storm, Apache Kafka, etc.
  • Perform ingestion of structured, unstructured and semi-structured data from multiple data sources into the Hadoop distributed environment.
  • Responsible for design, development and delivery of data-sets from operational systems and files and ingestion into ODSs (operational data stores), Data Marts and files.
  • Troubleshoot and develop on Hadoop technologies including HDFS, Kafka, Hive, Pig, Flume, HBase, Spark, Impala and Hadoop ETL development via tools such as ODI for Big Data and APIs to extract data from source.
  • Developing and deploying distributed computing Big Data applications using open source frameworks like Apache Spark, Apex, Flink, NiFi, Storm and Kafka on AWS Cloud.
  • Translate, load and present disparate data-sets in multiple formats and from multiple sources including JSON, Avro, text files, Kafka queues, and log data. Data will range in type from structured through semi-structured to unstructured.
  • Work on the Hadoop modules such as YARN & MapReduce, and related Apache projects such as Hive, HBase, Pig, and Cassandra.
  • Responsible for building solutions involving large data sets using SQL methodologies and data integration tools like ODI in any database.
  • Leverage DevOps techniques and practices like Continuous Integration, Continuous Deployment, Test Automation, Build Automation and Test Driven Development to enable the rapid delivery of working code utilizing tools like Jenkins, Maven, Nexus, Chef, Terraform, Ruby, Git and Docker.
  • Perform unit, component and integration testing of software components including the design, implementation, evaluation and execution of unit and assembly test scripts.
  • Manage and troubleshoot the CI/CD environment: administer the CI/CD tools, help users set up workspaces, perform routine troubleshooting for users, train users in CI/CD technology and processes, and establish some level of automation.
  • Manage access level controls, Jenkins continuous integration/continuous deployment (CI/CD) pipelines, and CI/CD infrastructure; responsible for setting up CI/CD technology and processes.
  • Responsible for continuous integration/continuous delivery (CI/CD) tools and processes, including Bamboo, Jenkins, and GitHub.


  • Bachelor’s Degree or Master’s Degree.
  • 7 years of work experience in data warehousing and analytics.
  • 5 years of ETL design, development and implementation experience.
  • Minimum 6 years designing, developing, and maintaining Informatica Applications with 6 years combined in general application development and exposure to Informatica Administration.
  • 4+ years' experience with Relational Database Systems and SQL (PostgreSQL or Redshift).
  • Development experience must be full Life Cycle experience including business requirements gathering, data sourcing, testing/data reconciliation, deployment and data automation practices.
  • Hands-on experience in Scala, Python, open source RDBMS and NoSQL databases, and cloud-based data warehousing services such as Redshift. Hadoop experience in YARN & MapReduce and related Apache projects such as Hive, HBase, Pig, Cassandra, HDFS, Kafka, Flume, Spark, and Impala.
  • Experience with AWS architectures, development, and deployment. AWS Certified Solutions Architect Professional desired. AWS Certified Solutions Architect Associate required.
  • 2+ years of development experience in Java, Python, Hadoop Stack, Cloud computing (AWS).
  • Experience managing applications in AWS and familiarity with core services including EC2, S3, RDS, etc.
  • Experience using Scala on Spark, especially building ETL and complex query models.
  • Experience on the Hadoop platform, including Big Data tools. Experience in developing shell scripts and running cron/Oozie/Spark jobs on the Hadoop platform.
  • Experience with Kafka, HBase, and Hive, and experience with Elasticsearch and Kibana.
  • Experience with APIs, JSON, OLTP and real-time data processing.
  • Working experience on Linux and Cloud platforms. Knowledge of Big Data tools. Experience in building reports using Tableau.
Desired Skills:
  • 4+ years of UNIX/Linux experience and scripting.
  • Experience in interpreting data models to build user-friendly visualizations/dashboards.
  • Experience in statistical techniques and quantitative methodologies that are used in decision making applications.
  • Experience in Agile and Waterfall methodologies.
  • Experience in accessing and modeling NoSQL data, especially with Cassandra.
  • Knowledge of business intelligence and analytics industry and best practices.
  • Experience using data wrangling, data engineering, and feature engineering software.

Additional Information

All your information will be kept confidential according to EEO guidelines.