PySpark (Python)/Scala Spark

  • Full-time

Job Description


  • Performing ETL jobs in batch mode (see the batch ETL sketch after this list).
  • Performing ETL using real-time Spark Streaming (see the streaming sketch after this list).
  • Python/Scala programming (intermediate level).
  • Hands-on experience with Spark 1.6 and versions 2.x and above.
  • Working with Hive tables and different file formats (Parquet, CSV, JSON, ORC, Avro, etc.), including compression techniques.
  • Integrating PySpark with different data sources over JDBC, e.g. Oracle, PostgreSQL, MySQL, MS SQL Server (as in the batch sketch below).
  • Spark SQL, DataFrames, and Datasets.
  • Performance tuning techniques.
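
For illustration, a minimal PySpark batch-ETL sketch covering the JDBC-integration, Spark SQL, and file-format items above (Spark 2.x+ assumed; the PostgreSQL host, table, credentials, and output path are hypothetical placeholders, not from this posting):

    # Batch ETL sketch: JDBC source -> Spark SQL transform -> Parquet sink.
    # Host, database, table, credentials, and paths are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

    # Read a PostgreSQL table over JDBC (the driver jar must be on the classpath).
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:postgresql://db-host:5432/sales")
              .option("driver", "org.postgresql.Driver")
              .option("dbtable", "public.orders")
              .option("user", "etl_user")
              .option("password", "etl_password")
              .load())

    # Transform with Spark SQL via a temporary view.
    orders.createOrReplaceTempView("orders")
    daily = spark.sql(
        "SELECT order_date, SUM(amount) AS total_amount "
        "FROM orders GROUP BY order_date"
    )

    # Write Parquet with Snappy compression, partitioned by date.
    (daily.write.mode("overwrite")
          .option("compression", "snappy")
          .partitionBy("order_date")
          .parquet("/data/out/daily_orders"))

The same read/write pattern applies to the other JDBC sources listed (Oracle, MySQL, MS SQL Server), differing mainly in the JDBC URL and driver class.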
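A corresponding real-time sketch using Structured Streaming (the Spark 2.x+ successor to the 1.6-era DStream API); the input directory, schema, and checkpoint location are made up for illustration:

    # Streaming ETL sketch: JSON file source -> running aggregation -> console sink.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("streaming-etl-sketch").getOrCreate()

    # Streaming file sources require an explicit schema (hypothetical here).
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    events = spark.readStream.schema(schema).json("/data/incoming")

    # Maintain a running sum per event_id; "complete" mode re-emits the full table.
    query = (events.groupBy("event_id").sum("amount")
             .writeStream
             .outputMode("complete")
             .format("console")
             .option("checkpointLocation", "/tmp/checkpoints/etl")
             .start())

    query.awaitTermination()

In production the console sink would typically be replaced by a Kafka, file, or foreachBatch sink.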


Additional Information

Good to Have:

  1. Basic ML techniques in Spark (optional for Data Engineering; see the MLlib sketch below).
  2. Working with Hive and NoSQL databases such as HBase and Cassandra (see the Hive sketch below).
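
For the optional ML item, a minimal Spark MLlib sketch; the toy rows and feature columns are invented for illustration:

    # MLlib sketch: assemble features, fit a basic classifier.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

    # Tiny made-up training set: a label plus two numeric features.
    df = spark.createDataFrame(
        [(0.0, 1.2, 0.7), (1.0, 3.1, 2.2), (0.0, 0.4, 0.1), (1.0, 2.8, 1.9)],
        ["label", "f1", "f2"],
    )

    # Assemble raw columns into a feature vector, then fit the model.
    assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
    train = assembler.transform(df)

    model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
    model.transform(train).select("label", "prediction").show()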
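And a minimal Hive-integration sketch, assuming a Spark build with Hive support and a reachable metastore; the database and table names are hypothetical:

    # Hive sketch: query an existing table, write results back as a managed table.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-sketch")
             .enableHiveSupport()   # requires a configured Hive metastore
             .getOrCreate())

    sales_by_region = spark.sql(
        "SELECT region, SUM(amount) AS total FROM mart.sales GROUP BY region"
    )

    sales_by_region.write.mode("overwrite").saveAsTable("mart.sales_by_region")

HBase and Cassandra access require external connector packages (e.g. the Spark Cassandra Connector), so no sketch is given for them here.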