PySpark (Python)/Scala Spark
- Full-time
Job Description
- Performing ETL jobs in batch mode.
- Performing real-time ETL with Spark Structured Streaming.
- Python/Scala programming (intermediate level)
- Hands-on experience with Spark 1.6 and 2.x or later.
- Working with Hive tables and different file formats (Parquet, CSV, JSON, ORC, Avro, etc.), including compression techniques.
- Integrating PySpark with different data sources, e.g. Oracle, PostgreSQL, MySQL, MS SQL Server, etc.
- Spark SQL, DataFrames & Datasets.
- Performance tuning techniques.
Additional Information
Good to Have:
- Basic ML techniques in Spark (optional for Data Engineering).
- Working with Hive and NoSQL databases like HBase, Cassandra, etc.