Python Lead (Data Science Engineering)
- Full-time
Company Description
DemandMatrix is a technology company working in the AI, machine learning, and big data space. We help B2B technology marketing, sales and product management teams bring relevance to their outreach. With a rapidly growing team of 120+ employees working from Pune, the US and Australia, we are looking for passionate individuals who wish to build a great career in the hi-tech industry.
Job Description
In this role, you will oversee all Data Engineering, Data Science & Machine Learning across the company to drive product innovation and the roadmap based on business priorities. You will work with the Chief Data Scientist and VP of Engineering to turn the technology into a product, frequently releasing product iterations based on customer feedback.
This is a position that will allow you to be creative and think outside the box. You will be given access to very large data sets to play with! You must be intellectually curious and enjoy looking at large data sets to discover insights and piece together stories.
The candidate will lead a team of enthusiastic data engineers and a data QC team. They will oversee deliveries and the implementation of roadmaps, and enforce strong standards along the way. The team will look to the candidate for guidance on best practices, engineering benchmarks and work prioritization. This is a rare opportunity to build the next generation of marketing analytics platform. We believe in leading by example. This is a hands-on opportunity to lead and build a great platform.
Skills and responsibilities:
- We are looking for a strong tech lead with 5-8 years of hands-on experience to guide our data engineering team.
- You must have a deep understanding of Python as a data analysis programming language.
- Be fluent in pandas, NumPy, NLTK and scikit-learn, and understand the best practices for using these libraries.
- Have implemented at least one data analysis project on the Hadoop platform using PySpark.
- Design data processing and transformation pipelines on Linux.
- Should understand how the Linux environment, data tooling and Python integrate.
- Database experience with SQL and NoSQL paradigms, and with distributed databases for scale.
- Architect optimal data processing platforms on Azure, AWS or Google Cloud as required.
- Knowledge of Amazon Athena, Redshift, Elasticsearch and DynamoDB would be a plus.
- Must have a strong appreciation of data quality, data hygiene and the consequences of bad data on an organization's performance.
- Awareness of ML, AI and text processing algorithms.
- Awareness of chatbot implementations, marketing analytics and customer analytics.
- Understanding of formal data quality (DQ) metrics.
Qualifications
Python, Bash/Shell/Perl, SQL Scripting, Hadoop