Data Engineer

  • Full-time
  • Legal Entity: Bosch Global Software Technologies Private Limited

Company Description

Bosch Global Software Technologies Private Limited is a 100% owned subsidiary of Robert Bosch GmbH, one of the world's leading global suppliers of technology and services, offering end-to-end Engineering, IT and Business Solutions. With over 28,200 associates, it is the largest software development center of Bosch outside Germany, making it the technology powerhouse of Bosch in India, with a global footprint and presence in the US, Europe and the Asia Pacific region.

Job Description

Roles & Responsibilities:

Role Overview

As part of our AI Factory, the Data Engineer will design, build, and maintain the data foundation enabling AI/ML models and analytics to deliver value. You will be responsible for ingesting, transforming, and optimizing data from multiple HR and enterprise systems, ensuring it is clean, secure, compliant, and ready for AI-driven use cases.

This role is pivotal in enabling data-driven decision-making and in delivering high-quality datasets to our AI/ML engineers, data scientists, and business analysts.


Responsibilities

  • Design and implement data pipelines to collect, process, and store structured and unstructured data from HRIS, enterprise applications, and external sources.

  • Build and maintain data lakes, warehouses, and marts for AI/ML and analytics use.

  • Optimize data quality, performance, and availability through validation, cleansing, and transformation processes.

  • Work with Business Analysts to understand data requirements and with AI/ML Engineers to ensure model-ready datasets.

  • Implement data governance practices, ensuring compliance with GDPR, CCPA, and AI-related regulations.

  • Integrate real-time and batch data processing solutions.

  • Automate workflows using orchestration tools and CI/CD pipelines.

  • Monitor data pipelines and troubleshoot performance or integrity issues.

  • Document data models, schemas, and workflows for ongoing maintainability.

  • Stay updated on emerging data engineering, AI/ML data processing, and cloud platform trends.

Qualifications

Educational qualification:

  • Bachelor’s or Master’s in Computer Science, Data Engineering, Information Systems, or related field.

  • Certifications in cloud data engineering (AWS, Azure, or GCP) are a plus.

Experience:

Overall 5-8 years of experience, including at least 3 years working with data pipelines, cloud data platforms, and large-scale datasets (HR or enterprise data preferred).

Mandatory/Required Skills:

  • Programming Languages: Python, SQL (expert level).

  • Data Processing Frameworks: Apache Spark, PySpark, or Flink.

  • Cloud Platforms: AWS (Glue, Redshift, S3), Azure (Data Factory, Synapse), or GCP (BigQuery, Dataflow).

  • Data Storage: Data lakes (S3, ADLS, GCS), relational databases (PostgreSQL, MySQL), NoSQL databases (MongoDB, DynamoDB).

  • ETL/ELT Tools: dbt, Apache Airflow, Talend, or Informatica.

  • Data Modeling: Dimensional modeling, star/snowflake schema design.

  • Strong understanding of data privacy, security, and compliance frameworks.

  • Proven experience integrating APIs and streaming data sources.

Preferred Skills:

  • Experience with HR data structures from platforms such as SAP SuccessFactors, Workday, or Oracle HCM.

  • Familiarity with MLOps data preparation workflows.

  • Knowledge of vector databases for AI search use cases.

  • Understanding of data versioning tools (e.g., DVC, Delta Lake).

  • Experience with data catalog and metadata management tools.

  • Exposure to machine learning feature store concepts.
