2026_MS_EMT2-VM_DataEngineer_Sr_PythonSpecialist

  • Full-time
  • Legal Entity: Bosch Global Software Technologies Private Limited

Company Description

Bosch Global Software Technologies Private Limited is a 100% owned subsidiary of Robert Bosch GmbH, one of the world's leading suppliers of technology and services, offering end-to-end Engineering, IT and Business Solutions. With over 27,000 associates, it is Bosch's largest software development center outside Germany, making it the Technology Powerhouse of Bosch in India, with a global footprint and presence in the US, Europe and the Asia Pacific region.

Job Description

We are seeking a Data Backbone Engineer / Architect to design and build robust, scalable data systems that integrate data from multiple sources into a cohesive backbone. This role combines strong data engineering fundamentals with AI/ML and LLM integration, enabling intelligent, context-aware data platforms.

Key Responsibilities

  • Design and build a centralized data backbone/platform integrating data from diverse sources (APIs, databases, files, streaming systems)
  • Develop and maintain scalable data pipelines using Python for ingestion, transformation, and processing
  • Create and manage data models and database schemas optimized for performance and scalability
  • Design and maintain data dependency graphs / lineage systems to track data flow and relationships
  • Ensure data consistency, integrity, and quality across systems
  • Collaborate with domain experts to translate business logic into robust data models and AI-ready datasets
  • Build reusable frameworks for data orchestration and workflow management
  • Integrate AI/ML pipelines into the data backbone for training, inference, and feature engineering
  • Design and manage feature stores for machine learning models
  • Enable LLM-based applications by structuring and curating high-quality datasets for retrieval (RAG pipelines, embeddings, vector databases)
  • Implement pipelines for data preprocessing, labeling, and augmentation for ML use cases
  • Optimize data storage, querying, and processing performance
  • Implement monitoring, logging, and alerting for both data and ML pipelines
  • Support model lifecycle workflows (training, evaluation, deployment, versioning)
  • Incorporate feedback loops for continuous model and data improvement
  • Document data architecture, flows, dependencies, and AI pipeline integrations clearly

Required Skills & Qualifications

  • Strong proficiency in Python for data engineering (Pandas, NumPy, PySpark, etc.)
  • Solid experience in data modeling and database design (relational and/or NoSQL)
  • Hands-on experience with SQL and performance tuning
  • Experience building ETL/ELT pipelines
  • Strong understanding of data structures, dependency graphs (DAGs), and workflow orchestration (Airflow-like systems)
  • Experience working with heterogeneous data sources (structured, semi-structured, unstructured)
  • Good understanding of data validation and quality frameworks
  • Hands-on experience with ML frameworks (e.g., scikit-learn, TensorFlow, PyTorch)
  • Understanding of feature engineering and model evaluation techniques
  • Experience with LLM ecosystems (e.g., embeddings, prompt engineering, vector databases like Pinecone/FAISS, RAG architectures)
  • Familiarity with LLM orchestration frameworks (e.g., LangChain, LlamaIndex or similar)
  • Knowledge of model deployment and serving (APIs, batch/real-time inference)
  • Strong problem-solving and analytical skills

Qualifications

B.E. (Bachelor of Engineering)

Additional Information

6
