Senior Expert – Vision-Language Models and Generative AI (GenAI)

  • Full-time
  • Legal Entity: Bosch Global Software Technologies Private Limited

Company Description

Bosch Global Software Technologies Private Limited is a 100% owned subsidiary of Robert Bosch GmbH, one of the world's leading global suppliers of technology and services, offering end-to-end Engineering, IT and Business Solutions. With more than 28,200 associates, it is the largest software development center of Bosch outside Germany, making it the technology powerhouse of Bosch in India, with a global footprint and presence in the US, Europe and the Asia Pacific region.

Job Description

Roles & Responsibilities:
Conduct deep research in:

  • Vision-Language and Multimodal AI for perception and semantic grounding

  • Cross-modal representation learning for real-world sensor fusion (camera, lidar, radar, text)

  • Multimodal generative models for scene prediction, intent inference, or simulation

  • Efficient model architectures for edge deployment in automotive and factory systems

  • Evaluation methods for explainability, alignment, and safety of VLMs in mission-critical applications

Further responsibilities:

  • Spin up new research directions and drive AI research programs for autonomous driving, ADAS, and Industry 4.0 applications.

  • Build new collaborations within and outside Bosch in relevant domains.

  • Contribute to Bosch’s internal knowledge base, open research assets, and patent portfolio.

  • Lead internal research clusters or thematic initiatives across autonomous systems or industrial AI.

  • Mentor and guide research associates, interns, and young scientists.
     

Qualifications

Educational qualification:

Ph.D. in Computer Science / Machine Learning / AI / Computer Vision or equivalent

Experience:

8+ years of post-PhD experience in AI related to vision and language modalities, with excellent exposure to and hands-on research in GenAI, VLMs, multimodal AI, or applied AI research.

Mandatory/Required Skills:

Deep expertise in:

  • Vision-Language Models (CLIP, Flamingo, Kosmos, BLIP, GIT) and multimodal transformers

  • Open- and closed-source LLMs (e.g., LLaMA, GPT, Claude, Gemini) with visual grounding extensions

  • Contrastive learning, cross-modal fusion, and structured generative outputs (e.g., scene graphs)

  • PyTorch, HuggingFace, OpenCLIP, and the deep learning stack for computer vision

  • Evaluation on ADAS/mobility benchmarks (e.g., nuScenes, BDD100k) and industrial datasets

Additional requirements:

  • Strong track record of publications in relevant AI/ML/vision venues

  • Demonstrated capability to lead independent research programs

  • Familiarity with multi-agent architectures, RLHF, and goal-conditioned VLMs for autonomous agents

Preferred Skills:

Hands-on experience with:

  • Perception stacks for ADAS, SLAM, or autonomous robots

  • Vision pipeline tools (MMDetection, Detectron2, YOLOv8) and video understanding models

  • Semantic segmentation, depth estimation, 3D vision, and temporal models

  • Industrial datasets and tasks: defect detection, visual inspection, operator assistance

  • Lightweight or compressed VLMs for embedded hardware (e.g., in vehicle ECUs or factory edge)

Also desirable:

  • Knowledge of reinforcement learning or planning in an embodied AI context

  • Strong academic or industry research collaborations

  • Understanding of Bosch domains and workflows in mobility and manufacturing
