Staff Engineer (Data Engineer – AI & Digital Platforms)

  • Full-time
  • Service Region: Others

Company Description

We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale, across all devices and digital mediums, and our people exist everywhere in the world (17,500+ experts across 36 countries, to be exact). Our work culture is dynamic and non-hierarchical. We are looking for great new colleagues. That is where you come in!

Job Description

Data Engineer – AI & Digital Platforms

Must-Have Skills

  • Hadoop and MapReduce
  • Cloudera
  • AI-enabled Application Development
  • Machine Learning – General Experience
  • LLM Application Frameworks (Capable)

Key Responsibilities

  • Design and develop scalable data pipelines across Hadoop (Hive, Impala, Spark, Kafka, Iceberg) and Teradata environments.
  • Build ingestion and transformation frameworks using Java, Spark, Python, and Shell scripts.
  • Develop full stack applications and internal tools using Python, Shell scripting, and modern web frameworks (Flask, React).
  • Create APIs and microservices to expose data and ML models securely to downstream systems and user interfaces.
  • Collaborate with data scientists to operationalize ML models using Cloudera Machine Learning (CML).
  • Build and deploy GenAI/LLM-powered applications for intelligent data interaction, summarization, and automation.
  • Implement enterprise-grade security controls including RBAC, LDAP, Kerberos, Apache Ranger, and row-level access.
  • Tune and optimize data applications for performance across Hadoop and Teradata, ensuring efficient resource utilization.
  • Support sandbox environments for prototyping, enabling users to build ML models, dashboards, and data pipelines.

Required Skills & Experience

Data Engineering

  • Strong experience with the Hadoop ecosystem (Hive, Impala, Spark, Kafka, Iceberg, Ranger, Atlas), Teradata, and data pipeline orchestration.
  • Experience with MPP databases (e.g., Trino, Presto).
  • Proven ability in developing and performance-tuning large-scale data applications.

Full Stack Development

  • Proficiency in Python, Shell scripting, REST APIs, and web frameworks (Flask, React).

Machine Learning & AI

  • Hands-on experience with ML platforms (CML), Spark MLlib, and Python ML libraries (scikit-learn, XGBoost).
  • Experience in operationalizing ML models at enterprise scale.

GenAI/LLM Applications

  • Familiarity with building applications using large language models (OpenAI, Hugging Face, LangChain).
  • Ability to build agent workflows and support users in creating agent-based solutions.

Security & Governance

  • Experience with enterprise data security (LDAP, Kerberos, RBAC), data masking, and access control.

Performance Tuning

  • Strong expertise in optimizing data applications and queries in Hadoop and Teradata environments.

Tools & Platforms

  • Cloudera Data Platform (CDP), Informatica, QlikSense, Apache Oozie, Git, CI/CD pipelines.

Soft Skills

  • Strong analytical and problem-solving skills.
  • Excellent communication abilities.
  • Ability to work effectively in cross-functional teams.
