AI Engineering Lead

  • Full-time
  • Compensation: USD 200,000 to USD 300,000 per year

Company Description

Vichara is a products and services firm focused on financial services. Headquartered in New York, it builds systems for some of the world's largest investment banks and hedge funds.

Job Description

Key Responsibilities

🔹 Architecture & System Design

  • Architect, design, and lead the development of multi-agent LLM systems using LangGraph and LangChain, with Promptfoo for prompt lifecycle management and benchmarking.

  • Build Retrieval-Augmented Generation (RAG) pipelines that use hybrid search (dense vector + keyword) on LanceDB, Pinecone, or Elasticsearch.

  • Define system workflows for summarization, query routing, retrieval, and response generation, ensuring minimal latency and high precision.

  • Develop RAG evaluation frameworks combining retrieval precision/recall, hallucination detection, and latency metrics — aligned with analyst and business use cases.

🔹 AI Model Integration & Fine-Tuning

  • Integrate GPT-4o, PaLM 2, and open-weight models (LLaMA, Mistral) for task-specific contextual Q&A.

  • Fine-tune transformer models (BERT, SentenceTransformers) for document classification, summarization, and sentiment analysis.

  • Manage prompt routing and variant testing using Promptfoo or equivalent tools.

🔹 Agentic AI & Orchestration

  • Implement multi-agent architectures with modular flows — enabling task-specific agents for summarization, retrieval, classification, and reasoning.

  • Design fallback and recovery behaviors to ensure robustness in production.

  • Employ LangGraph for parallel and stateful agent orchestration, error recovery, and deterministic flow control.

🔹 Data Engineering & RAG Infrastructure

  • Architect ingestion pipelines for structured and unstructured data — including financial statements, filings, and PDF documents.

  • Use MongoDB for metadata storage and Redis (including Redis Streams) for async task execution and caching.

  • Implement vector-based search and retrieval layers for high-throughput and low-latency AI systems.

🔹 Observability & Production Deployment

  • Deploy end-to-end AI systems on AWS EKS / Azure Kubernetes Service, integrated with CI/CD pipelines (Azure DevOps).

  • Build monitoring dashboards using OpenTelemetry and SigNoz, tracking latency, retrieval precision, and application health.

  • Enforce testing and regression validation using golden datasets and structured assertion checks for all LLM responses.

🔹 Cross-functional Collaboration

  • Collaborate with DevOps, MLOps, and application development teams to integrate AI APIs into React front ends and FastAPI-based services.

  • Work with business analysts to translate credit, compliance, and customer-support requirements into actionable AI agent workflows.

  • Mentor a small team of GenAI developers and data engineers in RAG, embeddings, and orchestration techniques.

Qualifications

  • Experience: 5+ years as an AI or ML Engineer

Required Skills & Experience

  • LLMs & GenAI: GPT-4o, PaLM 2, LangGraph, LangChain, Promptfoo, SentenceTransformers

  • RAG Frameworks: LanceDB, Pinecone, Elasticsearch, FAISS, MongoDB

  • Agentic AI: LangGraph multi-agent orchestration, routing logic, task decomposition

  • Fine-Tuning: BERT / domain-specific transformer tuning, evaluation framework design

  • Infra & MLOps: FastAPI, Docker, Kubernetes (EKS/AKS), Redis Streams, Azure DevOps CI/CD

  • Monitoring: OpenTelemetry, SigNoz, Prometheus

  • Languages & Tools: Python, SQL, REST APIs, Git, Pandas, NumPy

🧠 Nice-to-Have Skills

  • Knowledge of reranker-based retrieval (MiniLM / CrossEncoder)

  • Familiarity with prompt evaluation and scoring (BLEU, ROUGE, faithfulness)

  • Domain exposure to Credit Risk, Banking, and Investment Analytics

  • Experience with RAG benchmark automation and model evaluation dashboards

Additional Information