Principal Engineer, ASIC Development Engineering (High Bandwidth Flash Frontend)
- Full-time
- Job Type (exemption status): Exempt position - Please see related compensation & benefits details below
- Business Function: ASIC Development Engineering
- Work Location: Bangalore PTP Office (IBP)--LOC_WDT_IBP
Company Description
Sandisk understands how people and businesses consume data and we relentlessly innovate to deliver solutions that enable today’s needs and tomorrow’s next big ideas. With a rich history of groundbreaking innovations in Flash and advanced memory technologies, our solutions have become the beating heart of the digital world we’re living in and that we have the power to shape.
Sandisk meets people and businesses at the intersection of their aspirations and the moment, enabling them to keep moving and pushing possibility forward. We do this through the balance of our powerhouse manufacturing capabilities and our industry-leading portfolio of products that are recognized globally for innovation, performance and quality.
Sandisk has two facilities recognized by the World Economic Forum as part of the Global Lighthouse Network for advanced 4IR innovations. These facilities were also recognized as Sustainability Lighthouses for breakthroughs in efficient operations. With our global reach, we ensure the global supply chain has access to the Flash memory it needs to keep our world moving forward.
Job Description
In this Frontend Architect position, you will develop High Bandwidth Flash (HBF) based advanced system architectures and AI/ML Accelerator ASIC architecture specifications for Sandisk’s next-generation products. You will initiate, drive, and analyze the HBF-based frontend architecture of the AI/ML Accelerator product. As a Frontend Architect, you will help drive new architecture initiatives that leverage state-of-the-art frontend interfaces such as UCIe, PCIe, CXL, and UAL to integrate HBF with xPU in a 3D package system. You will exercise your technical expertise and excellent communication skills to collaborate with design and product planning teams, with an eye towards delivering innovative and highly competitive adaptive accelerator solutions. Typical activities include writing architecture specifications, working with other architects on the team, and working with RTL/DV/Simulation/Emulation/FW teams to evaluate architecture changes and assess the performance, power, area, and endurance of the product. You will work closely with excellent fellow engineers, tackle complex challenges, innovate, and develop products that will change the data-centric architecture paradigm.
KEY RESPONSIBILITIES:
- Responsible for driving the SoC architecture, with a particular focus on I/O subsystems connected over UCIe, PCIe, UAL or CXL.
- Define I/O subsystem and PCIe DMA architectures, including their interactions with internal embedded processor-subsystems, Network on Chip, Memory controllers, and FPGA fabric.
- Create flexible and modular I/O subsystem architectures that can be deployed in chiplet, monolithic, or 3D form factors.
- Work with customers and cross-functional teams to scope SoC requirements, analyze PPA tradeoffs, and define architectural requirements that meet the PPA and schedule targets.
- Define I/O subsystem and DMA hardware, software, and firmware interactions with embedded processing subsystems and SoC CPUs on the device side and Host CPUs.
- Author architecture specifications in clear and concise language. Guide and assist pre-silicon design/verification and post-silicon validation during the execution phase.
- Responsible for improving the AI/ML ASIC Architecture performance through hardware & software co-optimization, post-silicon performance analysis, and influencing the strategic product roadmap.
- Define I/O subsystem and DMA hardware, software, and firmware interactions with embedded processing subsystems and external CPUs.
- Author architecture specifications in clear and concise language for AI/ML xPU based Accelerator using HBF. Guide and assist pre-silicon design/verification and post-silicon validation during the execution phase.
- Analyze LLM workloads and characterize our ASIC against competitive datacenter and AI solutions to identify opportunities for performance improvement in our products.
- Experience architecting one or more components of AI/ML accelerator ASICs such as HBM, PCIe/UCIe/CXL, NoC, DMA, Firmware Interactions, NAND, xPU, fabrics, etc.
- Drive the HBF frontend system architecture with GPU/TPU/NPU/xPU to match or exceed next-generation HBM bandwidth
- Architect memory-efficient inference/training systems utilizing techniques like pruning, quantization with MX format, continuous batching/chunked prefill, and speculative decoding
- Collaborate with internal and external stakeholders/ML researchers to disseminate results and iterate at a rapid pace
PREFERRED EXPERIENCE:
- Strong technical background architecting SoC and I/O subsystems involving PCIe and PCIe-DMA engines, UCIe, CXL, or UAL
- Strong knowledge of I/O subsystem microarchitecture and working knowledge of the PCIe/UCIe protocol specifications
- Knowledge of I/O Subsystem and DMA interactions with internal embedded processor-subsystems (x86, RISC-V or ARM) and external host CPU
- Good understanding of computer/graphics architecture, ML, and LLMs
- Experience architecting GPU/TPU/xPU accelerator systems with an optimized high-bandwidth memory hierarchy and frontend architecture for multi-trillion-parameter LLM training/inference, including Dense and Mixture of Experts (MoE) models with multiple modalities (text, vision, speech)
- Deep experience optimizing large-scale ML systems and GPU architectures
- Proficiency in principles and methods of microarchitecture, software, and hardware relevant to performance engineering
- Familiarity with or background in UCIe, CXL, NVLink, or UAL microarchitecture and protocols is a plus
- Familiarity with high-speed networking (InfiniBand, RDMA, NVLink) is a plus
- Knowledge of bridging and ordering rule enforcement between on-chip protocols such as AXI and off-chip protocols such as PCIe is desired
- Knowledge of ARM processors and AXI interconnects is desired
- Expert knowledge of transformer architectures, attention mechanisms, and model parallelism techniques
- Multi-disciplinary experience, including familiarity with Firmware and ASIC design
- Experience with KV cache optimization, Flash Attention, and Mixture of Experts
- Expertise in CUDA programming, GPU memory hierarchies, and hardware-specific optimizations
- Proven track record architecting large-scale distributed training systems
- Previous experience with NVMe storage systems, protocols, and NAND flash is an advantage
Qualifications
- Bachelor's, Master's, or PhD in Computer/Electrical Engineering with 8+ years of hands-on architecture experience, including authoring specifications
Additional Information
Sandisk thrives on the power and potential of diversity. As a global company, we believe the most effective way to embrace the diversity of our customers and communities is to mirror it from within. We believe the fusion of various perspectives results in the best outcomes for our employees, our company, our customers, and the world around us. We are committed to an inclusive environment where every individual can thrive through a sense of belonging, respect and contribution.
Sandisk is committed to offering opportunities to applicants with disabilities and ensuring all candidates can successfully navigate our careers website and our hiring process. Please contact us at [email protected] to advise us of your accommodation request. In your email, please include a description of the specific accommodation you are requesting as well as the job title and requisition number of the position for which you are applying.