
Chiplet-Based RISC-V SoC with Modular AI Acceleration

Ramkumar, Bharadwaj

Achieving high performance, energy efficiency, and cost-effectiveness while maintaining architectural flexibility is a critical challenge in the development and deployment of edge AI devices. Monolithic SoC designs struggle to strike this balance, mainly because of low manufacturing yields (below 16%) for large 360 mm^2 dies at advanced process nodes. This paper presents a novel chiplet-based RISC-V SoC architecture that addresses these limitations through modular AI acceleration and intelligent system-level optimization. The proposed design integrates four key innovations on a 30 mm x 30 mm silicon interposer: adaptive cross-chiplet Dynamic Voltage and Frequency Scaling (DVFS); AI-aware Universal Chiplet Interconnect Express (UCIe) protocol extensions featuring streaming flow control units and compression-aware transfers; distributed cryptographic security across heterogeneous chiplets; and intelligent sensor-driven load migration. The architecture integrates a 7nm RISC-V CPU chiplet with dual 5nm AI accelerators (15 TOPS INT8 each), 16GB HBM3 memory stacks, and dedicated power management controllers. Experimental results on industry-standard benchmarks such as MobileNetV2, ResNet-50, and real-time video processing demonstrate significant performance improvements. Compared with previous baseline chiplet implementations, the AI-optimized configuration achieves ~14.7% lower latency, 17.3% higher throughput, and 16.2% lower power. Together these translate to a 40.1% efficiency gain, corresponding to ~3.5 mJ per MobileNetV2 inference (860 mW / 244 images/s), while maintaining sub-5 ms real-time latency across all tested workloads. These results demonstrate that modular chiplet designs can achieve near-monolithic computational density while providing the cost efficiency, scalability, and upgradeability crucial for next-generation edge AI applications.
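The ~3.5 mJ figure follows directly from dividing average power by throughput. A minimal Python check, using only the abstract's 860 mW and 244 images/s (the function name is illustrative):

```python
def energy_per_inference_mj(power_mw: float, throughput_ips: float) -> float:
    """Return energy per inference in millijoules.

    power_mw / 1000 gives watts (J/s); dividing by images/s gives J/image;
    multiplying by 1000 converts back to mJ, so the two factors cancel.
    """
    if throughput_ips <= 0:
        raise ValueError("throughput must be positive")
    return power_mw / throughput_ips


e = energy_per_inference_mj(860.0, 244.0)
print(f"{e:.2f} mJ per inference")  # prints "3.52 mJ per inference"
```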



Basic Information

Paper ID: 2509.18355
Title: Chiplet-Based RISC-V SoC with Modular AI Acceleration
Authors: Suhas Suresh Bharadwaj (Birla Institute of Technology and Science, Pilani – Dubai), Prerana Ramkumar (American University of Sharjah)
Categories: cs.AR (Computer Architecture), cs.AI (Artificial Intelligence)
Publication date/venue: venue not specified
Paper link: https://arxiv.org/abs/2509.18355

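Summary

This paper proposes a novel chiplet-based RISC-V SoC architecture that uses modular AI acceleration and intelligent system-level optimization to address the challenge of balancing performance, energy efficiency, and cost-effectiveness in edge AI devices. The design integrates four key innovations on a 30 mm × 30 mm silicon interposer: adaptive cross-chiplet dynamic voltage and frequency scaling (DVFS), AI-aware UCIe protocol extensions, distributed cryptographic security, and intelligent sensor-driven load migration. Experimental results show that, compared with a baseline chiplet implementation, the AI-optimized configuration achieves a 14.7% latency reduction, a 17.3% throughput improvement, and a 16.2% power reduction, for an overall efficiency gain of 40.1%.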
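As a rough consistency check on the 40.1% figure, assume that "efficiency" means throughput per watt. This is an assumption; the summary does not state the formula. Combining the reported 17.3% throughput gain with the 16.2% power reduction under that definition:

```python
def perf_per_watt_gain(throughput_gain: float, power_reduction: float) -> float:
    """Relative change in throughput per watt.

    ASSUMPTION: efficiency = throughput / power (not confirmed by the paper).
    Inputs are fractions, e.g. 0.173 for +17.3% and 0.162 for -16.2%.
    """
    return (1.0 + throughput_gain) / (1.0 - power_reduction) - 1.0


gain = perf_per_watt_gain(0.173, 0.162)
print(f"{gain * 100:.1f}%")  # prints "40.0%"
```

The result, about 40.0%, is close to the reported 40.1%; the small gap may come from rounding in the published percentages or from a slightly different definition of efficiency.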

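Research Background and Motivation

Problem Definition

Edge AI platforms must meet strict performance requirements, including sub-millisecond end-to-end latency and a power envelope below 2 W, while running increasingly complex deep networks such as MobileNetV2 and ResNet-50. However, traditional monolithic system-on-chip (SoC) approaches face manufacturing and yield challenges.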

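Significance

Market demand: an estimated 500 billion devices by 2030, with edge AI platforms accounting for a significant share
Technical challenge: at advanced process nodes, dies several hundred square millimeters in area have very low yields (below 16%)
Application demand: autonomous driving, industrial automation, healthcare, and similar domains impose strict real-time inference requirements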

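Limitations of Existing Approaches

Monolithic SoCs: low manufacturing yield and poor economics at advanced process nodes
Conventional DVFS: long voltage transition times (tens of microseconds) limit fine-grained adjustment
Security integration: integrating chiplets from multiple vendors introduces security risks, including counterfeiting, cloning, and supply-chain tampering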
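Counterfeiting and cloning risks of this kind are commonly mitigated with challenge-response authentication of each chiplet at boot. The summary does not describe the paper's actual protocol, so the following is only a generic sketch under stated assumptions: each chiplet holds a secret key provisioned at manufacture (in practice this might come from a PUF or fused storage), the root of trust knows that key, and the names `Chiplet` and `verify_chiplet` are hypothetical. It uses only Python's standard `hmac`, `hashlib`, and `secrets` modules.

```python
import hashlib
import hmac
import secrets


class Chiplet:
    """Hypothetical chiplet holding a secret key (ASSUMPTION: provisioned at manufacture)."""

    def __init__(self, chiplet_id: str, key: bytes):
        self.chiplet_id = chiplet_id
        self._key = key

    def respond(self, nonce: bytes) -> bytes:
        # Bind the response to both the fresh nonce and the chiplet identity.
        msg = nonce + self.chiplet_id.encode()
        return hmac.new(self._key, msg, hashlib.sha256).digest()


def verify_chiplet(chiplet: Chiplet, expected_key: bytes) -> bool:
    """Root of trust issues a fresh nonce and checks the chiplet's HMAC-SHA256."""
    nonce = secrets.token_bytes(16)  # fresh per check, so old responses cannot be replayed
    response = chiplet.respond(nonce)
    expected = hmac.new(expected_key, nonce + chiplet.chiplet_id.encode(),
                        hashlib.sha256).digest()
    return hmac.compare_digest(response, expected)  # constant-time comparison


key = secrets.token_bytes(32)
genuine = Chiplet("npu0", key)
clone = Chiplet("npu0", secrets.token_bytes(32))  # counterfeit without the real key
print(verify_chiplet(genuine, key), verify_chiplet(clone, key))  # prints "True False"
```

The scheme only rejects clones that lack the key, so its strength rests entirely on keeping that key secret.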

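Motivation

Chiplet-based 2.5D integration offers a practical alternative: it decomposes a large SoC into smaller heterogeneous dies connected through a high-density interposer.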
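The yield benefit of splitting a die can be estimated with the simple Poisson yield model Y = exp(-A·D0). The model is an assumption for illustration (the paper may use a different one), and so is the equal four-way split. The sketch back-solves the defect density D0 from the stated 16% yield for a 360 mm^2 die, then compares the wafer area consumed per working system. It assumes known-good-die testing and ignores interposer and bonding losses.

```python
import math


def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Poisson die-yield model Y = exp(-A * D0), with A converted to cm^2."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)


def implied_defect_density(area_mm2: float, yield_frac: float) -> float:
    """Invert the Poisson model: D0 = -ln(Y) / A, in defects per cm^2."""
    return -math.log(yield_frac) / (area_mm2 / 100.0)


def silicon_per_good_system(total_mm2: float, n_chiplets: int, d0: float) -> float:
    """Wafer area (mm^2) consumed per working system when the logic is split into
    n equal chiplets and only known-good dies are assembled (ASSUMPTION)."""
    die = total_mm2 / n_chiplets
    return n_chiplets * die / poisson_yield(die, d0)


d0 = implied_defect_density(360.0, 0.16)  # about 0.51 defects/cm^2
for n in (1, 4):
    die = 360.0 / n
    print(f"n={n}: die yield {poisson_yield(die, d0):.1%}, "
          f"silicon per good system {silicon_per_good_system(360.0, n, d0):.0f} mm^2")
# prints:
# n=1: die yield 16.0%, silicon per good system 2250 mm^2
# n=4: die yield 63.2%, silicon per good system 569 mm^2
```

Under these assumptions each 90 mm^2 chiplet yields about 63%, and the silicon spent per good system drops roughly fourfold; real gains are smaller once test, interposer, and assembly yields are included.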

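Core Contributions

Proposes a chiplet-based RISC-V SoC architecture: integrates a 7nm RISC-V CPU chiplet, dual 5nm AI accelerators (15 TOPS INT8 each), 16GB HBM3 memory, and dedicated power management controllers
Implements four key system innovations:
- Adaptive cross-chiplet DVFS system
- AI-aware UCIe protocol extensions
- Distributed cryptographic security framework
- Intelligent thermal management system (sensor-driven load migration)
Validates significant performance gains: compared with a baseline chiplet implementation, 14.7% lower latency, 17.3% higher throughput, and 16.2% lower power
Demonstrates real-time processing capability: maintains sub-5 ms latency across all tested workloads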
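To make cross-chiplet DVFS concrete, the following is a minimal sketch of one possible policy, not the paper's controller. Each chiplet first picks the lowest operating point that covers its current demand; a shared package power budget is then enforced by throttling the least-utilized chiplet first. The V/F tables, power values, `OpPoint`, and `pick_levels` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpPoint:
    freq_ghz: float
    power_w: float


# HYPOTHETICAL V/F operating points per chiplet (illustrative values only).
LEVELS = {
    "cpu":  [OpPoint(0.8, 0.15), OpPoint(1.4, 0.30), OpPoint(2.0, 0.55)],
    "npu0": [OpPoint(0.5, 0.20), OpPoint(0.8, 0.35), OpPoint(1.0, 0.50)],
    "npu1": [OpPoint(0.5, 0.20), OpPoint(0.8, 0.35), OpPoint(1.0, 0.50)],
}


def pick_levels(util: dict[str, float], budget_w: float) -> dict[str, int]:
    """Choose one operating-point index per chiplet under a shared power budget.

    util maps chiplet name -> utilization in [0, 1], measured at max frequency.
    """
    levels: dict[str, int] = {}
    for name, u in util.items():
        table = LEVELS[name]
        demand_ghz = min(max(u, 0.0), 1.0) * table[-1].freq_ghz
        # Lowest level whose frequency still covers the demand.
        levels[name] = next(i for i, p in enumerate(table) if p.freq_ghz >= demand_ghz)

    def total_power() -> float:
        return sum(LEVELS[n][i].power_w for n, i in levels.items())

    # Cross-chiplet step: throttle the least-utilized chiplet first until the
    # package fits the budget or every chiplet is already at its lowest level.
    while total_power() > budget_w:
        candidates = [n for n, i in levels.items() if i > 0]
        if not candidates:
            break
        victim = min(candidates, key=lambda n: util[n])
        levels[victim] -= 1
    return levels


print(pick_levels({"cpu": 0.3, "npu0": 0.9, "npu1": 0.7}, budget_w=0.9))
# prints "{'cpu': 0, 'npu0': 2, 'npu1': 0}"
```

Throttling the least-utilized chiplet first keeps the busiest accelerator at speed. Because voltage transitions can take tens of microseconds, a practical controller would also limit how often it re-runs this selection.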