Deep Neural Network Inference Partitioning in Embedded Hybrid Analog-Digital Systems

Fabian Kreß1, Julian Hoefer2, Qiushi Lin3, Patrick Schmidt2, Zhenhua Zhu3, Yu Zhu3, Tanja Harbaum4, Yu Wang3, Juergen Becker1
1Karlsruhe Institute of Technology - ITIV, 2Karlsruhe Institute of Technology, 3Tsinghua University, 4Karlsruhe Institute of Technology


Abstract

Deep Neural Networks (DNNs) deployed in resource-constrained embedded systems, such as automotive platforms, often face severe memory-wall problems and energy-consuming data movements. To address this challenge, existing work has proposed Processing-In-Memory (PIM) architectures based on emerging non-volatile memory, which perform analog-domain computations inside the memory and thus offer great potential to overcome the memory wall. However, PIM-based accelerators suffer from circuit noise, which degrades DNN accuracy. Therefore, heterogeneous multi-chiplet architectures that combine energy-efficient analog PIM with high-precision digital computing units have become a promising DNN acceleration solution, achieving both high energy efficiency and high accuracy. In this paper, we present an automated framework to explore layer-wise DNN inference partitioning in hybrid analog-digital multi-chiplet systems. After performing a topological ordering of the DNN graph, the framework analyzes the robustness of each layer to noise and uses this analysis to constrain mapping decisions across the available analog PIM chiplets and digital chiplets. By jointly considering several functional and performance metrics, the framework identifies Pareto-optimal partitioning schemes. Extensive experimental results on a variety of DNN models show a latency reduction of up to 52% with an accuracy loss of less than 1%.
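The partitioning flow summarized in the abstract can be illustrated with a minimal Python sketch: topologically order the layers, prune mappings that violate a per-layer robustness constraint, and keep only the Pareto-optimal (latency, accuracy-loss) schemes. All layer names, cost numbers, and the robustness threshold below are hypothetical placeholders for illustration, not values or APIs from the paper.

```python
from itertools import product

# Hypothetical per-layer cost model:
# layer -> (digital_latency, analog_latency, analog_accuracy_loss).
LAYERS = {
    "conv1": (4.0, 1.5, 0.2),
    "conv2": (6.0, 2.0, 0.5),
    "fc":    (3.0, 1.0, 0.8),
}
# Edges of the DNN graph (predecessor -> successor).
EDGES = [("conv1", "conv2"), ("conv2", "fc")]
# Robustness constraint: a layer whose accuracy loss on analog PIM
# exceeds this threshold must stay on a digital chiplet.
MAX_LAYER_LOSS = 0.6

def topological_order(layers, edges):
    """Kahn's algorithm over the layer graph."""
    indeg = {l: 0 for l in layers}
    for _, dst in edges:
        indeg[dst] += 1
    ready = [l for l, d in indeg.items() if d == 0]
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for src, dst in edges:
            if src == node:
                indeg[dst] -= 1
                if indeg[dst] == 0:
                    ready.append(dst)
    return order

def pareto_front(points):
    """Keep (latency, loss) points not dominated by any other point."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]

def explore(layers, edges):
    """Enumerate layer-wise analog/digital mappings, apply the
    robustness constraint, and return Pareto-optimal schemes."""
    order = topological_order(layers, edges)
    candidates = []
    for assignment in product(("digital", "analog"), repeat=len(order)):
        mapping = dict(zip(order, assignment))
        # Prune fragile layers from the analog PIM chiplets.
        if any(mapping[l] == "analog" and layers[l][2] > MAX_LAYER_LOSS
               for l in order):
            continue
        latency = sum(layers[l][1] if mapping[l] == "analog" else layers[l][0]
                      for l in order)
        loss = sum(layers[l][2] for l in order if mapping[l] == "analog")
        candidates.append((latency, loss, mapping))
    front = set(pareto_front([(lat, lo) for lat, lo, _ in candidates]))
    return [c for c in candidates if (c[0], c[1]) in front]
```

With these toy numbers, the "fc" layer is never mapped to analog PIM (its loss exceeds the threshold), and the remaining schemes trade latency against accumulated accuracy loss, e.g. all-digital versus both convolutions on analog PIM. A real exploration would replace the exhaustive enumeration with a scalable search and the additive cost model with measured per-chiplet metrics.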