Deep Neural Network Inference Partitioning in Embedded Hybrid Analog-Digital Systems

Fabian Kreß1, Julian Hoefer2, Qiushi Lin3, Patrick Schmidt2, Zhenhua Zhu3, Yu Zhu3, Tanja Harbaum4, Yu Wang3, Juergen Becker1
1Karlsruhe Institute of Technology - ITIV, 2Karlsruhe Institute of Technology, 3Tsinghua University, 4Karlsruhe Institute of Technology


Abstract

Deep Neural Networks (DNNs) deployed in resource-constrained embedded systems, such as automotive platforms, often face severe memory-wall problems and energy-consuming data movements. To address this challenge, existing work has proposed Processing-In-Memory (PIM) architectures based on emerging non-volatile memory, which perform analog-domain computations inside the memory and thus offer great potential to overcome the memory wall. However, PIM-based accelerators suffer from circuit noise, which degrades DNN accuracy. Therefore, heterogeneous multi-chiplet architectures that combine energy-efficient analog PIM with high-precision digital computing units have become a promising DNN acceleration solution, achieving both high energy efficiency and high accuracy. In this paper, we present an automated framework to explore layer-wise DNN inference partitioning in hybrid analog-digital multi-chiplet systems. After performing a topological ordering of the DNN graph, the framework analyzes the robustness of each layer to noise and uses this analysis to constrain mapping decisions across the available analog PIM chiplets and digital chiplets. By jointly considering several functional and performance metrics, the framework identifies Pareto-optimal partitioning schemes. Extensive experimental results on a variety of DNN models show a latency reduction of up to 52% with an accuracy loss of less than 1%.
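The partitioning flow summarized in the abstract can be illustrated with a minimal Python sketch: topologically order the layers, prune mappings that violate a per-layer robustness constraint, and keep only the Pareto-optimal (latency, accuracy-loss) schemes. All layer names, cost numbers, and the robustness threshold below are hypothetical placeholders for illustration, not values or APIs from the paper.

```python
from itertools import product

# Hypothetical per-layer cost model:
# layer -> (digital_latency, analog_latency, analog_accuracy_loss).
LAYERS = {
    "conv1": (4.0, 1.5, 0.2),
    "conv2": (6.0, 2.0, 0.5),
    "fc":    (3.0, 1.0, 0.8),
}
# Edges of the DNN graph (predecessor -> successor).
EDGES = [("conv1", "conv2"), ("conv2", "fc")]
# Robustness constraint: a layer whose accuracy loss on analog PIM
# exceeds this threshold must stay on a digital chiplet.
MAX_LAYER_LOSS = 0.6

def topological_order(layers, edges):
    """Kahn's algorithm over the layer graph."""
    indeg = {l: 0 for l in layers}
    for _, dst in edges:
        indeg[dst] += 1
    ready = [l for l, d in indeg.items() if d == 0]
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for src, dst in edges:
            if src == node:
                indeg[dst] -= 1
                if indeg[dst] == 0:
                    ready.append(dst)
    return order

def pareto_front(points):
    """Keep (latency, loss) points not dominated by any other point."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]

def explore(layers, edges):
    """Enumerate layer-wise analog/digital mappings, apply the
    robustness constraint, and return Pareto-optimal schemes."""
    order = topological_order(layers, edges)
    candidates = []
    for assignment in product(("digital", "analog"), repeat=len(order)):
        mapping = dict(zip(order, assignment))
        # Prune fragile layers from the analog PIM chiplets.
        if any(mapping[l] == "analog" and layers[l][2] > MAX_LAYER_LOSS
               for l in order):
            continue
        latency = sum(layers[l][1] if mapping[l] == "analog" else layers[l][0]
                      for l in order)
        loss = sum(layers[l][2] for l in order if mapping[l] == "analog")
        candidates.append((latency, loss, mapping))
    front = set(pareto_front([(lat, lo) for lat, lo, _ in candidates]))
    return [c for c in candidates if (c[0], c[1]) in front]
```

With these toy numbers, the "fc" layer is never mapped to analog PIM (its loss exceeds the threshold), and the remaining schemes trade latency against accumulated accuracy loss, e.g. all-digital versus both convolutions on analog PIM. A real exploration would replace the exhaustive enumeration with a scalable search and the additive cost model with measured per-chiplet metrics.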