Keep All in Memory with Maxwell: A Near-SRAM Computing Architecture for Edge AI Applications

Gregoire Axel Eggermann, Giovanni Ansaloni, David Atienza
École Polytechnique Fédérale de Lausanne (EPFL)


Abstract

Recent advances in machine learning have dramatically increased model sizes and computational requirements, increasingly straining the capabilities of computing systems. This tension is particularly acute in resource-constrained edge scenarios, where careful hardware acceleration of compute-intensive patterns and the optimization of data reuse to limit costly data transfers are key. To address these challenges, we present Maxwell, a novel compute-memory architecture that supports the execution of entire inference algorithms near memory. Leveraging the regular structure of memory arrays, Maxwell achieves a high degree of parallelization for both convolutional (CONV) and fully connected (FC) layers, while supporting fine-grained quantization. Additionally, the architecture minimizes data movement by performing all intermediate computations, such as scaling, quantization, activation functions, and pooling, near memory. We demonstrate that this approach yields speed-ups of up to 8.5x with respect to state-of-the-art near-memory architectures, which must transfer data at the boundaries of CONV and/or FC layers. Accelerations of up to 250x over software execution are observed on an edge platform that integrates Maxwell logic and a 32-bit RISC-V core, with Maxwell-specific components accounting for only 10.6% of the memory area.
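
To make the data-movement argument concrete, the sketch below is a minimal, back-of-the-envelope traffic model (not taken from the paper): it compares a layer-by-layer baseline, in which every CONV/FC result crosses the memory boundary for host-side post-processing and is written back for the next layer, against a fully fused near-memory schedule in which scaling, quantization, activation, and pooling stay inside the array, so only the network input and the final result are transferred. All layer sizes and names are hypothetical placeholders chosen purely for illustration.

```c
/* Illustrative sketch only: models activation traffic across the memory
 * boundary for a layer-by-layer baseline versus a fused near-memory
 * schedule. All sizes below are hypothetical placeholders. */
#include <stdio.h>
#include <stddef.h>

int main(void) {
    /* Hypothetical 8-bit activation sizes (bytes) produced by each layer. */
    size_t act_bytes[] = { 32768, 16384, 8192, 4096, 10 };
    size_t n_layers = sizeof(act_bytes) / sizeof(act_bytes[0]);
    size_t input_bytes = 3072;   /* network input, e.g. a small RGB image */

    /* Baseline: each CONV/FC output is read out for host-side scaling,
     * quantization, activation and pooling, then written back before the
     * next layer starts (the last output is only read out). */
    size_t baseline = input_bytes;
    for (size_t l = 0; l < n_layers; ++l) {
        baseline += act_bytes[l];              /* result read out to host  */
        if (l + 1 < n_layers)
            baseline += act_bytes[l];          /* written back for next layer */
    }

    /* Fused near-memory schedule: intermediate results never leave the
     * array; only the input and the final result cross the boundary. */
    size_t fused = input_bytes + act_bytes[n_layers - 1];

    printf("baseline traffic: %zu bytes\n", baseline);
    printf("fused traffic:    %zu bytes\n", fused);
    printf("reduction:        %.1fx\n", (double)baseline / (double)fused);
    return 0;
}
```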