Vision Transformer (ViT) is a neural network model specialized for image processing, introduced in 2021. The key feature of ViT is self-attention, which allows it to effectively capture long-range correlations within an image; its biggest drawback, however, is its hardware-hungry characteristics. This paper proposes an area-efficient hardware implementation of self-attention. The proposed approach introduces piecewise-linear approximation to both the multiplication and the Softmax function. Additionally, we present a training method suited to the approximated multiplication and Softmax, whose circuits are significantly simplified by the piecewise-linear approximation. Evaluation on CIFAR-10 demonstrated that applying our approximation to self-attention achieved image classification accuracy comparable to that of the model without the approximation. Moreover, with the proposed method, we experimentally confirmed that the area of the self-attention processing circuit can be reduced by 64.0% compared to an existing method that applies 8-bit integer quantization to all the weights in self-attention.
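To illustrate the kind of approximation the abstract describes, the following minimal sketch replaces the exponential inside Softmax with a piecewise-linear function. The segment count, breakpoints, and clamping range here are assumptions chosen for illustration only and do not reflect the paper's actual circuit design or training method.

```python
import numpy as np

# Illustrative piecewise-linear approximation of exp(x) on [-8, 0].
# The number of segments (8) and the breakpoints are assumptions, not the paper's design.
BREAKS = np.linspace(-8.0, 0.0, 9)                       # 9 breakpoints -> 8 segments
SLOPES = np.diff(np.exp(BREAKS)) / np.diff(BREAKS)       # per-segment slope
INTERCEPTS = np.exp(BREAKS[:-1]) - SLOPES * BREAKS[:-1]  # per-segment intercept

def pwl_exp(x):
    """Piecewise-linear stand-in for exp() on non-positive inputs;
    values below the leftmost breakpoint are clamped to that segment."""
    x = np.clip(x, BREAKS[0], 0.0)
    idx = np.clip(np.searchsorted(BREAKS, x, side="right") - 1, 0, len(SLOPES) - 1)
    return SLOPES[idx] * x + INTERCEPTS[idx]

def pwl_softmax(scores):
    """Softmax variant built on the piecewise-linear exp().
    Scores are shifted by their row maximum so every argument is non-positive."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = pwl_exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Example: attention scores for one query over four keys.
print(pwl_softmax(np.array([1.0, 2.0, 0.5, -1.0])))
```

In hardware, each segment reduces to one multiply-add with a small lookup of slope and intercept, which is the general motivation for piecewise-linear designs; the paper further simplifies the multiplication itself and trains the network with the approximation in place.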