Real-time Thermal Map Characterization and Analysis for Commercial GPUs with AI Workloads

Jincong Lu1, Sachin Sachdeva1, Yuxuan Lin2, Sheldon Tan3
1University of California, Riverside, 2The Overlake School, 3University of California at Riverside


Abstract

AI accelerators such as GPUs are critical for emerging generative AI applications. Because these chips consume large amounts of power, accurately characterizing and modeling their spatial thermal maps under practical AI workloads is important for designing better computing resource management and cooling solutions. Obtaining accurate thermal maps under such workloads is challenging, however, because the demanding cooling requirements prevent direct thermal measurement of the chips. This paper investigates the spatial thermal maps of the commercial NVIDIA GeForce RTX 4060 GPU chip for the first time. First, we obtained real-time full-chip thermal maps of the NVIDIA GPU under timed workloads using a bottom-cooling thermal IR imaging system. Second, we observed that the thermal hot spots are located in the I/O and peripheral areas of the chip, which indicates intensive data movement in the GPU. The computing areas of the GPU, on the other hand, show smoother thermal distributions with a few hot spots, which is quite different from commercial multi-core CPUs. Furthermore, based on real-time information from the NVIDIA System Management Interface (NVIDIA-SMI), we developed GPUThermalMap, a deep-neural-network-based full-chip thermal map estimation method and the first such method for GPUs, which can be instrumental for dynamic thermal, power, and reliability management. Numerical results show that GPUThermalMap achieves highly accurate thermal map predictions, with an RMSE of only 0.19 °C, or 0.6% full-scale error. It also outperforms the GAN-based method ThermalGAN by 2.09x in accuracy on average. In addition, the proposed model provides real-time estimation, taking 22 ms on the target chip.
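To make the reported accuracy metrics concrete, the following is a minimal sketch of how an RMSE and a full-scale percentage error between a predicted and a measured thermal map could be computed. It is not the authors' evaluation code. The function names `rmse` and `full_scale_error_percent` are hypothetical, and the sketch assumes that "full scale" means the temperature span (maximum minus minimum) of the measured map, which the abstract does not state.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between two equally sized 2-D temperature maps (deg C)."""
    if len(predicted) != len(measured):
        raise ValueError("maps must have the same number of rows")
    squared_sum = 0.0
    count = 0
    for p_row, m_row in zip(predicted, measured):
        if len(p_row) != len(m_row):
            raise ValueError("rows must have the same number of columns")
        for p, m in zip(p_row, m_row):
            squared_sum += (p - m) ** 2
            count += 1
    if count == 0:
        raise ValueError("maps must not be empty")
    return math.sqrt(squared_sum / count)


def full_scale_error_percent(predicted, measured):
    """RMSE as a percentage of the measured map's temperature span.

    Assumption: full scale = max(measured) - min(measured).
    """
    lowest = min(min(row) for row in measured)
    highest = max(max(row) for row in measured)
    if highest == lowest:
        raise ValueError("measured map has zero temperature span")
    return 100.0 * rmse(predicted, measured) / (highest - lowest)


# Toy 2x2 maps in deg C; illustrative values only, not data from the paper.
measured_map = [[40.0, 50.0], [60.0, 80.0]]
predicted_map = [[41.0, 49.0], [61.0, 79.0]]

error_c = rmse(predicted_map, measured_map)
error_pct = full_scale_error_percent(predicted_map, measured_map)
print(f"RMSE = {error_c:.2f} deg C, full-scale error = {error_pct:.2f}%")
```

Under this reading of "full scale", an RMSE of 0.19 °C at 0.6% would correspond to a temperature span of roughly 32 °C. That span is an inference from the two reported numbers, not a figure given in the abstract.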