As computationally expensive applications such as neural networks gain popularity, approximate computing has emerged as a way to significantly reduce the energy and latency costs of heavy computational workloads. In this paper, we propose a highly accurate approximate floating-point Multiply-and-Accumulate (MAC) unit for GPUs that significantly decreases the power and delay costs of a MAC operation. The design combines an intelligent input analysis scheme, which approximates the addition stage of a MAC operation, with an efficient approximate multiplier, which simplifies the multiplication stage. Its accuracy is tunable, offering the flexibility to trade accuracy for increased efficiency. We evaluate the proposed design over a range of multimedia and machine learning applications. It improves the energy-delay product (EDP) by up to 2.18x for machine learning applications and up to 3.21x for multimedia applications, while providing output quality comparable to that of an exact GPU.