Rendering images in AR/VR environments is costly because users expect high-quality visuals, a challenge amplified in real-time applications. Gaze-tracked foveated rendering (TFR) can substantially reduce this cost by dynamically adjusting rendering resolution based on the user's gaze, keeping full resolution only around the point of fixation. However, existing AI-based gaze-tracking solutions suffer from high tracking error and high execution cost. This work addresses these challenges by co-optimizing the AI algorithms with the underlying hardware to achieve efficient gaze tracking and superior rendering cost savings.
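To make the TFR idea concrete, the sketch below shows one common way a gaze-dependent resolution (shading-rate) map can be built: pixels near the tracked gaze point are rendered at full resolution, while the periphery is rendered progressively more coarsely. The function name, eccentricity thresholds, and pixels-per-degree approximation are illustrative assumptions, not the pipeline proposed in this work.

```python
import numpy as np

def shading_rate_map(width, height, gaze_px, fov_deg=100.0,
                     inner_deg=5.0, mid_deg=15.0):
    """Assign a per-pixel resolution scale from angular distance to the gaze point.

    gaze_px: (x, y) gaze position in pixels, as reported by the eye tracker.
    Returns an array of scale factors: 1.0 = full resolution,
    0.5 / 0.25 = progressively coarser shading toward the periphery.
    The eccentricity thresholds here are illustrative, not taken from the paper.
    """
    px_per_deg = width / fov_deg                      # rough pixels-per-degree estimate
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    dist_px = np.hypot(xs - gaze_px[0], ys - gaze_px[1])
    ecc_deg = dist_px / px_per_deg                    # approximate eccentricity in degrees

    scale = np.full((height, width), 0.25)            # far periphery: quarter resolution
    scale[ecc_deg < mid_deg] = 0.5                    # mid periphery: half resolution
    scale[ecc_deg < inner_deg] = 1.0                  # fovea: full resolution
    return scale

# Example: a 1440x1600 eye buffer with the gaze near the center of the view.
rates = shading_rate_map(1440, 1600, gaze_px=(720, 800))
print(rates.shape, np.unique(rates))
```

The savings come from the fact that only a small foveal region needs full-resolution shading; the accuracy and latency of the gaze estimate determine how small that region can safely be, which is why tracking error and execution cost matter.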