The recent drastic improvements in computer vision have been driven by deep convolutional neural networks. Promising results have expanded the deployment of these techniques to applications such as surveillance and autonomous driving. Training such models on millions of inputs is computationally expensive, so the training phase is usually performed on servers and the resulting models are transferred to other devices for inference. However, many applications designed for smart devices with limited computational resources require models to be fine-tuned on device to handle new inputs. In this work, we target reducing the computation required to fine-tune transferred models, which are typically used for application-specific tasks and deployed on embedded devices. Our proposed training method replaces augmentation of the input data in pixel space with augmentation in embedding space. Experimental results show that with this method, the computation spent on augmented input embeddings can be reduced by a factor of about 2 while model accuracy is only negligibly compromised.
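The core idea can be sketched as follows. This is a minimal illustration only: the "backbone" is a hypothetical fixed random linear map standing in for a frozen convolutional feature extractor, and the noise-jitter augmentation is an assumed placeholder, not the paper's actual augmentation operations.

```python
import numpy as np

rng = np.random.default_rng(0)

D_PIX, D_EMB = 32 * 32 * 3, 64
# Hypothetical frozen backbone: a fixed random linear projection
# from flattened pixels to a 64-d embedding.
W = rng.standard_normal((D_PIX, D_EMB)) / np.sqrt(D_PIX)

def feature_extractor(x):
    # Stand-in for the (expensive) frozen convolutional backbone.
    return x.reshape(len(x), -1) @ W

images = rng.random((8, 32, 32, 3))

# Pixel-space augmentation: every augmented copy must be pushed
# through the backbone again, one forward pass per copy.
def augment_pixels(x):
    return x + 0.01 * rng.standard_normal(x.shape)  # placeholder jitter

emb_pixel_aug = feature_extractor(augment_pixels(images))

# Embedding-space augmentation: compute the embeddings once, then
# perturb them directly, skipping the extra backbone passes.
emb = feature_extractor(images)

def augment_embeddings(z):
    return z + 0.01 * rng.standard_normal(z.shape)  # placeholder jitter

emb_space_aug = augment_embeddings(emb)

# Both paths yield embeddings of the same shape for the classifier
# head, but the second avoids recomputing the expensive backbone.
print(emb_pixel_aug.shape, emb_space_aug.shape)
```

In this sketch each extra augmented copy costs one more backbone pass under pixel-space augmentation, but only a cheap perturbation of an already-computed embedding under embedding-space augmentation, which is the source of the computational savings during fine-tuning.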