Intense work is currently underway to develop memristor crossbar arrays for high-density, nonvolatile memory applications. However, another capability of memristor crossbars, the natural dot-product operation for vectors and matrices, holds even greater potential for next-generation computing, including accelerators, neuromorphic computing, and heterogeneous computing. In this paper, we present a dot-product engine (DPE) based on memristor crossbars and optimized for dense matrix computation, which dominates most machine learning algorithms. We explore multiple methods to enhance the DPE’s dot-product computing accuracy. Moreover, instead of training the crossbars, we try to use existing software-trained weight matrices directly on DPEs, so that no heroic effort is needed to devise new learning algorithms for the new hardware. Our results show that computations using DPEs can achieve a speed-efficiency product 1,000 to 10,000 times better than that of a state-of-the-art ASIC, and that machine learning algorithms using DPEs can easily achieve software-level accuracy on testing. Both experimental demonstrations and data-calibrated circuit simulations are presented to demonstrate a realistic implementation of a memristor crossbar DPE.
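To make the "natural dot-product operation" concrete, the following is a minimal sketch of an idealized crossbar, not the circuit or mapping method used in this paper. It assumes perfectly linear devices, no wire resistance, and no noise, so that each column current follows from Ohm's law and Kirchhoff's current law as I_j = Σ_i V_i G_ij. The function names, the linear weight-to-conductance mapping, and the conductance limits `g_min` and `g_max` are hypothetical and chosen only for illustration.

```python
# Idealized memristor crossbar model (illustrative sketch only).
# Assumptions: linear devices, zero wire resistance, no noise or drift.


def weights_to_conductances(weights, g_min=1e-6, g_max=1e-4):
    """Linearly map a weight matrix onto [g_min, g_max] (siemens).

    g_min and g_max are hypothetical device limits, not values from the paper.
    Returns the conductance matrix and the parameters needed to undo the mapping.
    """
    flat = [w for row in weights for w in row]
    w_min, w_max = min(flat), max(flat)
    span = (w_max - w_min) or 1.0  # avoid division by zero for constant matrices
    scale = (g_max - g_min) / span
    conductances = [[g_min + (w - w_min) * scale for w in row] for row in weights]
    return conductances, scale, w_min, g_min


def crossbar_currents(voltages, conductances):
    """Column currents of an ideal crossbar: I_j = sum_i V_i * G_ij."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def dpe_dot_product(inputs, weights):
    """Compute inputs . weights by way of the idealized crossbar."""
    conductances, scale, w_min, g_min = weights_to_conductances(weights)
    currents = crossbar_currents(inputs, conductances)
    # Undo the affine mapping G = g_min + scale * (w - w_min):
    # I_j = g_min * S + scale * (sum_i x_i * w_ij - w_min * S), with S = sum_i x_i
    total_in = sum(inputs)
    return [(i_j - g_min * total_in) / scale + w_min * total_in for i_j in currents]


if __name__ == "__main__":
    x = [1.0, 2.0]
    w = [[1.0, -1.0], [0.5, 2.0]]
    print(dpe_dot_product(x, w))  # close to the software result x . w
```

In this idealized setting the recovered result matches the software dot product up to floating-point rounding; a real crossbar deviates from it because of device nonlinearity, wire resistance, and noise, which is why accuracy-enhancing methods are needed.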