In recent years, the opportunities for parallel computing have been expanding rapidly across a variety of applications, including neural networks and machine learning. It is, however, far from straightforward to develop an efficient algorithm for each parallel computing environment, since communication always introduces computational overhead. In this paper, we propose a design method for optimal parallel computation under user-specified communication constraints. The basic strategy is to automatically generate optimal schedules for small instances of the target problem and then semi-automatically generalize them to much larger problems. We conducted several experiments targeting matrix-vector multiplication and convolutional neural networks; the results demonstrate the correctness and usefulness of the proposed method, as well as its scalability.