Performance Optimization of a Class of Loops

P. Sadayappan
Ohio State University, Columbus


The problem of optimizing a class of parallel loops for parallel execution is considered. The problem is motivated from examples in computational physics where multi-dimensional summations of products of arrays are performed. The overall optimization problem is divided into three sub-problems: operation count optimization, communication optimization, and data locality optimization. The operation count optimization problem is found to be NP-complete. A polynomial time dynamic programming algorithm is developed for the communication optimization problem. The issues arising with the data locality optimization problem are discussed: loop permutation, tiling, and loop fusion. Partial solutions to the data optimization problem are presented.


Further info and related paper(s):