Performance Optimization of a Class of Loops
P. Sadayappan
Ohio State
University, Columbus
The problem of optimizing a class of parallel loops for parallel execution is
considered. The problem is motivated from examples in computational physics
where multi-dimensional summations of products of arrays are performed. The
overall optimization problem is divided into three sub-problems: operation
count optimization, communication optimization, and data locality
optimization. The operation count optimization problem is found to be
NP-complete. A polynomial time dynamic programming algorithm is developed for
the communication optimization problem. The issues arising with the data
locality optimization problem are discussed: loop permutation, tiling, and
loop fusion. Partial solutions to the data optimization problem are presented.
Further info and related paper(s):