Determining the idle time of a K-dimensional tiling

Karin Högstedt, Larry Carter, Jeanne Ferrante
University of California, San Diego


For a compiler to yield high-performance code we need to exploit the architectural features of the target machine. Tiling is used to improve both the locality and the utilization of the available parallelism, but with the more complicated memory hierarchies of today's computers tiling also needs to be applied at (possibly) all levels of the memory hierarchy. The goal of tiling is to minimize the execution time. Not to maximize the locality or available parallelism, which has been the optimization goal of many researchers in the past. Tiling should therefore be applied using a multi-level cost-function taking all memory levels, locality and parallelism into account.

We model the execution time by a recursive formula where the execution time of the iteration space at a certain level in the memory hierarchy depends on, among other things, the execution time of a tile, the idle time, loop overhead etc. We intend to model all these different parts with closed-form formulae, which combined give us the total execution time of the iteration space. Our work has been concentrated on deriving a formula for the idle time due to parallelism, i.e. the time a memory module spends waiting either for data computed on a different processor or at a synchronization point.

We define a sub-class of convex iteration spaces, so called rectilinear iteration spaces, for which we can derive a closed-form formula for the idle time due to parallelism, even though in the general case this problem only can be solved using linear programming. The main parameter of this formula is the rise, which intuitively is a measure of the difference between the shape of the iteration space and the shape of the tiles.


Further info and related paper(s):