Code generation for the juggling technique

Alain Darte
LIP, Ecole Normale Supérieur, Lyon


This talk follows Rob Schreiber's explanations on ``how to juggle''. Starting from a set of perfectly nested loops, the juggling technique builds a closed-form formula for the mapping and the scheduling of computations that achieves a Locally Sequential Globally Parallel partitioning of the loops.

In this talk, we show how we can generate the code for such a partitioning. Our first approach was to use classical non unimodular loop transformations, but this resulted in a very inefficient solution where the cost of the hardware that carries out the original code was completely overwhelmed by housekeeping computations. The reason of this overhead was due to complex operators, such as div and mod, and expensive loop bounds evaluations. We present another solution based on a decision-tree of depth (n-1) where n is the number of nested loops, for which only a few adds and conditionals are needed. This solution exploits the mathematical properties of juggling. The reduction of cost, compared with the first approach, is typically an order of magnitude.


Further info and related paper(s):