In this talk, we show how we can generate the code for such a partitioning. Our first approach was to use classical non unimodular loop transformations, but this resulted in a very inefficient solution where the cost of the hardware that carries out the original code was completely overwhelmed by housekeeping computations. The reason of this overhead was due to complex operators, such as div and mod, and expensive loop bounds evaluations. We present another solution based on a decision-tree of depth (n-1) where n is the number of nested loops, for which only a few adds and conditionals are needed. This solution exploits the mathematical properties of juggling. The reduction of cost, compared with the first approach, is typically an order of magnitude.