On-line automatic data placement in a parallel matrix library

Paul Kelly
Imperial College, London


Suppose we want to hide all the complexity of parallel programming inside a library of parallel operations. For example, we want to call it from languages for which building a parallelising compiler is unattractive. But we want to minimise data redistribution between library calls - but we can't do dataflow analysis of the calling program (in Visual Basic etc).

The problem: you can't tell how each routine's results are going to be used, so you miss the opportunity to select operand distributions and operator implementations which would avoid communication later.

To solve this we use delayed evaluation to capture the control-flow of user programs at runtime. Now we know the context in which values are used, we can propagate data placement constraints backwards. To make this work, we have to optimise really fast, so we have formulated the problem carefully using affine functions to represent data placement and constraints.

To further reduce the overheads, we detect when an equivalent DAG re-occurs, and re-use the result of optimising it. We also optimise incrementally, and perform further optimisation iterations if a potentially sub-optimal execution plan is re-used repeatedly.

Preliminary performance result were presented and there followed a discussion of extensions and applications of the work.


Further info and related paper(s):