Data Parallel Extensions for Maximizing Locality in Numerical
Irregular Problems
Oscar Plata, Guillermo
Trabado and Emilio Zapata
University of Malaga
The efficient programming of intensive numerical applications with irregular
structure on parallel computers is a very complex task. In general, key
properties of the problem that these codes solve are needed in order to obtain
a good parallel code. However, these properties cannot be inferred from the
code itself (at least, not easily). Three approaches to parallelizing this
class of codes can be identified: manual parallelization, user-annotated
parallelization (specifically, data parallelism), and automatic
parallelization. Manual parallelization usually
involves drastic rewriting of the original sequential code, requiring a high
development effort. However, this approach can take into account high-level
problem properties, resulting in very efficient parallel codes. The
data-parallel approach annotates the sequential code with directives that
make the parallelism explicit. Basically, the directives specify data
distributions and alignments. The compiler is in charge of the rest, the most
tedious part, of the parallelization process. The efficiency of this approach depends on the
ability of the data-parallel language to express problem properties, which is
a difficult and open question for irregular applications. Finally, in the
third approach, the compiler is completely in charge of the parallelization
process. Currently there is active research in this area, as compiler
techniques to recognize and take advantage of problem properties have to be
discovered. No parallelizing compiler is, at the moment, able to efficiently
parallelize complete irregular codes. This work focuses on the data-parallel
approach, and how we can express problem properties using a limited number of
user annotations. A simple molecular dynamics simulation code is taken as a
running example. Nowadays, most of the production applications in this area
have been manually parallelized. We present a parallelization strategy that
offers: high efficiency, similar to that of manual parallelization;
preservation of the original program structure in the resulting parallel
code; decomposition of global data structures into smaller local structures
with the same organization; and initial data decomposition and further
communications handled by calls to an existing runtime support. Finally, we analyze the
introduction of HPF extensions to provide the compiler with enough
information to infer the role of each data structure in particle codes.
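As an illustration of the directive style discussed above, a minimal HPF fragment might distribute a particle coordinate array block-wise and align the corresponding force array with it (the array names here are hypothetical, not taken from the paper's code):

```fortran
      REAL x(N), f(N)              ! particle coordinates and forces
!HPF$ DISTRIBUTE x(BLOCK)          ! spread x in contiguous blocks over processors
!HPF$ ALIGN f(i) WITH x(i)         ! keep each f(i) on the same processor as x(i)
```

The compiler then generates the data decomposition and the communication implied by these annotations; the open question for irregular codes is whether such static distributions can capture the locality that a manual parallelization exploits.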
Further info and related paper(s):