Data Parallel Extensions for Maximizing Locality in Numerical Irregular Problems

Oscar Plata, Guillermo Trabado and Emilio Zapata
University of Málaga


The efficient programming of intensive numerical applications with irregular structure on parallel computers is a very complex task. In general, key properties of the problem that these codes solve must be exploited in order to obtain a good parallel code, yet those properties cannot easily (if at all) be inferred from the code itself.

Three approaches to parallelizing this class of codes can be distinguished: manual parallelization, user-annotated (specifically, data-parallel) programming, and automatic parallelization. Manual parallelization usually involves a drastic rewriting of the original sequential code, and hence a high development effort; on the other hand, it can take high-level problem properties into account, resulting in very efficient parallel codes. The data-parallel approach annotates the sequential code with directives that make the parallelism explicit; basically, the directives specify data distributions and alignments (an HPF sketch is given below). The compiler is then in charge of the rest, and most tedious part, of the parallelization process. The efficiency of this approach depends on the ability of the data-parallel language to express problem properties, which is a difficult and open question for irregular applications. In the third approach, the compiler is completely in charge of the parallelization process. This is an area of active research, as compiler techniques to recognize and exploit problem properties are still to be developed; no parallelizing compiler is, at the moment, able to efficiently parallelize complete irregular codes.

This work focuses on the data-parallel approach, and on how problem properties can be expressed with a limited number of user annotations. A simple molecular dynamics (MD) simulation code is taken as a running example (the sketch below illustrates its typical irregular access pattern); nowadays, most production applications in this area have been parallelized by hand. We present a parallelization strategy that offers: efficiency close to that of manual parallelization; preservation of the original program structure in the resulting parallel code; decomposition of global data structures into smaller local structures with the same organization; and handling of the initial data decomposition and subsequent communications through calls to an existing runtime support. Finally, we analyze the introduction of HPF extensions that provide the compiler with enough information to infer the role of each data structure in particle codes.
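To make the irregularity concrete: the force computation in an MD code typically sweeps a pair (interaction) list through indirection arrays, so which elements of the coordinate and force arrays are touched is only known at run time. The following Fortran fragment is a minimal sketch of such a loop, not code from the paper; the names PAIR, X, F and the pairwise FORCE function are illustrative assumptions.

      SUBROUTINE SWEEP(NPAIRS, PAIR, X, F)
      INTEGER NPAIRS, PAIR(2, NPAIRS)
      REAL X(*), F(*)
      REAL FORCE, FIJ
      INTEGER K, I, J
C     Irregular reduction: I and J come from the pair list, so the
C     elements of X and F accessed in each iteration are unknown at
C     compile time and defeat any static (BLOCK/CYCLIC) distribution.
      DO 10 K = 1, NPAIRS
         I = PAIR(1, K)
         J = PAIR(2, K)
         FIJ = FORCE(X(I), X(J))
         F(I) = F(I) + FIJ
         F(J) = F(J) - FIJ
   10 CONTINUE
      END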

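For comparison, the sketch below shows how standard HPF expresses a regular decomposition declaratively; the processor arrangement and array sizes are illustrative assumptions, not taken from the paper.

      REAL X(100000), F(100000)
!HPF$ PROCESSORS P(16)
C     Block-distribute the particle coordinates over the processor
C     arrangement, and align the force array with them so that
C     corresponding elements of X and F reside on the same processor.
!HPF$ DISTRIBUTE X(BLOCK) ONTO P
!HPF$ ALIGN F(I) WITH X(I)

For irregular structures such as the pair list above, static annotations of this kind are precisely what falls short, which is what motivates the extensions analyzed in this work.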
