DEA - Module ALPA : Algorithmique Parallèle
2000-2001
version francaise
de ce document
Instructors
Sanjay Rajopadhye (CNRS
Researcher), and Tanguy Risset
(INRIA Researcher).
Schedule
Thursdays 10h00 - 12h00, Salle Jersey
Course Outline and Plan
ALPA is a postgraduate course offered by IFSIC to DEA students (in their fifth
year of post-high school education). It meets for lectures for two hours per
week for 10 weeks. The final grade is based a 3 hour final exam plus a report
on an "individual study" component. The lectures are supplemented by a number
of exercises (not graded). In the 2000-2001 year, ALPA is a required class
for all DEA students (about 35). The independent study is on "tiling", a loop
transformation used for locality improvement in high performance codes.
Schedule
As the course proceeds, this will evolve to contain copies of the lecture
transparencies, exercises (and possibly solutions), instructions, etc.
- Weeks 1 & 2 (Rajopadhye): Basic notions of parallelism,
speedup and efficiency, Amdahl's law. The PRAM model, pointer jumping, divide
and parallelize, algorithms for list ranking, parallel prefix, Euler tours,
etc. Brent's theorem, notions of depth and size and efficient simulation.
Relative power of concurrent reads and writes. This material is based on Ch
30 (sections 30.1 to 30.3) of the book Algorithms by Cormen, Liserson
et Rivest, and some supplementary notes.
- Week 3 (Rajopadhye): The BSP (Bulk Synchronous
Parallelism) model, proposed in A bridging model for parallel computation
by Valiant in "Communications of the ACM" en August 1990 (v33 n8 p103).
Our main reference will be Questions
and answers about BSP by Skillicorn, Hill and McColl (Scientific
Programming, 6(3): 249-274, Fall 1997). There are a number of sites for
additional information, such as the one
maintained by McColl and another which is more tutorial oriented tutorial.
There is also an organization BSP
Worldwide of researchers and users of BSP.
- Weeks 5, 6,& 7 (Rajopadhye): Loop parallelization and
the polyhedral model. Dependence analysis, single assignment form, systems of
recurrence equations, scheduling and parallelism detection, space time
transformations, and code generation.
- 25 November to 16 December (Risset): Parallel
algorithms under the communication network model: linear arrays, grids and
trees. Course notes for linear arrays (in French) are available here. Here are a few exercises
for this material, and the answers are (will soon be) available here.
Similar courses are taught at many other places, such as University of North
Carolina, IIT
Bombay.
Independent Study
ALPA involves a detailed independent study of a current topic or research.
You will start with a few given references, but it is up to your initiative to
follow other citations of what you read, to make good notes of the subject and
develop a clear understanding of the subject. A part of the final exam will
be based on this study. You are very strongly encouraged to work together for
your reading, and to meet and discuss your readings with colleagues and with
the instructors outside class time. It is also crucial to develop a good plan
for the study and not leave it all to the 11-th hour.
Tiling is a regular partitioning of a uniform index space representing either
computations (e.g., the iteration space of a loop program), or data (e.g.,
arrays distributed over the processors of a parallel machine). Originally
presented in a seminal paper by Irigoin and Triolet [4]
for increasing the granularity of parallel computations, tiling can be used to
achieve many different performance goals: exploiting data locality in
hierarchical memory machines, communication optimisation by message
aggregation, overlap of communication and computation, latency avoidance etc.
Two other early papers describing the transformation are due to Schreiber and
Dongarra [7] and Ramanujam and Sadayappan [6]. Xue [8] gives a nice synthesis of these
earlier results and some extensions. There has also been work on the choice
of the parameters of a tiling transformation (notably the shape and size of
the tiles) so as to optimize various performance criteria which are more or
less valid depending on the context in which the tiling is performed.
Your independent study deals with the tile size optimization problem.
The most relevant results are due to Andonov and Rajopadhye [1], Hodzic and Shang [2], Ohta et al [5] (who are widely cited in spite of an error in their main
theorem: one of your tasks is to find and understand the mistake) and Hogstedt
et al [3]. In 1998, a Workshop Tiling for Optimal Resource
Utilization was held in Dagstuhl Germany, and the followup page leads to
additional references.
General References
1. F. Thomson Leighton, Parallel Algorithms and Architectures: arrays, trees,
Hypercubes. Excellent reference text (700 pages).
2. M. Cosnard and Denis Trystram, Algorithmes et architectures
parallèlesm InterEditions, 1993. General reference text covering most
aspects of parallelism.
3. P. Quinton and Y. Robert, Algorithmes et architectures systoliques,
Masson, 1989. Staring to become a bit old, but the iitial part on systolic
algorithmes is very interesting, as are the synthesis methods of Chapters 11
and 12.
4. V. Kumar, A. Grama, A. Gupta and G. Karypis, Introduction to Parallel
Computing, Benjamin Cummings, 1994. General reference text.
5. P. Feautrier, Dataflow analysis of array and scalar references, dans
"International Journal of Parallel Programming", February, 1991 (volume 20,
number 1, pages 23-53).
6 Cormen, Leiserson, Rivest, Introduction à l'algorithmique, Dunod 1994
(transa=lation of the English Algorithms (chapter 30). Another
excellent reference text for all aspects of analysis of algorithms (not just
parallel).
Tiling References
[1] Andonov and Rajopadhye, Optimal Orthogonal Tiling
of 2-D Iterations in Journal of Parallel and Distributed Computing,
September 1997 (pages 159-165)
[2] Hodzic and Shang, On Supernode Transformation with
Minimized Total Running Time in IEEE Transactions on Parallel and
Distributed Systems, May 1998 (pages 417-428)
[3] Hogstedt, Carter and Ferrante, Determining the
Idle Time of a Tiling in 1997 ACM Symposium on the Principles of
Programming Languages, January 1997
[4] Irigoin in Triolet, Supernode Partitioning in
15th ACM Symposium on Principles of Programming Languages, January 1988,
(pages 319-328)
[5] Ohta, Saito, Kainaga and Ono, Optimal Tile Size
Adjustment in Compiling General DOACROSS Loop Nests in ACM
International Conference on Supercomputing July 1995 (pages 270-279)
[6] Ramanujam and Sadayappan, Tiling Multidimensional
Iteration Spaces for Multicomputers in Journal of Parallel and
Distributed Computing, vol 16, No. 2, 1992 (pages 108-120)
[7] Schreiber and Dongarra, Automatic blocking of
nested loops Technical Report No. 90.38 RIACS, NASA Ames Research Center,
August 1990
[8] Xue, On Tiling as a Loop Transformation in
Parallel Processing Letters, vol 7, No. 4, October (?) 1997 (pages
490-424)