Multiple cycle ahead pipelined branch predictors vs. branch predictor
hierarchies
Location: Irisa, Rennes
Team(s): Caps
Supervisor: André Seznec (direct phone:
02 99 84 73 36, email: seznec@irisa.fr)
The path to performance on superscalar processors combines
instruction-level parallelism with high clock frequency. This results
in very deep pipelines (20 cycles on the Intel Pentium 4!). The
penalty paid on each branch misprediction (either conditional or
indirect) is huge and will continue to grow in next-generation
processors as they feature more instruction-level parallelism.
Even a small increase in branch prediction accuracy may therefore
yield an effective performance benefit on such processors. The silicon
area needed to implement very accurate branch predictors featuring
256 Kbytes of storage or even more, together with complex logic, will
represent only a very small part of the die in future processor
generations. Unfortunately, it is unlikely that such a predictor
would be able to deliver a result in less than 3 or 4 cycles.
The processor front end has to predict the instruction flow at
a tremendous rate (more than one basic block per cycle) while
prediction accuracy must not be sacrificed. Usual branch predictors
use information (branch histories, fetch addresses, ...) available
in a given cycle to generate the next fetch addresses.
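As a concrete illustration of such a predictor, the following is a minimal sketch of a gshare-style two-level scheme: the global branch history is XORed with low-order fetch-address bits to index a table of 2-bit saturating counters. The class name, table size, and counter initialization are illustrative choices, not part of any specific design discussed here.

```python
class GsharePredictor:
    """Sketch of a gshare-style two-level branch predictor."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        # 2-bit saturating counters, initialized to weakly not-taken (1).
        self.counters = [1] * (1 << index_bits)
        self.history = 0  # global branch history register

    def _index(self, fetch_addr):
        # Fold fetch address and global history into one table index.
        return (fetch_addr ^ self.history) & self.mask

    def predict(self, fetch_addr):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(fetch_addr)] >= 2

    def update(self, fetch_addr, taken):
        # Train the counter for this (address, history) pair ...
        i = self._index(fetch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # ... then shift the branch outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

After a few occurrences of a strongly biased branch, the counter reached through the (address, history) index saturates and the predictor follows the bias.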
The solution used so far to deal with this issue is to rely
on a hierarchy of predictions: a first prediction performed in a
single cycle, followed by a second, more accurate one performed
in two or more cycles (e.g., implicit fall-through + two-level branch
predictor on the Intel Pentium Pro, line predictor + hybrid branch
predictor on the Alpha 21264). The disadvantage of this method is
that many fetch cycles must be cancelled when the two predictors
disagree, resulting in pipeline bubbles. Such a strategy wastes a
significant part of the instruction fetch bandwidth and loses
many cycles on hard-to-predict branches. Therefore, part of the
front end has to be oversized to sustain the required throughput.
It should also be noted that when making the branch predictor more
complex increases its access latency, the extra accuracy
may be counterbalanced by extra bubbles in the pipeline front end.
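The cost of such a hierarchy can be sketched with a toy front-end model (not a model of any specific CPU): whenever the slow, accurate predictor disagrees with the fast one, the fetches issued in the meantime are squashed, costing as many bubble cycles as the slow predictor's latency. The function name and the latency value are illustrative assumptions.

```python
def frontend_cycles(fast_preds, slow_preds, latency=2):
    """Toy model of a two-level prediction hierarchy.

    fast_preds: per-branch outputs of the 1-cycle predictor.
    slow_preds: per-branch outputs of the slower, accurate predictor.
    Returns the total fetch cycles, adding `latency` bubble cycles
    each time the slow predictor overrides the fast one.
    """
    cycles = 0
    for fast, slow in zip(fast_preds, slow_preds):
        cycles += 1          # the fetch cycle itself
        if fast != slow:
            cycles += latency  # squash in-flight fetches and redirect
    return cycles
```

Even a modest disagreement rate multiplied by a 2-cycle or 3-cycle override latency translates into a visible loss of fetch bandwidth, which is the effect described above.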
We propose to investigate an alternative solution that would allow
huge and (hopefully) very accurate branch predictors to be used
without wasting bubble cycles in the pipeline front end. In [1],
pipelined multiple-block-ahead branch prediction, we proposed to
predict fetch addresses using information available 2 or more cycles
ahead of the fetch cycle. Such an approach makes it possible to use
very large predictor tables and some extra complex logic in the
predictor, while still delivering the prediction in time for fetch.
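The timing idea can be sketched with a toy model, assuming a prediction for block n is computed from the address of block n-2 (ahead = 2): each fetch launches a table lookup whose result is only consumed `ahead` blocks later, so a lookup taking `ahead` cycles still completes in time. This is only an illustration of the pipelining principle, not the exact mechanism of [1].

```python
from collections import deque

class AheadPipeline:
    """Toy model of `ahead`-block-ahead prediction: the table entry for
    a block gives the address expected `ahead` blocks later, and each
    lookup result becomes available `ahead` fetch steps after launch."""

    def __init__(self, ahead=2):
        self.ahead = ahead
        self.table = {}
        # Lookups launched but whose results are not yet available.
        self.in_flight = deque([None] * ahead)

    def train(self, addr, target):
        # Learn that `target` is fetched `ahead` blocks after `addr`.
        self.table[addr] = target

    def advance(self, block_addr):
        """Fetch `block_addr`: launch its lookup, and return the result
        of the lookup launched `ahead` blocks ago, which is exactly the
        prediction arriving in time for the current fetch."""
        self.in_flight.append(self.table.get(block_addr))
        return self.in_flight.popleft()
```

With ahead = 2, the lookup launched while fetching some block delivers its result two blocks later: the table read has two full cycles to complete, yet no bubble is inserted in the fetch stream.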
Bibliography
[1] A. Seznec, S. Jourdan, P. Sainrat, P. Michaud, "Multiple-Block
Ahead Branch Predictors", Proceedings of the 7th Conference
on Architectural Support for Programming Languages and Operating
Systems, Boston, October 1996.