Micro-Architecture

Software for simulations

The CACHESKEW simulator
The 2bcgskew simulator (for SimpleScalar)

Branch predictors packaged for the 1st Championship Branch Prediction

2bcgskew 3.80 misp/KI
MAC-RHSP 3.12 misp/KI
PPM-like tagged predictor 3.10 misp/KI
OGEHL 2.82 misp/KI
TAGE 2.55 misp/KI

Branch predictors packaged for the 2nd Championship Branch Prediction

realistic predictor: L-TAGE 3.314 misp/KI
idealistic predictor: GTL 2.717 misp/KI

Research in architecture covers

Cache architecture
Processor organization
Sequencing and branch prediction
Simultaneous Multithreading and multicore processors

Cache Architecture

Skewed associative caches

The skewed-associative cache is a new organization for multi-bank caches. Skewed-associative caches have been shown to have two major advantages over conventional set-associative caches. First, at equal associativity, a skewed-associative cache typically has the same hardware complexity as a set-associative cache but exhibits a lower miss ratio. This is particularly significant for BTBs and L2 caches, for which a significant fraction of conflict misses still occurs even with 2-way set-associative caches. Second, the behavior of skewed-associative caches is quite insensitive to the precise data placement in memory. More recently, we showed that the skewed-associative structure offers a unique opportunity to build TLBs supporting multiple page sizes.
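The difference from a conventional set-associative cache can be illustrated with a small model. The sketch below is a toy under stated assumptions, not the published design: the cache size, the two XOR-based index functions in `skew_index`, and the alternating replacement policy are all hypothetical choices made for illustration. It compares a 2-way set-associative cache with a 2-bank skewed cache on three blocks that map to the same conventional set.

```python
NUM_SETS = 16      # lines per bank (hypothetical size)
INDEX_BITS = 4     # log2(NUM_SETS)

def skew_index(block_addr, bank):
    """Illustrative per-bank index functions (assumed, not the published ones)."""
    low = block_addr % NUM_SETS
    high = (block_addr // NUM_SETS) % NUM_SETS
    if bank == 0:
        return low ^ high
    # Bank 1 rotates the upper bits left by one before the XOR,
    # so two blocks that collide in bank 0 need not collide here.
    rotated = ((high << 1) | (high >> (INDEX_BITS - 1))) & (NUM_SETS - 1)
    return low ^ rotated

class SetAssociative2Way:
    """Conventional 2-way cache: both ways share one index, LRU replacement."""
    def __init__(self):
        self.sets = [[] for _ in range(NUM_SETS)]   # each list is LRU-first

    def access(self, block):
        ways = self.sets[block % NUM_SETS]
        if block in ways:
            ways.remove(block)
            ways.append(block)
            return True
        if len(ways) == 2:
            ways.pop(0)                              # evict least recently used
        ways.append(block)
        return False

class SkewedAssociative2Way:
    """2-bank skewed cache: each bank is indexed by its own function."""
    def __init__(self):
        self.banks = [[None] * NUM_SETS for _ in range(2)]
        self.victim_bank = 0                         # simple alternating victim choice

    def access(self, block):
        idx = [skew_index(block, b) for b in range(2)]
        for b in range(2):
            if self.banks[b][idx[b]] == block:
                return True
        for b in range(2):                           # fill an empty line if one exists
            if self.banks[b][idx[b]] is None:
                self.banks[b][idx[b]] = block
                return False
        b = self.victim_bank
        self.victim_bank ^= 1
        self.banks[b][idx[b]] = block
        return False

def count_misses(cache, trace):
    return sum(0 if cache.access(block) else 1 for block in trace)

# Three blocks that share conventional set 5, accessed cyclically.
trace = [0x05, 0x15, 0x25] * 4
conventional_misses = count_misses(SetAssociative2Way(), trace)
skewed_misses = count_misses(SkewedAssociative2Way(), trace)
print(conventional_misses, skewed_misses)
```

On this hand-picked trace the conventional model misses on all 12 accesses, while the skewed model misses only on the first 3, because the three blocks land on different lines in each bank. A single favorable trace does not measure miss ratios in general; it only shows the mechanism.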

A. Seznec, "A case for two-way skewed-associative cache", Proceedings of the 20th International Symposium on Computer Architecture (IEEE-ACM), San Diego, May 1993
A. Seznec, F. Bodin, "Skewed-associative caches", Proceedings of PARLE'93, Munich, June 1993
A. Seznec, "About set and skewed associativity on second level caches", Proceedings of the International Conference on Computer Design, Boston, October 1993
F. Bodin, A. Seznec, "Skewed-associativity improves performance and enhances predictability", IEEE Transactions on Computers, May 1997 (a short version appears in Proceedings of the 22nd International Symposium on Computer Architecture (IEEE-ACM), Santa Margherita, June 1995)
N. Drach, A. Gefflaut, P. Joubert, A. Seznec, "About cache associativity in low-cost shared memory multi-microprocessors", Parallel Processing Letters, Sept. 1995 (also IRISA Report No 760)
A. Seznec, "A New Case for Skewed-Associativity", 22 pages, IRISA Report No 1114, July 1997
P. Michaud, "A Statistical Model of Skewed Associativity", International Symposium on Performance Analysis of Systems and Software, Austin, March 6-8, 2003 (slides)
A. Seznec, "Concurrent Support of Multiple Page Sizes on a Skewed Associative TLB", to appear in IEEE Transactions on Computers, 2003

Minimizing tag implementation costs

Most newly announced microprocessors manipulate 64-bit virtual addresses and the width of physical addresses is also growing. As a result, the relative size of the address tags in caches is increasing. We have proposed hardware solutions to limit the implementation cost of these address tags.

Related publications:

A. Seznec, "Decoupled sectored caches: reconciliating low tag volume and low miss ratio", Proceedings of the 21st International Symposium on Computer Architecture (IEEE-ACM), Chicago, April 1994
A. Seznec, "Decoupled sectored caches", IEEE Transactions on Computers, Feb. 1997
A. Seznec, "Don't use the page number, but a pointer to it", Proceedings of the 23rd International Symposium on Computer Architecture (IEEE-ACM), May 1996

Other works on cache architecture

N. Drach, A. Seznec, "Semi-Unified Caches", Proceedings of the International Conference on Parallel Processing, St Charles, Illinois, August 1993 (also RR INRIA 1841)
A. Seznec, "DASC cache", Proceedings of the First International Symposium on High Performance Computer Architecture (IEEE), Raleigh (USA), January 1995 (also RR INRIA 2082)
A. Seznec, F. Lloansi, "About effective miss penalty on out-of-order microprocessor", IRISA Report No 970, November 1995 (a slightly modified version appears as: A. Seznec, F. Lloansi, "Performance impact of the L2 contention on out-of-order execution superscalar processors", IEEE TCCA Newsletter, March 1997)

Processor Organization

The CAPS team is working on pipeline and superscalar organization in processors. We address the complexity of instruction scheduling through prescheduling. Our work on WSRS architectures (register Write Specialization register Read Specialization) addresses the register file, bypass network and instruction scheduling complexity.

Related publications:

P. Michaud, A. Seznec, "Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors", 7th International Symposium on High Performance Computer Architecture, Monterrey, Mexico, January 19-24, 2001 (slides)
A. Seznec, E. Toullec, O. Rochecouste, "Register Write Specialization Register Read Specialization: A Path to Complexity-Effective Wide-Issue Superscalar Processors", Proceedings of the 35th International Symposium on Microarchitecture (ACM-IEEE), Istanbul, November 2002 (slides, PowerPoint)

Sequencing and branch prediction

Multiple-block ahead branch prediction

Multiple-block ahead branch prediction is a new branch prediction mechanism that provides an efficient way to predict the addresses of two instruction blocks in a single cycle. Such an approach would be very useful for wide-dispatch superscalar processors. Recently, we have explored in detail the effective design of a complete instruction fetch mechanism.

Related publications:

A. Seznec, S. Jourdan, P. Sainrat, P. Michaud, "Multiple-Block Ahead Branch Predictors", Proceedings of the 7th Conference on Architectural Support for Programming Languages and Operating Systems, Boston, October 1996
P. Michaud, A. Seznec, S. Jourdan, P. Sainrat, "Alternative Schemes for High-Bandwidth Instruction Fetching", IRISA Report No 1180, March 1998
P. Michaud, A. Seznec, S. Jourdan, "Exploring Instruction-Fetch Bandwidth Requirement in Wide-Issue Superscalar Processors", IRISA Report No 1227, Feb. 1999
A. Seznec, A. Fraboulet, "Effective ahead pipelining of instruction block address generation", Proceedings of the 30th International Symposium on Computer Architecture (IEEE-ACM), San Diego, June 2003

Skewed branch predictors

Between 1996 and 2000, we investigated the use of majority vote as a means of limiting the impact of aliasing on global history branch predictors. The 2bcgskew predictor was the basis of the branch predictor of the cancelled Alpha EV8. 2bcgskew is often considered in the literature as the most efficient conventional (2-bit counter based) predictor, as opposed to neural predictors. Its accuracy is also often underestimated in comparative studies; the parameters described in "An optimized 2bcgskew branch predictor" should be used in such studies.
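The majority-vote idea can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the 2bcgskew predictor itself: it models only three banks of 2-bit counters with a majority vote, leaving out the bimodal and meta components. The table size, the history length, the per-bank hash in `bank_index`, and the update-all-banks policy are hypothetical simplifications.

```python
TABLE_BITS = 10
TABLE_SIZE = 1 << TABLE_BITS
HISTORY_BITS = 12

def bank_index(pc, history, bank):
    """Hypothetical per-bank hash of (pc, global history); each bank mixes differently."""
    key = (pc << HISTORY_BITS) | history
    mixed = key ^ (key >> (TABLE_BITS - bank)) ^ (key >> (2 * TABLE_BITS + bank))
    return mixed & (TABLE_SIZE - 1)

class MajorityVotePredictor:
    def __init__(self):
        # Three banks of 2-bit saturating counters, initialised to weakly not-taken (1).
        self.banks = [[1] * TABLE_SIZE for _ in range(3)]
        self.history = 0

    def predict(self, pc):
        votes = sum(
            self.banks[b][bank_index(pc, self.history, b)] >= 2 for b in range(3)
        )
        return votes >= 2            # majority of the three banks

    def update(self, pc, taken):
        # Simplification: every bank is updated on every branch.
        for b in range(3):
            i = bank_index(pc, self.history, b)
            c = self.banks[b][i]
            self.banks[b][i] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)

predictor = MajorityVotePredictor()
for _ in range(20):
    predictor.update(0x400, True)    # train on an always-taken branch
print(predictor.predict(0x400))
```

Because each bank uses a different index function, two branches that alias in one bank are unlikely to alias in the other two, so a single corrupted counter is outvoted.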

Related publications:

P. Michaud, A. Seznec, R. Uhlig, "Trading conflict and capacity aliasing in conditional branch predictors", Proceedings of the 24th International Symposium on Computer Architecture, Denver, June 2-4, 1997
A. Seznec, P. Michaud, "Dealiased Hybrid Branch Predictors", IRISA Report No 1229, Feb. 1999
A. Seznec, S. Felix, V. Krishnan, Y. Sazeides, "Design trade-offs on the EV8 branch predictor", Proceedings of the 29th International Symposium on Computer Architecture (IEEE-ACM), Anchorage, May 2002 (slides, PowerPoint)
A. Seznec, "An optimized 2bcgskew branch predictor", September 2003

Understanding global history branch predictors

P. Michaud, A. Seznec, "A comprehensive study of dynamic global history branch prediction", IRISA Report No 1406

Improving the perceptron branch predictor

In 2003-2004, we began to explore the potential of the perceptron predictor. Our preliminary work showed that this potential was largely underestimated in the pioneering work of Jiménez and Lin. We then proposed the MAC-RHSP (Multiply Accumulate Contribution Redundant History Skewed Perceptron) predictor, which has lower hardware complexity than the perceptron predictor but much better prediction accuracy.
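For readers unfamiliar with the baseline, the sketch below shows a basic perceptron predictor of the kind introduced by Jiménez and Lin. It is not MAC-RHSP. The table size, history length, and weight range are hypothetical values chosen for illustration; the training threshold follows the commonly quoted rule theta = floor(1.93 * h + 14) for history length h.

```python
HISTORY_LEN = 16                        # hypothetical history length
NUM_PERCEPTRONS = 64                    # hypothetical table size
THETA = int(1.93 * HISTORY_LEN + 14)    # training threshold
WEIGHT_MIN, WEIGHT_MAX = -128, 127      # assumed 8-bit signed weights

def clamp(w):
    return max(WEIGHT_MIN, min(WEIGHT_MAX, w))

class PerceptronPredictor:
    def __init__(self):
        # One weight vector per entry: a bias weight plus one weight per history bit.
        self.weights = [[0] * (HISTORY_LEN + 1) for _ in range(NUM_PERCEPTRONS)]
        # Global history encoded as +1 (taken) / -1 (not taken), most recent last.
        self.history = [-1] * HISTORY_LEN

    def output(self, pc):
        w = self.weights[pc % NUM_PERCEPTRONS]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        return self.output(pc) >= 0

    def update(self, pc, taken):
        y = self.output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not confident enough.
        if (y >= 0) != taken or abs(y) <= THETA:
            w = self.weights[pc % NUM_PERCEPTRONS]
            w[0] = clamp(w[0] + t)
            for i, xi in enumerate(self.history, start=1):
                w[i] = clamp(w[i] + t * xi)
        self.history = self.history[1:] + [t]

# Usage: a branch that strictly alternates taken / not taken.
p = PerceptronPredictor()
outcome = True
for _ in range(200):
    p.update(0x40, outcome)
    outcome = not outcome
print(p.predict(0x40) == outcome)
```

The alternating pattern is linearly separable on the history bits, so the weights settle on a strong negative correlation with the most recent outcome.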

Related publications:

A. Seznec, "Redundant History Skewed Perceptron Predictors: pushing limits on global history branch predictors", IRISA Report No 1554, Sept. 2003
A. Seznec, "Revisiting the perceptron predictor", IRISA Report No 1620, May 2004

Pushing limits on global history predictors

In 2004-2005, we proposed new global history predictors that exploit very long global histories, in the hundred-bit range. The OGEHL and the PPM-like predictors were selected for the 1st CBP contest. Both feature a limited number of tables and exploit very long global histories. The PPM-like predictor uses partial tag matching as the prediction selection function, while the OGEHL predictor uses an adder tree. OGEHL uses a geometric series of history lengths. TAGE combines partial tag matching (with an optimized update policy) with a geometric series of history lengths. TAGE won the 2nd CBP contest in the realistic predictors category in 2006.
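As a minimal sketch of what a geometric series of history lengths looks like: table i uses a history length of roughly alpha^(i-1) times the shortest length, rounded to an integer. The helper name `geometric_history_lengths` and the parameter values (7 tables, lengths from 4 to 256 bits) are hypothetical and are not the configurations of the published predictors.

```python
def geometric_history_lengths(num_tables, min_length, max_length):
    """Return history lengths L(i) = int(alpha**(i-1) * L(1) + 0.5), i = 1..num_tables.

    alpha is chosen so that the last table uses max_length bits of history.
    """
    alpha = (max_length / min_length) ** (1.0 / (num_tables - 1))
    return [int(min_length * alpha ** i + 0.5) for i in range(num_tables)]

# Illustrative parameters only.
lengths = geometric_history_lengths(num_tables=7, min_length=4, max_length=256)
print(lengths)
```

With these parameters alpha is 2, so the lengths are 4, 8, 16, 32, 64, 128 and 256 bits. The series spends most tables on short histories, where most correlation is found, while a few tables still capture very long histories.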

To explore the limits of branch prediction, we defined the GTL predictor. GTL essentially combines a TAGE predictor and an OGEHL predictor. GTL won the 2nd CBP contest in the idealistic predictors category in 2006.

Related publications:

A. Seznec, "Genesis of the OGEHL predictor", Journal of Instruction Level Parallelism, April 2005
P. Michaud, "A PPM-like tag-based predictor", Journal of Instruction Level Parallelism, April 2005
A. Seznec, "Analysis of the OGEHL predictor", Proceedings of the 32nd International Symposium on Computer Architecture (IEEE-ACM), Madison, June 2005
A. Seznec, P. Michaud, "A case for (partially) tagged Geometric History Length Branch Prediction", Journal of Instruction Level Parallelism, Feb. 2006
A. Seznec, "Looking for limits in branch prediction with the GTL predictor", ppt presentation, CBP-2, December 2006
A. Seznec, "A 256 Kbits L-TAGE predictor", ppt presentation, CBP-2, December 2006
A. Seznec, "The L-TAGE predictor", Journal of Instruction Level Parallelism, May 2007
A. Seznec, "The idealistic GTL predictor", Journal of Instruction Level Parallelism, May 2007

Simultaneous multithreading and multicore processors

Simultaneous multithreading

Simultaneous multithreading (SMT) is an interesting way of maximizing performance by enhancing processor utilization. We have investigated various issues that arise with SMT: branch prediction, memory hierarchy behavior, and out-of-order versus in-order execution. SMT has proven quite complex to implement (e.g. Alpha EV8). Recently, we have been exploring an intermediate design point between SMT and CMP, the CASH architecture (for Cmp And Smt Hybrid).

Related publications:

S. Hily, A. Seznec, "Branch prediction and simultaneous multithreading", 25 pages, IRISA Report No 997, March 1996; also appears in Proceedings of PACT'96, Boston, October 1996
S. Hily, A. Seznec, "Standard Memory Hierarchy Does Not Fit Simultaneous Multithreading", Proceedings of MTEAC'98 Workshop, Feb. 1998; a longer version is available as "Contention on 2nd Level Cache May Limit The Effectiveness of Simultaneous Multithreading", 22 pages, IRISA Report No 1086, Feb. 1997
S. Hily, A. Seznec, "Out-Of-Order Execution May Not Be Cost-Effective on Processors Featuring Simultaneous Multithreading", IRISA Report No 1179, March 1998; a short version appears in Proceedings of HPCA-5, Orlando, Jan. 1999
K. Luo, M. Franklin, S. Mukherjee, A. Seznec, "Boosting SMT Performance by Speculation Control", Proceedings of the International Parallel and Distributed Processing Symposium, April 2001
R. Dolbeau, A. Seznec, "CASH: revisiting hardware sharing in single-chip parallel processor", IRISA Report, November 2002

Multicore processors

P. Michaud, "Exploiting the cache capacity of a single-chip multicore processor with execution migration", 10th International Symposium on High Performance Computer Architecture, Madrid, Spain, February 14-18, 2004