The skewed-associative cache is a new organization for multi-bank
caches in which each bank is indexed by a different mapping function.
Skewed-associative caches have been shown to have two major
advantages over conventional set-associative caches. First, at equal
associativity, a skewed-associative cache typically has the
same hardware complexity as a set-associative cache but a lower
miss ratio. This is particularly significant for BTBs and L2 caches, for
which conflict misses remain a significant fraction of misses even with
2-way set-associative organizations. Second, the behavior of skewed-associative
caches is quite insensitive to the precise data placement in memory.
Recently, we have shown that the skewed-associative structure offers a
unique opportunity to build TLBs supporting multiple page sizes.
A. Seznec, F. Bodin, ``Skewed-associative caches'', Proceedings
of PARLE '93, Munich, June 1993
A. Seznec, ``About set and skewed associativity on second level
caches'', Proceedings of the International Conference on Computer
Design, Boston, October 1993
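As a rough illustration of the skewed-associative idea, the sketch below contrasts conventional set-associative indexing with a 2-way skewed organization in which each bank uses its own index function. The mixing functions, the 4-bit index width and all function names are hypothetical choices made for this example; they are not the functions used in the papers cited above.

```python
# Minimal sketch of 2-way skewed-associative indexing (illustrative only).
# ASSUMPTION: the index functions below are toy functions chosen for this
# example, not the mapping functions from the cited papers.

INDEX_BITS = 4                    # 16 lines per bank (assumed toy size)
MASK = (1 << INDEX_BITS) - 1


def split(block_addr):
    """Split a block address into two adjacent INDEX_BITS-wide fields."""
    low = block_addr & MASK
    high = (block_addr >> INDEX_BITS) & MASK
    return low, high


def rotate_left(field):
    """Rotate an INDEX_BITS-wide field left by one bit."""
    return ((field << 1) & MASK) | (field >> (INDEX_BITS - 1))


def set_assoc_index(block_addr):
    """Conventional indexing: every way uses the same low-order bits."""
    return block_addr & MASK


def index_bank0(block_addr):
    """Bank 0 of the skewed cache: XOR of the two fields."""
    low, high = split(block_addr)
    return low ^ high


def index_bank1(block_addr):
    """Bank 1 of the skewed cache: a different mix of the same fields."""
    low, high = split(block_addr)
    return low ^ rotate_left(high)


if __name__ == "__main__":
    for block in (0x05, 0x15, 0x25):
        print(hex(block), set_assoc_index(block),
              index_bank0(block), index_bank1(block))
```

With these toy functions, blocks 0x05, 0x15 and 0x25 all fall into set 5 of a conventional cache and compete for its two ways, whereas each skewed bank spreads them over three different lines, so all three can be resident at once.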
Most newly announced microprocessors manipulate 64-bit virtual
addresses and the width of physical addresses is also growing. As a
result, the relative size of the address tags in caches is increasing.
We have proposed hardware solutions to limit the implementation cost of
these address tags.
N. Drach, A. Seznec, ``Semi-Unified Caches'', Proceedings of the
International Conference on Parallel Processing, St Charles, Illinois,
August 1993 (also RR INRIA 1841)
A. Seznec, ``DASC cache'', Proceedings of the First High Performance Computer
Architecture (IEEE), Raleigh (USA), January 1995 (also RR INRIA 2082)
A. Seznec, F. Lloansi, ``About
effective miss penalty on out-of-order microprocessor'', IRISA
Report No 970, November 1995 (a slightly modified version appears as: A.
Seznec, F. Lloansi, ``Performance impact of the L2 contention on
out-of-order execution superscalar processors'', IEEE TCCA NEWSLETTER,
March 1997)
Processor Organization
The CAPS team is working on pipeline and superscalar organization
in processors. We address the complexity of instruction scheduling
through prescheduling. Our work on WSRS architectures (Register Write
Specialization, Register Read Specialization) addresses the complexity
of the register file, the bypass network and instruction scheduling.
Multiple-block ahead branch prediction is a new branch prediction
mechanism. This mechanism provides an efficient way to predict the
addresses of two instruction blocks in a single cycle. Such an approach
would be very useful for wide-dispatch superscalar processors. Recently,
we have explored in detail the effective design of a complete
instruction fetch mechanism.
Related publications:
A. Seznec, S. Jourdan, P. Sainrat, P. Michaud, ``Multiple-Block
Ahead Branch Predictors'', Proceedings of the 7th conference on
Architectural Support for Programming Languages and Operating Systems,
Boston, October 1996
Between 1996 and 2000, we investigated the use of a majority
vote as a means of limiting the impact of aliasing on global history branch
predictors. The 2bcgskew predictor was the basis of the branch predictor
of the cancelled Alpha EV8. 2bcgskew is often considered in the
literature to be the most efficient conventional (2-bit counter
based) predictor, as opposed to neural predictors. Its accuracy is also often
underestimated in comparative studies. The parameters described in ``An
optimized 2bcgskew branch predictor'' should be used in comparative
studies.
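The majority-vote idea can be sketched as follows: three banks of 2-bit counters are indexed by different functions of the branch address and global history, and the prediction is the majority of the three counters. This is a deliberately simplified toy, not 2bcgskew itself: the index functions, table sizes, total-update policy and class name are assumptions made for illustration, and the meta-predictor of 2bcgskew is not modeled.

```python
# Toy majority-vote predictor over three banks of 2-bit counters.
# ASSUMPTIONS: index functions, sizes and the "update every bank" policy
# are illustrative choices, not the parameters of 2bcgskew.

class MajorityVotePredictor:
    def __init__(self, index_bits=10, history_bits=8):
        self.mask = (1 << index_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.history = 0
        # 2-bit saturating counters, initialised to 1 (weakly not-taken)
        self.banks = [[1] * (1 << index_bits) for _ in range(3)]

    def _indices(self, pc):
        """One index per bank, each mixing address and history differently."""
        h = self.history
        return (
            pc & self.mask,                            # address only
            (pc ^ h) & self.mask,                      # address XOR history
            (pc ^ (h << 2) ^ (pc >> 3)) & self.mask,   # a different mix
        )

    def predict(self, pc):
        """Predict taken when at least two of the three counters say taken."""
        votes = sum(self.banks[b][i] >= 2
                    for b, i in enumerate(self._indices(pc)))
        return votes >= 2

    def update(self, pc, taken):
        """Move every indexed counter toward the outcome, then shift history."""
        for b, i in enumerate(self._indices(pc)):
            c = self.banks[b][i]
            self.banks[b][i] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask
```

Because two branches that collide in one bank are unlikely to collide in the other two, a single corrupted counter is outvoted: overwriting one bank's entry for a well-trained branch leaves the prediction unchanged.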
In 2003-2004, we began to explore the potential of the
perceptron predictor. Our preliminary work showed that this potential
was largely underestimated in the pioneering work of Jiménez and
Lin. We then proposed the MAC-RHSP (Multiply Accumulate Contribution
Redundant History Skewed Perceptron) predictor, which has lower hardware
complexity than the perceptron predictor but much better prediction
accuracy.
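For readers unfamiliar with the baseline, here is a minimal sketch of a basic perceptron branch predictor: the prediction is the sign of a weighted sum of past outcomes, and weights are trained on a misprediction or when the sum is below a threshold. It does not implement MAC-RHSP. The table size, history length, threshold value, the absence of weight saturation and all names are assumptions made for this example.

```python
# Minimal sketch of a basic perceptron branch predictor (not MAC-RHSP).
# ASSUMPTIONS: table size, history length and threshold are illustrative;
# weights are unbounded Python ints, whereas hardware would saturate them.

class PerceptronPredictor:
    def __init__(self, num_perceptrons=64, history_len=8):
        self.n = num_perceptrons
        self.theta = int(1.93 * history_len + 14)   # assumed training threshold
        # weights[p][0] is the bias weight; the rest pair with history bits
        self.weights = [[0] * (history_len + 1) for _ in range(num_perceptrons)]
        self.history = [-1] * history_len           # +1 taken, -1 not taken

    def _output(self, pc):
        w = self.weights[pc % self.n]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on a misprediction or a low-confidence sum, then shift history."""
        y = self._output(pc)
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.n]
            w[0] += t
            for i, xi in enumerate(self.history):
                w[i + 1] += t * xi
        self.history = self.history[1:] + [t]
```

A branch that strictly alternates between taken and not-taken is a simple case this sketch learns: the weights tied to recent history bits grow until the sum reliably has the right sign.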
In 2004-2005, we proposed new global history predictors that
exploit very long global histories, in the hundred-bit range. The
OGEHL and PPM-like predictors were selected for the 1st CBP contest.
They both feature a limited number of tables and exploit very long
global histories. The PPM-like predictor uses partial tag matching as
the prediction selection function, while the OGEHL predictor uses an
adder tree. OGEHL uses a geometric series of history lengths. TAGE combines
partial tag matching (with an optimized update policy) and a
geometric series of history lengths. TAGE won the 2nd CBP contest in
the realistic predictors category in 2006.
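The two ingredients named above can be sketched in a few lines: a geometric series of history lengths, and the two ways of turning per-table results into one prediction (summing counters versus taking the longest tag match). The function names, the rounding rule and the example parameters are assumptions made for illustration; counter encodings, thresholds and update policies of the real predictors are omitted.

```python
# Sketch of geometric history lengths and two prediction selection functions.
# ASSUMPTIONS: names, rounding and parameter values are illustrative only.

def geometric_history_lengths(num_tables, min_len, max_len):
    """Return num_tables history lengths growing geometrically
    from min_len to max_len (rounded to the nearest integer)."""
    if num_tables < 2 or min_len <= 0 or max_len < min_len:
        raise ValueError("need num_tables >= 2 and 0 < min_len <= max_len")
    alpha = (max_len / min_len) ** (1.0 / (num_tables - 1))
    return [int(min_len * alpha ** i + 0.5) for i in range(num_tables)]


def adder_tree_predict(signed_counters):
    """OGEHL-style selection, simplified: sum the signed counters read
    from all tables and predict taken when the sum is non-negative."""
    return sum(signed_counters) >= 0


def longest_match_predict(table_results, default):
    """PPM-like/TAGE-style selection, simplified: table_results is ordered
    from shortest to longest history; each entry is None on a tag miss or a
    boolean prediction on a tag hit. The longest-history hit wins."""
    for prediction in reversed(table_results):
        if prediction is not None:
            return prediction
    return default


if __name__ == "__main__":
    print(geometric_history_lengths(4, 5, 40))
    print(adder_tree_predict([3, -1, -1]))
    print(longest_match_predict([True, None, False, None], default=True))
```

For example, four tables spanning history lengths 5 to 40 get the lengths 5, 10, 20 and 40, so most tables cover short, recent history while a few reach far back.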
To explore the limits of branch prediction, we defined the GTL
predictor. GTL essentially combines a TAGE predictor and an OGEHL
predictor. GTL won the 2nd CBP contest in the idealistic predictors
category in 2006.
Simultaneous multithreading and multicore processors
Simultaneous multithreading
Simultaneous multithreading (SMT) is an interesting way of maximizing
performance by enhancing processor utilization. We have investigated
various issues raised by SMT:
branch prediction, memory hierarchy behavior, out-of-order and in-order
execution, etc. SMT has proven to be quite complex to implement (e.g. Alpha
EV8). Recently, we have been exploring an intermediate design point
between SMT and CMP, the CASH architecture (for Cmp And Smt Hybrid).
K. Luo, M. Franklin, S. Mukherjee, A. Seznec, ``Boosting SMT
Performance by Speculation Control'', Proceedings of the International
Parallel and Distributed Processing Symposium, April 2001