A compiler/simulator suite for cryptography ASIP
IATO, the IAOO Toolkit, is a flexible environment for analyzing, emulating,
or simulating IA64 Instruction Set Architecture (ISA) binary executables.
Out-of-order execution on IA64 microarchitectures
We are investigating a novel register management policy that is designed
to operate smoothly with a fully predicated ISA. This new system is based
on an intermediate representation called Translation Register Buffer (TRB).
The TRB mechanism that translates a logical register into a physical register
is shown to be effective when an instruction is canceled by a predicate.
Related publications:
Decoupled Architectures
The performance demands of embedded applications will lead to the use of dynamic
execution on embedded processors in the next few years. However, complete
out-of-order superscalar cores are still expensive in terms of silicon area
and power dissipation. Decoupled architectures provide a more limited form
of dynamic execution that is simpler to implement. We have studied the suitability
of decoupled architectures for embedded applications.
Related publications:
Power / Performance Tradeoffs
Power consumption is becoming a major issue on most processors. We are exploring
the impact of compiler optimizations on power consumption. We have
shown that there is a threshold above which ILP-enhancing optimizations
yield diminishing returns in energy reduction. Our analysis
revealed that this is mainly attributable to the limited instruction-level
parallelism available in applications.
We are also exploring the use of reconfigurable hardware to decrease power
consumption without impacting performance. The cache hierarchy is a typical
example of such a power/performance tradeoff. On some processors, the cache
accounts for up to 50% of the total chip area and for about 80% of the total
transistor count, making the cache hierarchy a critical source of power dissipation.
One way to tackle this problem is to have reconfigurable caches whose size
and associativity can adapt to the workload characteristics. We are exploring
fine-grain reconfiguration strategies that try to identify phases during
the program execution and reconfigure the cache on a per-phase basis.
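As a rough illustration of per-phase reconfiguration, the sketch below splits an address trace into fixed-length intervals and, for each interval, picks the smallest cache configuration whose miss rate stays close to that of the largest one. It is a minimal sketch under stated assumptions, not the scheme evaluated in the publications below: the fixed interval length, the miss-rate tolerance, the LRU policy, the cold-start simulation of each interval, and the names `miss_rate` and `choose_configs` are all hypothetical.

```python
from collections import OrderedDict


def miss_rate(trace, num_sets, ways, line_size=32):
    """Simulate a set-associative LRU cache and return the miss rate on `trace`."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_size
        cache_set = sets[line % num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)          # hit: refresh LRU position
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)    # evict least recently used line
            cache_set[line] = True
    return misses / len(trace) if trace else 0.0


def choose_configs(trace, configs, interval=1000, tolerance=0.01):
    """Pick, per interval, the smallest (num_sets, ways) configuration whose
    miss rate is within `tolerance` of the largest configuration's miss rate.

    Assumption: each interval is simulated from a cold cache, which ignores
    the cost of flushing or migrating lines when the cache is resized.
    """
    ordered = sorted(configs, key=lambda c: c[0] * c[1])
    choices = []
    for start in range(0, len(trace), interval):
        window = trace[start:start + interval]
        baseline = miss_rate(window, *ordered[-1])
        for num_sets, ways in ordered:
            if miss_rate(window, num_sets, ways) <= baseline + tolerance:
                choices.append((num_sets, ways))
                break
    return choices


# Synthetic trace: a phase with a small working set, then one with a large one.
small_phase = [(i % 8) * 32 for i in range(1000)]
large_phase = [(i % 128) * 32 for i in range(1000)]
configs = [(8, 1), (32, 2), (128, 2)]
choices = choose_configs(small_phase + large_phase, configs)
```

On this synthetic trace the first interval is served by the smallest configuration, while the second needs the largest one. A real scheme would also have to detect phase boundaries at run time rather than assume fixed intervals.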
Related publications:
- G. Pokam, F. Bodin. Exploring the energy-efficiency
potential of a phase-based cache resizing scheme for embedded systems.
In Proceedings of the 8th Annual Workshop on Interaction between Compilers
and Computer Architectures (INTERACT-8), Madrid, Spain, February 2004. To
appear.
- G. Pokam, F. Bodin. Energy reduction potential of
a phase-based cache resizing scheme for embedded systems. INRIA Research Report No 5036,
December 2003.
- G. Pokam, F. Bodin. Energy-delay tradeoff analysis
of ILP-based compilation techniques on a VLIW architecture. INRIA Research Report No 5026,
November 2003.
Exploitation of special-purpose instruction sets in C programs
Many modern processors provide extensions to their instruction set specifically
designed for computation-intensive multimedia applications. These multimedia
extensions are usually provided as intrinsics that can be inserted in C code.
Direct insertion in the assembly code is possible but requires good knowledge
of both processor architecture and compilation techniques. Moreover, such
an approach does not lead to portable code. Using intrinsics in C source
code still requires code transformations such as vectorization to expose
code regions where data parallelism can be exploited. We have developed a
C-to-C retargetable preprocessor called SWARP that searches for portions
of code suited to multimedia instructions and automatically
inserts the equivalent intrinsics. SWARP is based on modern code analysis
and code transformation techniques (dependence analysis, alias analysis, loop transformation,
vectorization, ...) and on pattern matching for recognizing and replacing
suitable code patterns.
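The pattern-matching step can be pictured with a toy source-to-source rewriter. This is only a sketch and does not reflect how SWARP is implemented: it uses a regular expression where a real preprocessor would work on an intermediate representation, it performs no dependence or alias analysis, it assumes the trip count is a multiple of the vector width, and the intrinsic names (`vec_add4`, `vec_sub4`, `vec_mul4`, `vec_load4`, `vec_store4`) are hypothetical placeholders for whatever a given target provides.

```python
import re

# Hypothetical intrinsic names; a real target would supply its own.
INTRINSICS = {"+": "vec_add4", "-": "vec_sub4", "*": "vec_mul4"}
WIDTH = 4  # assumed number of elements per multimedia register

# Matches loops of the form:  for (i = 0; i < n; i++) d[i] = a[i] OP b[i];
LOOP = re.compile(
    r"for\s*\(\s*(?P<i>\w+)\s*=\s*0\s*;\s*(?P=i)\s*<\s*(?P<n>\w+)\s*;"
    r"\s*(?P=i)\+\+\s*\)\s*"
    r"(?P<dst>\w+)\[(?P=i)\]\s*=\s*(?P<a>\w+)\[(?P=i)\]\s*"
    r"(?P<op>[-+*])\s*(?P<b>\w+)\[(?P=i)\]\s*;"
)


def rewrite(source):
    """Replace recognized element-wise loops with intrinsic-based loops.

    Assumptions: the arrays do not alias and n is a multiple of WIDTH;
    neither condition is checked here.
    """
    def repl(m):
        i, n = m["i"], m["n"]
        call = (f"{INTRINSICS[m['op']]}"
                f"(vec_load4(&{m['a']}[{i}]), vec_load4(&{m['b']}[{i}]))")
        return (f"for ({i} = 0; {i} < {n}; {i} += {WIDTH}) "
                f"vec_store4(&{m['dst']}[{i}], {call});")
    return LOOP.sub(repl, source)


vectorizable = "for (i = 0; i < n; i++) a[i] = b[i] + c[i];"
recurrence = "for (i = 0; i < n; i++) a[i] = a[i-1] + c[i];"
rewritten = rewrite(vectorizable)
untouched = rewrite(recurrence)
```

The first loop is rewritten to use the placeholder intrinsics. The second, which carries a dependence from one iteration to the next, does not fit the pattern and is left unchanged.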
Related publications:
- G. Pokam, S. Bihan, J. Simonnet, and F. Bodin. SWARP: A
retargetable preprocessor for multimedia instructions. Concurrency and Computation:
Practice and Experience, Volume 16, Issue 2-3, pp. 303-318, February-March
2004.
- G. Pokam, J. Simonnet and F. Bodin. A retargetable
preprocessor for multimedia instructions. In Proceedings of the 9th Workshop
on Compilers for Parallel Computers (CPC 2001), Edinburgh, Scotland UK, June
2001.
High speed instruction-set simulation
Instruction-set simulation can be used to evaluate different instruction-set
architectures during architecture exploration, to validate
a compiler back-end, or to test, tune, and debug programs on a user-friendly
PC or workstation rather than on actual silicon, which might not even exist
yet. The increasing size and complexity of embedded software require extremely
fast instruction-set simulation. Compiled instruction-set simulation is an
approach that is potentially much faster than interpretation, but it has
a start-up cost due to the generation and compilation of the simulator. This
start-up cost is often seen as a major drawback and has limited the adoption
of compiled instruction-set simulation. We have designed a new approach to
compiled instruction-set simulation that aims to reconcile flexibility,
retargetability, high simulation speed, and low start-up cost. This approach
was implemented in ABSCISS, a generator of compiled instruction-set simulators
that works at the assembler level.
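The difference between interpretive and compiled simulation can be illustrated on a toy instruction set. The sketch below is not ABSCISS and makes several assumptions: the five-instruction ISA, the tuple encoding of the program, and the function names are all hypothetical, and Python source stands in for the code a real generator would emit. The interpreter decodes each instruction every time it executes, whereas the generator translates the program once into specialized code, paying a start-up cost for generation and compilation.

```python
def interpret(program, regs):
    """Interpretive simulation: decode every instruction each time it runs."""
    pc = 0
    while True:
        op, *args = program[pc]
        if op == "li":                       # load immediate
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "addi":
            regs[args[0]] = regs[args[1]] + args[2]
        elif op == "bnez":                   # branch if register is non-zero
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        elif op == "halt":
            return regs
        pc += 1


def generate(program):
    """Translate the program once into Python source with operands baked in."""
    lines = ["def run(regs):", "    pc = 0", "    while True:"]
    for addr, (op, *a) in enumerate(program):
        lines.append(f"        {'if' if addr == 0 else 'elif'} pc == {addr}:")
        if op == "li":
            body = [f"regs[{a[0]}] = {a[1]}", f"pc = {addr + 1}"]
        elif op == "add":
            body = [f"regs[{a[0]}] = regs[{a[1]}] + regs[{a[2]}]", f"pc = {addr + 1}"]
        elif op == "addi":
            body = [f"regs[{a[0]}] = regs[{a[1]}] + {a[2]}", f"pc = {addr + 1}"]
        elif op == "bnez":
            body = [f"pc = {a[1]} if regs[{a[0]}] != 0 else {addr + 1}"]
        elif op == "halt":
            body = ["return regs"]
        else:
            raise ValueError(f"unknown opcode: {op}")
        lines.extend("            " + stmt for stmt in body)
    # Guard against running past the end of the program.
    lines += ["        else:", "            raise IndexError(pc)"]
    return "\n".join(lines)


def build_simulator(program):
    """Start-up cost: generate the source, compile it, and return the entry point."""
    namespace = {}
    exec(compile(generate(program), "<generated simulator>", "exec"), namespace)
    return namespace["run"]


# Sums the integers from 5 down to 1 into register 0.
program = [
    ("li", 0, 0),        # r0 = 0 (accumulator)
    ("li", 1, 5),        # r1 = 5 (counter)
    ("add", 0, 0, 1),    # r0 += r1
    ("addi", 1, 1, -1),  # r1 -= 1
    ("bnez", 1, 2),      # loop back while r1 != 0
    ("halt",),
]
interpreted = interpret(program, [0] * 4)
compiled = build_simulator(program)([0] * 4)
```

Both simulators reach the same final register state. The generated one no longer inspects opcodes or operands at run time, which is where a compiled simulator gains its speed.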
Related publications:
- R. Amicel, F. Bodin. "Mastering startup costs in assembler
based compiled instruction set simulation". Proceedings of the 6th Annual
Workshop on Interaction between Compilers and Computer Architectures (INTERACT-6).
Cambridge, USA, February 2002.
- R. Amicel, F. Bodin. "A new system for high performance
cycle accurate compiled simulation". Proceedings of the 5th International
Workshop on Software and Compilers for Embedded Systems (SCOPES). St. Goar,
Germany, March 2001.