Architecture/Compilation
A compiler/simulator suite for cryptography ASIP
IATO, the IA64 Toolkit, is a flexible environment for analyzing, emulating, or simulating binary executables for the IA64 Instruction Set Architecture (ISA).
Out-of-order execution on IA64 microarchitectures
We are investigating a novel register management policy designed to operate smoothly with a fully predicated ISA. This new system is based on an intermediate representation called the Translation Register Buffer (TRB). The TRB mechanism, which translates a logical register into a physical register, has been shown to remain effective when an instruction is canceled by a predicate.
Related publications:
Decoupled Architectures
The demand for performance in embedded applications will lead to the use of dynamic execution in embedded processors within the next few years. However, complete out-of-order superscalar cores are still expensive in terms of silicon area and power dissipation. Decoupled architectures provide a more limited form of dynamic execution that is simpler to implement. We have studied the suitability of decoupled architectures for embedded applications.
Related publications:
Power / Performance Tradeoffs
Power consumption is becoming a major issue on most processors. We are exploring the impact of compiler optimizations on power consumption. We have shown that there is a threshold beyond which ILP-enhancing optimizations yield diminishing returns in energy reduction. Our analysis indicates that this is mainly due to the limited instruction-level parallelism available in applications.
We are also exploring the use of reconfigurable hardware to decrease power consumption without impacting performance. The cache hierarchy is a typical example of such a power/performance tradeoff. On some processors, the cache accounts for up to 50% of the total chip area and for about 80% of the total transistor count, making the cache hierarchy a critical source of power dissipation. One way to tackle this problem is to use reconfigurable caches whose size and associativity can adapt to the workload characteristics. We are exploring fine-grain reconfiguration strategies that identify phases during program execution and reconfigure the cache on a per-phase basis.
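The per-phase idea can be illustrated with a minimal sketch in C. It is not the scheme from the publications listed below: the candidate sizes, the `interval_profile` structure, the `slack` tolerance, and both function names are assumptions made for illustration. For each execution interval, the sketch selects the smallest cache size whose miss rate stays within `slack` of the largest size, and it treats consecutive intervals that select the same size as one phase.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate cache sizes, ordered from smallest to largest. */
#define NUM_SIZES 4
const unsigned cache_size_kb[NUM_SIZES] = {4, 8, 16, 32};

/* Hypothetical profile of one execution interval: the miss rate that
   would be observed with each candidate size. */
struct interval_profile {
    double miss_rate[NUM_SIZES];
};

/* Return the index of the smallest size whose miss rate is within
   `slack` of the miss rate obtained with the largest size. */
size_t pick_config(const struct interval_profile *p, double slack)
{
    assert(slack >= 0.0);
    double best = p->miss_rate[NUM_SIZES - 1];
    for (size_t i = 0; i < NUM_SIZES; i++) {
        if (p->miss_rate[i] <= best + slack)
            return i;
    }
    return NUM_SIZES - 1; /* fallback, e.g. if a miss rate is NaN */
}

/* Choose a configuration for every interval and count how often the
   cache would have to be reconfigured. Adjacent intervals with the
   same choice are treated as one phase (a simplification). */
size_t plan_resizing(const struct interval_profile *profiles, size_t n,
                     double slack, size_t *config_out)
{
    size_t reconfigs = 0;
    for (size_t i = 0; i < n; i++) {
        config_out[i] = pick_config(&profiles[i], slack);
        if (i > 0 && config_out[i] != config_out[i - 1])
            reconfigs++;
    }
    return reconfigs;
}
```

A larger `slack` favors small, low-power configurations at the cost of more misses; a real policy would also have to account for the cost of each reconfiguration.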
Related publications:
- G. Pokam, F. Bodin. Exploring the energy-efficiency potential of a phase-based cache resizing scheme for embedded systems. In Proceedings of the 8th Annual Workshop on Interaction between Compilers and Computer Architectures (INTERACT-8), Madrid, Spain, February 2004. To appear.
- G. Pokam, F. Bodin. Energy reduction potential of a phase-based cache resizing scheme for embedded systems. INRIA Research Report No 5036, December 2003.
- G. Pokam, F. Bodin. Energy-delay tradeoff analysis of ILP-based compilation techniques on a VLIW architecture. INRIA Research Report No 5026, November 2003.
Exploitation of special-purpose instruction sets in C programs
Many modern processors provide extensions to their instruction set specifically designed for computation-intensive multimedia applications. These multimedia extensions are usually provided as intrinsics that can be inserted in C code. Direct insertion in the assembly code is possible but requires good knowledge of both processor architecture and compilation techniques. Moreover, such an approach does not lead to portable code. Using intrinsics in C source code still requires code transformations, such as vectorization, to expose the code regions where data parallelism can be exploited. We have developed a retargetable C-to-C preprocessor called SWARP that searches for portions of code suited to multimedia instructions and automatically inserts the equivalent intrinsics. SWARP relies on modern code analysis and code transformation techniques (dependence analysis, alias analysis, loop transformation, vectorization, etc.) and on pattern matching for recognizing and replacing suitable code patterns.
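As an illustration of the kind of rewriting involved, the following C sketch shows a scalar loop and a hand-written version that processes four bytes per iteration through an intrinsic. It is not SWARP output: `padd_u8x4` is a hypothetical intrinsic, emulated here in portable C so that the example compiles anywhere, and the function names are chosen for this example only. The packed version assumes that `dst` does not overlap `a` or `b`, which is the kind of property dependence and alias analysis would have to establish.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical intrinsic: adds four unsigned 8-bit lanes packed in a
   32-bit word, each lane wrapping modulo 256. Emulated in portable C;
   on a real target it would map to one multimedia instruction. */
uint32_t padd_u8x4(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; carries cannot cross lanes. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Fold in the top bit of every lane without a carry out. */
    uint32_t msb = (a ^ b) & 0x80808080u;
    return low7 ^ msb;
}

/* Original scalar loop: one byte per iteration. */
void add_scalar(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}

/* Transformed loop: four bytes per iteration through the intrinsic,
   followed by a scalar epilogue for the remaining elements.
   Assumes dst does not overlap a or b. */
void add_packed(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t va, vb, vr;
        memcpy(&va, a + i, 4); /* unaligned-safe packed load */
        memcpy(&vb, b + i, 4);
        vr = padd_u8x4(va, vb);
        memcpy(dst + i, &vr, 4); /* packed store */
    }
    for (; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

Because every lane is treated identically, the packed version gives the same result as the scalar loop regardless of the byte order of the host.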
Related publications:
- G. Pokam, S. Bihan, J. Simonnet, F. Bodin. SWARP: A retargetable preprocessor for multimedia instructions. Concurrency and Computation: Practice and Experience, Volume 16, Issue 2-3, pp. 303-318, February-March 2004.
- G. Pokam, J. Simonnet, F. Bodin. A retargetable preprocessor for multimedia instructions. In Proceedings of the 9th Workshop on Compilers for Parallel Computers (CPC 2001), Edinburgh, Scotland, UK, June 2001.
High speed instruction-set simulation
Instruction-set simulation can be used to evaluate different instruction-set architectures during architecture exploration, to validate a compiler back-end, and to test, tune, and debug programs on a user-friendly PC or workstation rather than on actual silicon, which might not even exist yet. The increasing size and complexity of embedded software require extremely fast instruction-set simulation. Compiled instruction-set simulation is potentially much faster than interpretation, but it has a start-up cost due to the generation and compilation of the simulator. This start-up cost is often seen as a major drawback and has limited the adoption of compiled instruction-set simulation. We have designed a new approach to compiled instruction-set simulation that aims to reconcile flexibility, retargetability, high simulation speed, and low start-up cost. This approach was implemented in ABSCISS, a generator of compiled instruction-set simulators that works at the assembler level.
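The difference between interpretation and compiled simulation can be sketched in C on a toy example. Nothing here reflects the ABSCISS implementation: the four-register instruction set, the `insn` and `cpu` structures, and the function names are invented for illustration. The interpreter decodes every instruction at run time, whereas the "compiled" function shows what a simulator generator could emit for one specific program, with decoding done once at generation time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy instruction set, invented for this sketch only. */
enum opcode { OP_LOADI, OP_ADD, OP_MUL, OP_HALT };

struct insn {
    enum opcode op;
    int rd, rs, rt; /* destination and source register indices */
    int32_t imm;    /* immediate operand, used by OP_LOADI */
};

struct cpu {
    int32_t r[4];
};

/* Interpretive simulation: each simulated instruction is fetched and
   decoded every time it executes. */
void simulate_interpreted(struct cpu *cpu, const struct insn *prog, size_t len)
{
    for (size_t pc = 0; pc < len; pc++) {
        const struct insn *i = &prog[pc];
        assert(i->rd >= 0 && i->rd < 4);
        assert(i->rs >= 0 && i->rs < 4);
        assert(i->rt >= 0 && i->rt < 4);
        switch (i->op) {
        case OP_LOADI: cpu->r[i->rd] = i->imm; break;
        case OP_ADD:   cpu->r[i->rd] = cpu->r[i->rs] + cpu->r[i->rt]; break;
        case OP_MUL:   cpu->r[i->rd] = cpu->r[i->rs] * cpu->r[i->rt]; break;
        case OP_HALT:  return;
        }
    }
}

/* Target program: r0 = 6; r1 = 7; r2 = r0 * r1; r2 = r2 + r0; halt */
const struct insn demo_prog[] = {
    {OP_LOADI, 0, 0, 0, 6},
    {OP_LOADI, 1, 0, 0, 7},
    {OP_MUL,   2, 0, 1, 0},
    {OP_ADD,   2, 2, 0, 0},
    {OP_HALT,  0, 0, 0, 0},
};
const size_t demo_len = sizeof demo_prog / sizeof demo_prog[0];

/* Compiled simulation: code a generator could emit for demo_prog.
   Fetch and decode are gone; only the state updates remain. */
void simulate_compiled_demo(struct cpu *cpu)
{
    cpu->r[0] = 6;
    cpu->r[1] = 7;
    cpu->r[2] = cpu->r[0] * cpu->r[1];
    cpu->r[2] = cpu->r[2] + cpu->r[0];
}
```

The compiled version avoids the per-instruction dispatch, but it must be generated and compiled for each target program, which is the start-up cost discussed above.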
Related publications:
- R. Amicel, F. Bodin. Mastering startup costs in assembler based compiled instruction set simulation. In Proceedings of the 6th Annual Workshop on Interaction between Compilers and Computer Architectures (INTERACT-6), Cambridge, USA, February 2002.
- R. Amicel, F. Bodin. A new system for high performance cycle accurate compiled simulation. In Proceedings of the 5th International Workshop on Software and Compilers for Embedded Systems (SCOPES), St. Goar, Germany, March 2001.