
The calvin2+DICE and LiKE toolset for Microarchitecture Simulations

Project contacts: Thierry Lafage, André Seznec

Introduction

Realistic microarchitecture simulations require realistic inputs from a feeder (either a trace collection tool or an emulator). Realistic inputs consist of the whole user and operating system activity of a realistic target workload. However, the feeder itself induces a significant execution overhead, particularly when running instructions that are not simulated (e.g. the initialization phase). Therefore, microarchitecture studies are generally performed on the beginning of the application (or after skipping a few hundred million instructions).

The main goal of the calvin2+DICE and LiKE toolset is to enable microarchitecture simulations of long applications by providing a very fast feeder that can skip billions of instructions with very limited overhead compared with the original application. Ideally, this feeder should also be able to capture the activity of the operating system.

General Approach

To trace programs or to perform on-the-fly simulations, static code annotation is generally a more efficient technique than instruction-set emulation. However, instruction-set emulation is a much more flexible approach: (1) it makes it possible to implement different tracing/simulation strategies without (re)instrumenting the target programs, and (2) all user activity (including dynamically linked code and dynamically compiled code) can be traced and simulated. Our approach combines static code annotation, for its efficiency, with instruction-set emulation, for its flexibility.

A fast execution mode is used to rapidly (i.e. with very low overhead) bring the target program to interesting simulation states. This mode relies on direct execution of a lightly instrumented version of the program on the host processor. An instruction-set emulator, on the other hand, is used to actually trace the target program or to enable on-the-fly simulations (emulation mode). This emulator is embedded in the target program and can take control during execution in fast mode.

At run time, the target program switches from the fast mode to the emulation mode whenever a switching event happens. Switching events are monitored by the code statically added to the target program. This code only tests whether a switching event has occurred and, if so, gives control to the emulator. Note that mode switching is deterministic, since it is driven by the annotation code. Switching back from emulation mode to fast mode is managed by the emulator and is possible at any moment.

Light Static Code Annotation with calvin2

calvin2 is a static code annotation tool which uses the SALTO library to instrument SPARC assembly code.

calvin2 lightly instruments the target programs by inserting checkpoints; the fast execution mode is the direct execution of the instrumented programs. The checkpoint code sequence consists of a few instructions (about 10) that check whether control has to be given to DICE, the emulator. Switching from the fast mode to the emulation mode is triggered by a switching event.
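As an illustration only, the checkpoint logic can be sketched in C. calvin2 actually inserts about ten SPARC assembly instructions rather than C code, and the names `switch_pending`, `enter_emulator`, `checkpoint`, `sum_to` and `caller` are hypothetical; the stand-in emulator merely counts how often it is entered. The sketch also shows why switching is deterministic: the switch happens at the first checkpoint executed after the event.

```c
#include <signal.h>  /* sig_atomic_t */

/* Hypothetical flag raised by a switching event (e.g. from a signal
   handler); DICE's real mechanism and names may differ. */
volatile sig_atomic_t switch_pending = 0;

/* Counts emulator entries, for illustration only. */
unsigned long emulator_entries = 0;

/* Stand-in for DICE: a real emulator would save the host registers,
   interpret instructions, then restore them and resume direct execution. */
void enter_emulator(void)
{
    switch_pending = 0;   /* consume the event */
    emulator_entries++;
}

/* The checkpoint: a cheap test, with the call taken only on an event. */
static inline void checkpoint(void)
{
    if (switch_pending)
        enter_emulator();
}

/* "Instrumented" loop: one checkpoint inside the loop body. */
long sum_to(long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++) {
        checkpoint();                 /* loop-path checkpoint */
        if (i == n / 2)
            switch_pending = 1;       /* simulated switching event */
        sum += i;
    }
    return sum;
}

/* "Instrumented" caller: a checkpoint at the procedure call. */
long caller(long n)
{
    checkpoint();                     /* procedure-call checkpoint */
    return sum_to(n);
}
```

Calling `caller(10)` raises the event at iteration 5; the emulator is entered once, at the checkpoint of iteration 6, and the function still returns 45.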

Checkpoint Insertion

The number of inserted checkpoints directly determines the execution overhead in fast mode, so checkpoints must not be too numerous. On the other hand, their number and distribution over the executed code determine the dynamic accuracy of mode switching (from fast mode to emulation mode). For instance, if checkpoints were uniformly distributed every 500 dynamic instructions, we could choose to enter emulation mode at the Nth instruction +/- 250 (ideal accuracy) at a cost of only around 2% in performance (10 instructions added every 500 instructions). Unfortunately, checkpoints have to be inserted at code generation time, so such a uniform dynamic distribution cannot be guaranteed.
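The arithmetic of this example can be written down as two small C functions. This is a sketch under the idealized assumptions of the paragraph (uniform spacing, and checkpoint instructions costing as much as the average program instruction); the function names are hypothetical.

```c
/* Idealized checkpoint model: checkpoints of `cp_len` instructions
   placed every `spacing` dynamic instructions. Assumes checkpoint
   instructions cost as much as the average program instruction. */

/* Relative fast-mode overhead, e.g. 10 / 500 = 0.02 (2%). */
double checkpoint_overhead(double cp_len, double spacing)
{
    return cp_len / spacing;
}

/* Worst-case distance between a target instruction N and the nearest
   checkpoint, e.g. 500 / 2 = 250 instructions. */
double switch_error(double spacing)
{
    return spacing / 2.0;
}
```

Halving the spacing halves the switching error but doubles the overhead, which is the trade-off the paragraph describes.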

For this reason, we ran experiments on the SPEC95 benchmarks to characterize the distribution of executed checkpoints when checkpoints are inserted at procedure calls and inside each loop path. These experiments also allowed us to estimate the execution slowdown in fast mode.

In short, for all programs except fpppp, given a dynamic switching event, there is a high probability (90+%) that the execution mode switch actually happens within 100-200 instructions. Such a checkpoint layout also leads us to expect low execution slowdowns (1.56 at most). Inserting checkpoints at procedure calls and inside each loop path is therefore quite acceptable.
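One way to turn a measured checkpoint distribution into the kind of probability quoted above is sketched below. It assumes, as a modeling choice not stated in the text, that a switching event is equally likely to occur at any dynamic instruction; the event then falls in a checkpoint-free stretch of length g with probability proportional to g, and is served within L instructions with probability min(g, L)/g. The function name and the input format (the dynamic distances between consecutive executed checkpoints) are assumptions.

```c
#include <stddef.h>

/* Probability that a switching event, occurring at a uniformly random
   dynamic instruction (assumption), is followed by a checkpoint within
   `limit` instructions. `gaps` holds the dynamic distances between
   consecutive executed checkpoints. */
double switch_within(const unsigned long *gaps, size_t n, unsigned long limit)
{
    unsigned long long covered = 0, total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Sum over gaps of min(g, L), i.e. (g / total) * min(g, L) / g. */
        covered += gaps[i] < limit ? gaps[i] : limit;
        total += gaps[i];
    }
    return total ? (double)covered / (double)total : 0.0;
}
```

For example, distances of 50, 50 and 900 instructions give a probability of 0.2 for L = 100: a few long checkpoint-free stretches dominate, even when most stretches are short.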

Switching Events

We call a switching event the event that, during execution in fast mode, makes the next executed checkpoint pass control to DICE. Switching back from the emulation mode to the fast mode is determined by the simulation user (e.g. after a given number of emulated instructions, or when some point in the execution is reached). Four different types of switching events have been implemented so far (see here for more details).
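The four implemented event types are not described on this page, so the following C sketch uses one hypothetical event, firing on the K-th executed checkpoint, together with a hypothetical user-chosen rule for switching back after a fixed number of emulated instructions. Both are deterministic, in line with checkpoint-driven switching. All names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical switching event: fire on the K-th executed checkpoint.
   A negative count means the event is disarmed. */
long checkpoints_left = -1;

void arm_checkpoint_event(long k)   /* requires k >= 1 */
{
    checkpoints_left = k;
}

/* Evaluated by every checkpoint: true when control must go to DICE.
   The decrement adds a few instructions to each checkpoint. */
bool checkpoint_event_fired(void)
{
    if (checkpoints_left < 0)
        return false;
    if (--checkpoints_left == 0) {
        checkpoints_left = -1;      /* one-shot: disarm after firing */
        return true;
    }
    return false;
}

/* Hypothetical switch-back rule chosen by the simulation user:
   return to fast mode after `emulation_budget` emulated instructions. */
unsigned long emulation_budget = 0;

bool keep_emulating(unsigned long emulated_so_far)
{
    return emulated_so_far < emulation_budget;
}
```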

DICE: A Dynamic Inner Code Emulator

DICE emulates SPARC V9 instruction-set architecture (ISA) code: it manages the emulation execution mode of target programs. DICE is a piece of C and assembly code (an archive library) that is embedded in (linked with) the target application. As such, it can receive control, and return to direct execution, at any moment during execution by saving/restoring the host processor state. DICE works with programs instrumented by calvin2: the inserted checkpoints are used to give it control.

DICE enables simulation by calling user-defined analysis routines between emulated instructions. Analysis routines have direct access to all the information in the target program state, including the complete memory state and register values.

Emulation Core

**Figure 1:** DICE main processing loop.

DICE's emulation core is the traditional fetch-decode-interpret loop shown in Fig. 1. Each instruction is fetched from the program text segment, decoded and interpreted. Trace collection or on-the-fly simulation is enabled by calling specific user-provided routines at each iteration of the main emulator loop.
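The loop of Fig. 1 can be sketched in C for a tiny subset of SPARC V9: SETHI and the register/immediate forms of ADD, OR and SUB. This is not DICE's code. It assumes a flat 32-entry register file (register windows, condition codes, branches and delay slots are ignored, so nPC is always PC + 4), a text segment starting at address 0, and a hypothetical `analysis_fn` hook standing in for the user-provided routines called at each iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal emulated state: flat register file (no register windows in
   this sketch) plus the SPARC PC/nPC pair. */
struct cpu {
    uint64_t r[32];          /* r[0] is %g0, always zero */
    uint64_t pc, npc;
    unsigned long icount;    /* instructions emulated so far */
};

/* Hypothetical user analysis routine, called once per instruction. */
typedef void (*analysis_fn)(const struct cpu *, uint32_t insn);

/* Example routine: counts calls (a real one might emit a trace). */
unsigned long traced = 0;
void count_hook(const struct cpu *c, uint32_t insn) { (void)c; (void)insn; traced++; }

static int64_t simm13(uint32_t insn)      /* sign-extend bits 12..0 */
{
    int64_t v = insn & 0x1fff;
    return (v & 0x1000) ? v - 0x2000 : v;
}

/* Fetch-decode-interpret over `n` words at address 0.
   Returns 0 on success, -1 on an unsupported instruction. */
int emulate(struct cpu *c, const uint32_t *text, size_t n, analysis_fn hook)
{
    c->pc = 0;
    c->npc = 4;
    while (c->pc / 4 < n) {
        uint32_t insn = text[c->pc / 4];                  /* fetch */
        unsigned op = insn >> 30, rd = (insn >> 25) & 31; /* decode */
        if (hook)
            hook(c, insn);                                /* analysis */
        if (op == 0 && ((insn >> 22) & 7) == 4) {         /* SETHI */
            if (rd) c->r[rd] = (uint64_t)(insn & 0x3fffff) << 10;
        } else if (op == 2) {                             /* format 3 ALU */
            unsigned op3 = (insn >> 19) & 0x3f, rs1 = (insn >> 14) & 31;
            uint64_t a = c->r[rs1];
            uint64_t b = (insn & 0x2000) ? (uint64_t)simm13(insn)
                                         : c->r[insn & 31];
            uint64_t res;
            switch (op3) {
            case 0x00: res = a + b; break;                /* ADD */
            case 0x02: res = a | b; break;                /* OR  */
            case 0x04: res = a - b; break;                /* SUB */
            default: return -1;
            }
            if (rd) c->r[rd] = res;                       /* %g0 stays 0 */
        } else {
            return -1;
        }
        c->pc = c->npc;                                   /* no branches, */
        c->npc += 4;                                      /* so nPC = PC+4 */
        c->icount++;
    }
    return 0;
}
```

A real emulator must also handle control-transfer instructions, where the PC/nPC pair implements SPARC's delayed branches, and loads/stores, which are what an address-trace analysis routine would record.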

Processor Model

DICE emulates SPARC V9 ISA code and models the architectural resources of an UltraSPARC processor. These resources hold the state of the target program in memory. They consist of a memory copy of all the SPARC V9 non-privileged registers: general-purpose registers, floating-point registers, and control registers (PC, nPC, ...).

User Interface

DICE provides an interface that allows users to access dynamic information in order to trace/simulate the target programs. Various levels of detail are available; they are configured at compile time by defining (or not defining) preprocessor macros.

DICE's user interface is implemented through global variables and a few function declarations. The functions (user analysis routines) are to be defined by the user and are called by DICE under well-defined circumstances (before/after each instruction emulation, at system calls, and at checkpoints). Also, depending on the DICE configuration, some of these functions may or may not be called.

Each user analysis routine can access the host processor logical resources (general-purpose registers, floating-point registers and control registers) through the parameters passed to it or directly through the memory model of the host processor.
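The compile-time configuration and the two ways of accessing state can be sketched as follows. The macro names (`CALL_BEFORE_INSN`, `CALL_AFTER_INSN`, `CALL_AT_SYSCALL`), the global variables `emu_pc` and `emu_count`, and the hook signatures are hypothetical, not DICE's actual interface; instruction interpretation is elided. Only the Tcc (trap) encoding test follows the SPARC V9 format.

```c
#include <stdint.h>

/* Compile-time configuration (hypothetical macro names): defining a
   macro enables the corresponding call to a user analysis routine. */
#define CALL_BEFORE_INSN
#define CALL_AFTER_INSN
/* #define CALL_AT_SYSCALL */   /* left undefined: syscall hook disabled */

/* Hypothetical global variables readable by analysis routines. */
uint64_t emu_pc;               /* address of the current instruction */
unsigned long emu_count;       /* instructions emulated so far */

/* User analysis routines: declared by the emulator, defined by the
   user. These definitions just count calls. */
unsigned long n_before, n_after, n_syscall;
void user_before_insn(uint64_t pc, uint32_t insn) { (void)pc; (void)insn; n_before++; }
void user_after_insn(uint64_t pc, uint32_t insn) { (void)pc; (void)insn; n_after++; }
void user_syscall(uint32_t insn) { (void)insn; n_syscall++; }

/* SPARC Tcc (trap) instructions: op = 2, op3 = 0x3a. */
static int is_trap(uint32_t insn)
{
    return (insn >> 30) == 2 && ((insn >> 19) & 0x3f) == 0x3a;
}

/* One emulation step wrapped with the configured hooks. */
void emulate_step(uint32_t insn)
{
#ifdef CALL_BEFORE_INSN
    user_before_insn(emu_pc, insn);    /* state passed as parameters */
#endif
    if (is_trap(insn)) {
#ifdef CALL_AT_SYSCALL
        user_syscall(insn);            /* compiled out in this config */
#endif
    }
    /* ... interpret insn here (elided) ... */
#ifdef CALL_AFTER_INSN
    user_after_insn(emu_pc, insn);
#endif
    emu_pc += 4;                       /* state also visible as globals */
    emu_count++;
}
```

With `CALL_AT_SYSCALL` left undefined, the syscall routine is compiled out and never called, even when a trap instruction is emulated.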

More details about DICE internals are presented here.

Performance Evaluation of calvin2+DICE

Programs instrumented with calvin2 and linked with DICE have two execution modes: the fast mode and the emulation mode. To evaluate the execution slowdowns incurred by both modes, we collected execution times of the SPEC95 benchmarks running entirely in fast mode or entirely in emulation mode.

On average (over all the SPEC95 benchmarks), the fast mode execution slowdown ranges from 1.07 to 1.82, depending on the switching event type. The average emulation mode slowdown for instruction and data address trace generation (to /dev/null) is 117.33.
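To see why the fast mode matters, the two measured slowdowns can be combined into a rough cost model. The sketch assumes that every instruction has the same average cost and ignores the cost of switching itself; the function names and the workload sizes in the usage example are hypothetical.

```c
/* Rough cost model for a two-mode run (a sketch, not a measured model).
   Times are in units of the native execution time of one instruction,
   assuming all instructions have the same average cost and neglecting
   the cost of the mode switches themselves. */
double two_mode_time(double skipped, double simulated,
                     double fast_slowdown, double emu_slowdown)
{
    return skipped * fast_slowdown + simulated * emu_slowdown;
}

/* Speedup of the two-mode run over emulating every instruction. */
double speedup_over_full_emulation(double skipped, double simulated,
                                   double fast_slowdown, double emu_slowdown)
{
    double full = (skipped + simulated) * emu_slowdown;
    return full / two_mode_time(skipped, simulated,
                                fast_slowdown, emu_slowdown);
}
```

For instance, skipping 10 billion instructions in fast mode (slowdown 1.07) and then emulating 100 million (slowdown 117.33) gives a speedup of about 53 over emulating all 10.1 billion instructions; with the highest average fast-mode slowdown (1.82) the speedup is still about 40.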

LiKE: The Linux Kernel Emulator

DICE has been extended into LiKE (Linux Kernel Emulator) in order to trace/simulate operating-system-level activity. This extension allowed us to complete our emulator: we modeled the privileged processor resources (privileged registers), we added support for the privileged instructions, and we made it work in a true 64-bit environment, since we ran it on an UltraSPARC I workstation running Linux 2.2.8.

LiKE is a dynamically loadable module and can be incorporated into a very slightly patched Linux kernel at any moment after kernel boot. This feature is important because it lowers the intrusiveness of our tool: when the LiKE module is loaded, the kernel is in a realistic state, and LiKE does not disturb it since it is only added to it. Also, the kernel can boot and be used at full speed.

The current implementation uses an external shared variable to drive LiKE. When this variable is set and the current process is traced by DICE, LiKE takes control of the system calls made by the current process.

However, LiKE is a preliminary version: so far it has only proved able to emulate the core of some system calls. This tool needs further development to enable complete on-the-fly simulations. We plan to set up a shared memory space (shared between the traced OS and the traced processes) where the state of the on-the-fly simulator would be updated either by a module connected to LiKE (when the kernel has control in emulation mode) or by a library linked with DICE (when one of the processes has control and is in emulation mode).

last update: 11 07 2000
	no French version		Pierre.Michaud@irisa.fr		©copyright