Hades (Highly Available Distributed Embedded System) is
an environment for the design, timing analysis and execution
of distributed hard real-time dependable applications.
Hades provides a set of off-line tools to develop applications,
modeled as annotated distributed Directed Acyclic Graphs (DAGs),
and to verify their timing correctness.
More precisely, the off-line tools encompass feasibility
tests associated with a range of scheduling policies, a
fault-tolerance tool, and a tool that determines the right
amount of memory to be allocated for the application execution.
The fault-tolerance tool changes the structure of application
tasks to make them fault tolerant; it takes into account
the requirements of the application designer (e.g., tasks
or the portions of tasks to be replicated, fault-tolerance
strategy to be used such as active, passive or semi-active
replication).
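To make the idea of an annotated DAG and an off-line timing check
concrete, here is a minimal sketch in Python. It is not the Hades
tool set: the Task fields, the node names, the WCET values and the
functions critical_path_length and may_meet_deadline are all
hypothetical. The check shown is only a necessary condition. The
longest WCET-weighted path through the graph is a lower bound on the
end-to-end response time, so a graph whose longest path exceeds the
deadline cannot be feasible under any scheduling policy.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One DAG node, annotated with timing and placement (hypothetical fields)."""
    name: str
    wcet: int                      # worst-case execution time, arbitrary time units
    node: str                      # computer the task is assigned to
    successors: list = field(default_factory=list)


def critical_path_length(tasks):
    """Return the longest WCET-weighted path through the DAG."""
    memo = {}

    def longest_from(name):
        # Memoized depth-first search; terminates because the graph is acyclic.
        if name not in memo:
            task = tasks[name]
            tail = max((longest_from(s) for s in task.successors), default=0)
            memo[name] = task.wcet + tail
        return memo[name]

    return max(longest_from(name) for name in tasks)


def may_meet_deadline(tasks, deadline):
    """Necessary (not sufficient) condition for meeting an end-to-end deadline."""
    return critical_path_length(tasks) <= deadline


# Illustrative graph: sense -> plan -> {move, log}
tasks = {
    "sense": Task("sense", wcet=2, node="node0", successors=["plan"]),
    "plan": Task("plan", wcet=5, node="node1", successors=["move", "log"]),
    "move": Task("move", wcet=3, node="node1"),
    "log": Task("log", wcet=1, node="node2"),
}

print(critical_path_length(tasks))
print(may_meet_deadline(tasks, deadline=12))
```

A real feasibility test would also account for the scheduling policy,
processor sharing and communication delays, which this sketch ignores.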
Application execution is managed by the Hades run-time
support. This run-time support is built as a middleware
software layer running on top of off-the-shelf real-time
kernels (it has been ported to Chorus and RTEMS and emulated
on Solaris). It consists of a set of services required
for the execution of a wide range of dependable distributed
real-time tasks. Examples of such services are scheduling
policies, communication primitives, group membership management,
distributed execution control, and clock synchronization.
All these services exhibit timeliness and dependability
properties. The Hades run-time support has been designed
to guarantee three fundamental and complementary aspects:
Real-time: support of applications that exhibit
strict timing constraints. The achievement of the real-time
aspect mainly relies on an accurate estimation of the
worst-case execution times (WCETs) for all the activities
to be executed in the system (application tasks, run-time
support tasks, real-time kernel).
Fault-tolerance: achievement of a high degree
of reliability thanks to fault-tolerance mechanisms which
are transparent to the application designer. The fault-tolerance
aspect of the run-time support relies on the fault-tolerance
tool, and on fault-detection and exception handling mechanisms
offered by the run-time support.
An important feature of Hades is that reliability and hard
real-time are studied as a whole.
Flexibility: parts of the run-time support can be modified
without having to rewrite it entirely. As a consequence,
the run-time support can be tailored to the specific
constraints of the applications.
In the remainder, we present Hades through the design and
the execution of a toy application called bees. This
application manages the pursuit of a wasp by several bees.
Among the bees, we distinguish those that are critical (those
that must survive a machine crash) from those that are non-critical.
The Hades platform for this demonstration is made of four
computers. The wasp is moved by computer Hades 0.
Critical bees are moved by Hades 1 and Hades 3
so that the failure of one of these two computers can be tolerated.
Finally, non-critical bees are moved by Hades 2.
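The placement described above can be written down and checked with a
few lines of Python. This is only an illustrative sketch: the
PLACEMENT table restates the text, while the function names and the
assumption that either Hades 1 or Hades 3 alone can keep moving the
critical bees are ours, not part of the Hades platform.

```python
# Which computer moves which entities, as described in the text.
PLACEMENT = {
    "Hades 0": {"wasp"},
    "Hades 1": {"critical bees"},
    "Hades 2": {"non-critical bees"},
    "Hades 3": {"critical bees"},
}


def surviving_roles(crashed):
    """Return the entities still moved once the given computers have crashed."""
    roles = set()
    for host, hosted in PLACEMENT.items():
        if host not in crashed:
            roles |= hosted
    return roles


def tolerates_single_crash(role):
    """True if the role is still served after the crash of any one computer."""
    return all(role in surviving_roles({host}) for host in PLACEMENT)


for role in ("wasp", "critical bees", "non-critical bees"):
    print(role, tolerates_single_crash(role))
```

Under these assumptions only the critical bees survive the crash of
any single computer, which matches the distinction the application
draws between critical and non-critical bees.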