Scheduling and Optimization of Fault-Tolerant Embedded Systems
Licentiate Thesis No. 1277, Dept. of Computer and Information Science, Linköping University, November 2006
Safety-critical applications have to function correctly even in presence of faults. This thesis deals with techniques for tolerating effects of transient and intermittent faults. Reexecution, software replication, and rollback recovery with checkpointing are used to provide the required level of fault tolerance. These techniques are considered in the context of distributed real-time systems with non-preemptive static cyclic scheduling.
Safety-critical applications have strict time and cost constrains, which means that not only faults have to be tolerated but also the constraints should be satisfied. Hence, efficient system design approaches with consideration of fault tolerance are required.
The thesis proposes several design optimization strategies and scheduling techniques that take fault tolerance into account. The design optimization tasks addressed include, among others, process mapping, fault tolerance policy assignment, and checkpoint distribution.
Dedicated scheduling techniques and mapping optimization strategies are also proposed to handle customized transparency requirements associated with processes and messages. By providing fault containment, transparency can, potentially, improve testability and debugability of fault-tolerant applications.
The efficiency of the proposed scheduling techniques and design optimization strategies is evaluated with extensive experiments conducted on a number of synthetic applications and a real-life example. The experimental results show that considering fault tolerance during system-level design optimization is essential when designing cost-effective fault-tolerant embedded systems.
[I06] Viacheslav Izosimov, "Scheduling and Optimization of Fault-Tolerant Embedded Systems", Licentiate Thesis No. 1277, Dept. of Computer and Information Science, Linköping University, November 2006