
Recovery Time Considerations in Real-Time Systems Employing Software Fault Tolerance

Anand Bhat
Soheil Samii
Ragunathan Rajkumar

30th Euromicro Conference on Real-Time Systems (ECRTS 2018)

ABSTRACT
Safety-critical real-time systems, such as modern automobiles with advanced driving-assist features, must employ redundancy for crucial software tasks to tolerate permanent crash faults. This redundancy can be achieved with techniques such as active replication or the primary-backup approach. In such systems, the recovery time, which is the amount of time it takes for a redundant task to take over execution after the failure of a primary task, becomes a very important design parameter. The recovery time for a given task depends on factors such as task allocation, primary and redundant task priorities, system load, and the scheduling policy. Each task can also have a different recovery time requirement (RTR). For example, in automobiles with automated driving features, safety-critical tasks such as perception and steering control have strict RTRs, whereas the requirements are more relaxed for tasks such as heating control and mission planning. In this paper, we analyze the recovery time for software tasks in a real-time system employing Rate-Monotonic Scheduling (RMS). We derive bounds on the recovery times for different redundant-task options and propose techniques to determine the redundant-task type that satisfies a task's RTR. We also address the fault-tolerant task allocation problem, with the additional constraint of satisfying the RTR of each task in the system. Given that the problem of assigning tasks to processors is a well-known NP-hard bin-packing problem, we propose computationally efficient heuristics to find a feasible allocation of tasks and their redundant copies. We also apply the simulated annealing method to the fault-tolerant task allocation problem with RTR constraints and compare the results against our heuristics.
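
As a rough illustration of the allocation problem described above, the following Python sketch places each task and one redundant copy on distinct processors using first-fit decreasing, and admits a task to a processor only if the classical Liu-Layland utilization bound for RMS still holds. This is not the heuristic proposed in the paper: the names `Task`, `ll_bound`, `fits`, and `allocate` are hypothetical, the replica is assumed to consume the same utilization as the primary (as in active replication), and RTR constraints are not modeled because the abstract does not give the recovery-time bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    wcet: float    # worst-case execution time
    period: float  # period, assumed equal to the deadline

    @property
    def utilization(self) -> float:
        return self.wcet / self.period


def ll_bound(n: int) -> float:
    """Liu-Layland sufficient utilization bound for n tasks under RMS."""
    return n * (2 ** (1 / n) - 1)


def fits(assigned, task) -> bool:
    """Check whether adding `task` keeps the processor under the bound."""
    tasks = assigned + [task]
    total = sum(t.utilization for t in tasks)
    return total <= ll_bound(len(tasks))


def allocate(tasks, num_processors):
    """First-fit decreasing placement of each task and one replica.

    Returns a dict mapping (task name, role) to a processor index,
    or None if some copy cannot be placed. A replica is never placed
    on the same processor as its primary (assumption of this sketch).
    """
    processors = [[] for _ in range(num_processors)]
    placement = {}
    for task in sorted(tasks, key=lambda t: t.utilization, reverse=True):
        used = set()  # processors already hosting a copy of this task
        for role in ("primary", "replica"):
            for idx, assigned in enumerate(processors):
                if idx not in used and fits(assigned, task):
                    assigned.append(task)
                    placement[(task.name, role)] = idx
                    used.add(idx)
                    break
            else:
                return None  # no processor can host this copy
    return placement
```

Calling `allocate` with a few tasks and two processors returns a placement in which every primary and its replica sit on different processors. With a single processor it returns `None`, since no replica can be separated from its primary. The Liu-Layland test is only sufficient, so a `None` result from this sketch does not prove that no feasible allocation exists.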


[BSR18] Anand Bhat, Soheil Samii, Ragunathan Rajkumar, "Recovery Time Considerations in Real-Time Systems Employing Software Fault Tolerance", 30th Euromicro Conference on Real-Time Systems (ECRTS 2018)