Linköping University: Students Alumni Trade and Industry/Society Internal Search
samii_RTS2019

Practical Task Allocation for Software Fault-Tolerance and Its Implementation in Embedded Automotive Systems

Anand Bhat
 
Soheil Samii
Ragunathan Rajkumar

Real-Time Systems Journal, 2019

ABSTRACT
Due to the advent of active safety features and automated driving capabilities, the complexity of embedded computing systems within automobiles continues to increase. Such advanced driver assistance systems (ADAS) are inherently safety-critical and must tolerate failures in any subsystem. However, fault-tolerance in safety-critical systems has been traditionally supported by hardware replication, which is prohibitively expensive in terms of cost, weight, and size for the automotive market. Recent work has studied the use of software-based fault-tolerance techniques that utilize task-level hot and cold standbys to tolerate fail-stop processor and task failures. The benefit of using standbys is maximal when a task and any of its standbys obey the placement constraint of not being co-located on the same processor. We propose a new heuristic based on a “tiered” placement constraint, and show that our heuristic produces a better task assignment that saves at least one processor up to 40% of the time relative to the best known heuristic to date. We then introduce a task allocation algorithm that, for the first time to our knowledge, leverages the run-time attributes of cold standbys. Our empirical study finds that our heuristic uses no more than one additional processor in most cases relative to an optimal allocation that we construct for evaluation purposes using a creative technique. We also extend our heuristic to support mixed-criticality systems which allow for overload operation. We have designed and implemented our software fault-tolerance framework in AUTOSAR, an automotive industry standard. We use this implementation to provide an experimental evaluation of our task-level fault-tolerance features. Finally, we present an analysis of the worst-case behavior of our task recovery features.


[BSR19] Anand Bhat, Soheil Samii, Ragunathan Rajkumar, "Practical Task Allocation for Software Fault-Tolerance and Its Implementation in Embedded Automotive Systems", Real-Time Systems Journal, 2019
( ! ) perl script by Giovanni Squillero with modifications from Gert Jervan   (v3.1, p5.2, September-2002-)