Funded Projects
 


A Cross-layer Approach to Reliability Optimization for Automotive Electronic Systems


CENIIT Project, Leader: Unmesh D. Bordoloi

                  

1 Project Description

 

1.1 Motivation 

As fabrication technologies continue scaling towards smaller components, the rate of errors in devices has increased significantly [8, 1]. We can no longer assume that electronic devices are perfect, because errors at the hardware level can now become visible at the application level (i.e., to the user), leading to unreliable performance [1, 9]. At the same time, embedded electronic systems are pervading critical aspects of our lives, for example in health care devices and high-end control systems in automobiles. Such systems must be designed to guarantee the reliability that users expect at the application level in spite of the rising failure rates at the underlying hardware layer. This necessitates a cross-layer approach to achieving overall system reliability. The motivation for advocating such a cross-layer approach is elaborated further below.

 

State of the Art: Today’s electronic systems consist of a stack of layers, from the device up to the application level. Current design techniques for achieving system reliability usually concentrate on a single layer of the system stack [2]. Unfortunately, by relying on a single layer, traditional approaches impose high overheads in power, performance, and chip area. For example, consider a system in which the underlying hardware architecture is uniformly hardened by building hardened (i.e., reliable) circuits [10]. All instructions running on such an architecture then have the same reliability level. However, depending on the application, some of these instructions might never be invoked at the application level. By concentrating only on hardware-level reliability, this approach incurs unnecessary costs for redundant hardware. By instead considering only those instructions that are actually used by the application, it would have been possible to harden the hardware components selectively and reduce costs. A cross-layer approach that explicitly handles the tradeoffs across both the hardware and application levels would therefore have been more cost-effective. In another scenario, the hardware (at the circuit and architecture levels) is low-cost but susceptible to high error rates. This unreliability can be compensated for at the software level by time redundancy (e.g., re-execution of tasks and re-transmission of messages [7]), but this again comes at the price of long worst-case delays and, hence, poor performance. To summarize, for current and future embedded systems, which are power-hungry and face ever-increasing performance demands, reliable and efficient designs cannot be achieved by isolating the problem to a single level of the system stack.

                       


1.2 Goal and Vision

In contrast to the traditional single-layer approaches described above, the goal of this project is to devise cross-layer approaches to reliability optimization. Our aim is to obtain the desired level of fault tolerance at affordable cost and without violating performance constraints (i.e., by avoiding long worst-case delays). We want to strike the right tradeoff between the reliability levels on the hardware side (from circuits up to the hardware architecture) and combine it with the necessary amount of time redundancy at the software level. These tradeoffs have to be computed with a novel system-failure probability analysis technique that connects the levels of redundancy in software to the reliability levels (hardening levels) in hardware. In this sense, cross-layer optimization for fault tolerance is a system-level design optimization that “distributes” the global task of achieving the required level of fault tolerance among the different layers of the system stack (see Figure 1). Note that the time redundancy at the software level may in turn be distributed between the instruction level (operating system) and the application level. In particular, we envisage that instruction selection and configuration techniques may be devised to achieve the required system reliability. Techniques that select the relevant instructions according to the target application will be particularly suitable in the context of application-specific instruction-set processors (ASIPs) [6]. Our proposed instruction-set customization techniques for fault tolerance may also be intertwined with hardware reliability techniques. This arises in ASIPs where some instructions are more reliable than others because they are implemented on hardened hardware. The reliability problem becomes even more interesting with dynamically reconfigurable architectures.
Such architectures allow instruction configuration to be performed not only statically but also at run time. At run time, instructions can be reconfigured based on the current fault rate observed in the device; statically prepared code can then run on the new instruction set, or the code can be recompiled at run time. Although the discussion above focused on computation on the processor, the proposed ideas hold for communication systems as well. Similar to the time redundancy provided by software re-execution, some techniques enhance reliability by re-transmitting messages over communication channels [5]. On the other hand, some communication protocols [4] provide extra transmission channels, but this leads to extra cost in silicon area. By devising techniques that strike the right tradeoff between cost and time redundancy, our goal is to render such protocols affordable for next-generation ultra-reliable devices.
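The tradeoff between hardware hardening levels and software re-execution can be illustrated with a small design-space search. This is a minimal sketch, not the project's system-failure probability analysis: it assumes that faults in successive executions are independent, that a task fails only if its original run and all of its re-executions fail, and that each re-execution adds one worst-case execution time plus a fixed recovery overhead. The names `HardeningLevel` and `cheapest_design`, as well as all probabilities, times, and costs, are hypothetical illustration values.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HardeningLevel:
    """A hypothetical hardware hardening option for the processor running a task."""
    name: str
    fail_prob: float  # probability that one execution is corrupted by a transient fault
    wcet: float       # worst-case execution time at this level
    cost: float       # relative hardware cost (e.g., silicon area)


def failure_probability(level: HardeningLevel, reexecs: int) -> float:
    # The task fails only if the original run and all re-executions fail.
    # Assumes faults in different executions are independent.
    return level.fail_prob ** (reexecs + 1)


def worst_case_delay(level: HardeningLevel, reexecs: int, recovery: float) -> float:
    # Each re-execution adds one more WCET plus a fixed recovery overhead.
    return level.wcet + reexecs * (level.wcet + recovery)


def cheapest_design(
    levels: List[HardeningLevel],
    goal: float,
    deadline: float,
    recovery: float,
    max_reexecs: int = 5,
) -> Optional[Tuple[str, int]]:
    """Exhaustively search (hardening level, re-execution count) pairs and return
    the cheapest one meeting both the reliability goal and the deadline."""
    best: Optional[Tuple[HardeningLevel, int]] = None
    for level, k in product(levels, range(max_reexecs + 1)):
        if failure_probability(level, k) > goal:
            continue  # not reliable enough
        if worst_case_delay(level, k, recovery) > deadline:
            continue  # too slow in the worst case
        if best is None or level.cost < best[0].cost:
            best = (level, k)
    return None if best is None else (best[0].name, best[1])


# Hypothetical levels: more hardening means fewer faults but a slower, costlier circuit.
LEVELS = [
    HardeningLevel("none", fail_prob=1e-3, wcet=10.0, cost=1.0),
    HardeningLevel("partial", fail_prob=1e-5, wcet=11.0, cost=2.0),
    HardeningLevel("full", fail_prob=1e-8, wcet=13.0, cost=4.0),
]

print(cheapest_design(LEVELS, goal=1e-11, deadline=40.0, recovery=2.0))  # ('partial', 2)
print(cheapest_design(LEVELS, goal=1e-11, deadline=30.0, recovery=2.0))  # ('full', 1)
```

With these made-up numbers, neither layer alone is the cheapest answer: unhardened hardware would need three re-executions and miss the deadline, while tightening the deadline from 40 to 30 time units shifts the cheapest feasible design from partial hardening with two re-executions to full hardening with one.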

 

 

Research Contributions: In the long term, the vision of this project is to address the challenges outlined above theoretically and to buttress our theories with experiments, including experiments on industrial case studies. To summarize, we anticipate achieving the following results.

 

• Holistic mechanism to capture the information flow (regarding reliability and error rates) across multiple levels of abstraction

• System-failure probability analysis technique that would connect the software redundancy to the hardware reliability levels

• Efficient solutions that expose reliability-performance-cost tradeoffs at multiple levels

• Instruction set customization methods that are intertwined with hardware reliability in order to achieve system reliability

• Run-time error recovery mechanism for dynamically reconfigurable architectures based on online error rate monitoring

• Cross-layer optimizations that encompass both computation and communication resources

 

 

2 Industrial Relevance and Collaboration

This project is directly relevant to the electronics industry, in particular to the automotive industry. The past decade has seen a proliferation of electronic devices in cars. From an electronics perspective, today’s cars are essentially distributed embedded systems in which many processors communicate via fieldbuses. As transistor sizes are scaled aggressively, automotive electronic components are increasingly susceptible to faults [3, 14]. Such faults might occur in the communication system (i.e., the fieldbus) or in the computation system (i.e., the processors). The research issues outlined in this proposal are therefore relevant to major automotive companies such as Volvo, General Motors, DaimlerChrysler, and Toyota. Companies such as Xilinx are now producing reconfigurable architectures that directly target automotive electronic systems [12]. Such devices have been used by companies specializing in automotive electronics products, such as Siemens VDO, Motion Engineering, and Continental AG [13]. Our work on cross-layer optimization techniques for reconfigurable architectures is thus significant to the industry in this context as well.


Executive Summary of Progress

(updated April 2013)

9 papers have been published within this project.


4 Master’s students have graduated after completing their theses in this project.


At present, 2 PhD students are extending the project in new directions.


2 Master’s students and 2 Bachelor’s students are pursuing their projects.


We have built a strong network with industry, including new collaborations with Vivante and AMD.


We collaborated with CENIIT project 10.4, led by Fredrik Heintz, which resulted in Viet Ha Nguyen’s Master’s thesis.

 

Publications related to this CENIIT project

(updated April 2013)

Bogdan Tanasa, Unmesh D. Bordoloi, Petru Eles, Zebo Peng. Probabilistic Timing Analysis for the Dynamic Segment of FlexRay. Euromicro Conference on Real-Time Systems (ECRTS), Paris, France, July 2013.




Soheil Samii, Unmesh D. Bordoloi, Petru Eles, Zebo Peng, Anton Cervin. Control-Quality Optimization for Distributed Embedded Systems with Adaptive Fault Tolerance. 24th Euromicro Conference on Real-Time Systems (ECRTS 2012), Pisa, Italy, July 10-13, 2012.


Unmesh D. Bordoloi, Bogdan Tanasa, Petru Eles, Zebo Peng. On the Timing Analysis of the Dynamic Segment of FlexRay. International Symposium on Industrial Embedded Systems (SIES 2012), Karlsruhe, Germany, June 20-22, 2012.


Bogdan Tanasa, Unmesh D. Bordoloi, Stefanie Kosuch, Petru Eles, Zebo Peng. Schedulability Analysis for the Dynamic Segment of FlexRay: A Generalization to Slot Multiplexing. Real-Time and Embedded Technology and Applications Symposium (RTAS 2012), Beijing, China, April 16-19, 2012.


Bharath Suri, Unmesh D. Bordoloi, Petru Eles. A Scalable GPU-Based Approach to Accelerate the Multiple-Choice Knapsack Problem. Design Automation and Test in Europe (DATE) (’Interactive Presentation’ paper), Dresden, Germany, March 12-16, 2012.


Unmesh D. Bordoloi, Bharath Suri, Swaroop Nunna, Samarjit Chakraborty, Petru Eles, Zebo Peng. Customizing Instruction Set Extensible Reconfigurable Processors using GPUs. 25th International Conference on VLSI Design, Hyderabad, India, January 7-11, 2012.


Venkata Podduturi, A SystemC simulator for the dynamic segment of the FlexRay protocol. Master’s thesis, Linköping Universitet 2012. LIU-IDA/LITH-EX-A12/059SE


Mohammad Alhowaidi. Real-Time Systems with Radiation-Hardened Processors: A GPU-based Framework to Explore Tradeoffs. Master’s thesis, Linköping Universitet 2012. LIU-IDA/LITH-EX-A12/017SE


Viet Ha Nguyen. Design Space Exploration of the Quality of Service for Stream Reasoning Applications. Master’s thesis, Linköping Universitet 2012. LIU-IDA/LITH-EX-A–12/027–SE


Bogdan Tanasa, Unmesh D. Bordoloi, Petru Eles, Zebo Peng. Reliability-Aware Frame Packing for the Static Segment of FlexRay. The Intl. Conf. on Embedded Software (EMSOFT), Taipei, Taiwan, October 9-14, 2011.


Reinhard Schneider, Dip Goswami, Samarjit Chakraborty, Unmesh D. Bordoloi, Petru Eles, Zebo Peng. On the Quantification of Sustainability and Extensibility of FlexRay Schedules. 48th Design Automation Conference (DAC), San Diego, CA, USA, June 5-10, 2011.


Bharath Suri. Accelerating Knapsack Problems on GPU. Master’s thesis, Linköping Universitet 2011. LIU-IDA/LITH-EX-A11/029SE


References

 

[1] S. Borkar. Designing reliable systems from unreliable components: The challenges of transistor variability and degradation. IEEE Micro, 25(6), 2005.

 

[2] N. P. Carter, H. Naeimi, and D. S. Gardner. Design techniques for cross-layer resilience. In Design, Automation and Test in Europe, 2010.

 

[3] F. Corno, M. Sonza Reorda, S. Tosato, and F. Esposito. Evaluating the effects of transient faults on vehicle dynamic performance in automotive systems. In International Test Conference, 2004.

 

[4] The FlexRay Communications System Specifications, Ver. 2.1. www.flexray.com.

 

[5] B. Gaujal and N. Navet. Maximizing the robustness of TDMA networks with applications to TTP/C. Real-Time Systems, 31(1-3):5–31, 2005.

 

[6] D. Goodwin and D. Petkov. Automatic generation of application specific processors. In International Conference on Compilers, Architecture and Synthesis for Embedded Systems, 2003.

 

[7] V. Izosimov, P. Pop, P. Eles, and Z. Peng. Design optimization of time- and cost-constrained fault-tolerant distributed embedded systems. In Design, Automation and Test in Europe, 2005.

 

[8] S. S. Mukherjee, J. Emer, and S. K. Reinhardt. The soft error problem: An architectural perspective. In International Symposium on High-Performance Computer Architecture, 2005.

 

[9] S. R. Nassif, N. Mehta, and Y. Cao. A resilience roadmap. In Design, Automation and Test in Europe, 2010.

 

[10] Q. Zhou and K. Mohanram. Gate sizing to radiation harden combinational logic. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(1):155–166, 2006.

 

[11] B. Tanasa, U. D. Bordoloi, P. Eles, and Z. Peng. Scheduling for fault-tolerant communication on the static segment of FlexRay. In Real-Time Systems Symposium, 2010.

 

[12] Xilinx Automotive. http://www.xilinx.com/esp/automotive.

 

[13] Xilinx Press: Customer Quotes. http://press.xilinx.com.

 

[14] E. Zanoni and P. Pavan. Improving the reliability and safety of automotive electronics. IEEE Micro, 13(1), 1993.


 

