Energy-Efficient Fault Tolerance in Chip Multiprocessors Using Critical Value Forwarding
The 40th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'10), Fairmont Chicago, Millennium Park, Chicago, Illinois, USA, June 28-July 1, 2010, pp. 121-130.
ABSTRACT
Relentless CMOS scaling coupled with lower design tolerances is making ICs increasingly susceptible to wear-out-related permanent faults and to transient faults, necessitating on-chip fault tolerance in future chip multiprocessors (CMPs). In this paper we introduce a new energy-efficient fault-tolerant CMP architecture known as Redundant Execution using Critical Value Forwarding (RECVF). RECVF is based on two observations: (i) forwarding critical instruction results from the leading core to the trailing core enables the latter to execute faster, and (ii) this speedup can be exploited to reduce energy consumption by operating the trailing core at a lower voltage-frequency level. Our evaluation shows that RECVF consumes 37% less energy than conventional dual modular redundant (DMR) execution of a program. It consumes only 1.26 times the energy of a non-fault-tolerant baseline and has a performance overhead of just 1.2%.
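The two energy figures in the abstract can be cross-checked with a little arithmetic, and the idea behind observation (ii) can be illustrated with a textbook first-order model in which dynamic switching energy scales with the square of the supply voltage. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the V² model ignores leakage, value-forwarding traffic and checking hardware, so it is not the energy model used in the paper's evaluation.

```python
# Figures reported in the abstract, normalised to the
# non-fault-tolerant baseline.
RECVF_VS_BASELINE = 1.26    # RECVF energy / baseline energy
RECVF_SAVING_VS_DMR = 0.37  # RECVF uses 37% less energy than DMR


def implied_dmr_energy(recvf_vs_baseline: float, saving_vs_dmr: float) -> float:
    """Return the DMR energy (relative to baseline) implied by the two figures.

    E_RECVF = (1 - saving) * E_DMR  =>  E_DMR = E_RECVF / (1 - saving).
    """
    return recvf_vs_baseline / (1.0 - saving_vs_dmr)


def first_order_pair_energy(trailing_voltage_ratio: float) -> float:
    """Hypothetical first-order model, NOT the paper's energy model.

    Assumes both cores execute the same instruction stream, the leading
    core runs at nominal voltage, and dynamic energy per instruction
    scales with V**2 (switching energy ~ C * V**2). Lowering frequency
    alone does not change energy per instruction in this model; it only
    permits the lower voltage. Leakage, forwarding traffic and checking
    hardware are ignored. Returns energy relative to a single core.
    """
    leading = 1.0
    trailing = trailing_voltage_ratio ** 2
    return leading + trailing


if __name__ == "__main__":
    dmr = implied_dmr_energy(RECVF_VS_BASELINE, RECVF_SAVING_VS_DMR)
    print(f"implied DMR energy: {dmr:.2f}x baseline")
    for ratio in (1.0, 0.8, 0.6):
        energy = first_order_pair_energy(ratio)
        print(f"trailing V/V_nom = {ratio:.1f}: {energy:.2f}x baseline")
```

If RECVF uses 1.26 times the baseline energy and 37% less than DMR, then DMR itself must use about 1.26 / 0.63 = 2.0 times the baseline, which is consistent with DMR executing the program on two cores at full voltage.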
Copyright note for papers published by the IEEE Computer Society:
Copyright IEEE. Personal use of this material is permitted. However,
permission to reprint/republish this material for advertising or
promotional purposes or for creating new collective works for resale
or redistribution to servers or lists, or to reuse any copyrighted
component of this work in other works, must be obtained from the IEEE.
[SSKL10] Pramod Subramanyan, Virendra Singh, Kewal K. Saluja, Erik Larsson, "Energy-Efficient Fault Tolerance in Chip Multiprocessors Using Critical Value Forwarding", The 40th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'10), Fairmont Chicago, Millennium Park, Chicago, Illinois, USA, June 28-July 1, 2010, pp. 121-130.