Built-in Fault-Tolerant Computing Paradigm for Resilient Large-Scale Chip Design: A Self-Test, Self-Diagnosis, and Self-Repair-Based Approach


Bibliographic Details
Main Authors: Li, Xiaowei; Yan, Guihai (Author); Liu, Cheng (Author)
Format: eBook
Language: English
Published: Singapore: Springer Nature Singapore, 2023
Edition: 1st ed. 2023
Online Access: https://doi.org/10.1007/978-981-19-8551-5
Collection: Springer eBooks 2005-
LEADER 03262nmm a2200337 u 4500
001 EB002152728
003 EBX01000000000000001290854
005 00000000000000.0
007 cr|||||||||||||||||||||
008 230403 ||| eng
020 |a 9789811985515 
100 1 |a Li, Xiaowei 
245 0 0 |a Built-in Fault-Tolerant Computing Paradigm for Resilient Large-Scale Chip Design  |h Electronic Resource  |b A Self-Test, Self-Diagnosis, and Self-Repair-Based Approach  |c by Xiaowei Li, Guihai Yan, Cheng Liu 
250 |a 1st ed. 2023 
260 |a Singapore  |b Springer Nature Singapore  |c 2023, 2023 
300 |a XVIII, 304 p. 1 illus  |b online resource 
505 0 |a Chapter 1: Introduction -- Chapter 2: Fault-tolerant general circuits with 3S -- Chapter 3: Fault-tolerant general purposed processors with 3S -- Chapter 4: Fault-tolerant network-on-chip with 3S -- Chapter 5: Fault-tolerant deep learning processors with 3S -- Chapter 6: Conclusion 
653 |a Hardware Performance and Reliability 
653 |a Computers 
653 |a Computer Hardware 
653 |a Processor Architectures 
653 |a Microprocessors 
653 |a Computer architecture 
700 1 |a Yan, Guihai  |e [author] 
700 1 |a Liu, Cheng  |e [author] 
041 0 7 |a eng  |2 ISO 639-2 
989 |b Springer  |a Springer eBooks 2005- 
028 5 0 |a 10.1007/978-981-19-8551-5 
856 4 0 |u https://doi.org/10.1007/978-981-19-8551-5?nosfx=y  |x Publisher  |3 Full text 
082 0 |a 004.24 
520 |a With the end of Dennard scaling and Moore’s law, IC chips, especially large-scale ones, face growing reliability challenges, and reliability has become one of the key figures of merit in VLSI design. In this context, this book presents a built-in on-chip fault-tolerant computing paradigm that seeks to combine fault detection, fault diagnosis, and error recovery in large-scale VLSI design in a unified manner so as to minimize resource overhead and performance penalties. Following this computing paradigm, we propose a holistic solution based on three key components: self-test, self-diagnosis, and self-repair, or “3S” for short. We then explore the use of 3S for general IC designs, general-purpose processors, networks-on-chip (NoCs), and deep learning accelerators, and present prototypes to demonstrate how 3S responds to in-field silicon degradation and recovers from various runtime faults caused by aging, process variations, or radiation particles. Moreover, we demonstrate that 3S not only offers a powerful backbone for various on-chip fault-tolerant designs and implementations, but also has far-reaching implications such as maintaining graceful performance degradation, mitigating the impact of verification blind spots, and improving chip yield. This book is the outcome of extensive fault-tolerant computing research pursued at the State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences over the past decade. The proposed built-in on-chip fault-tolerant computing paradigm has been verified in a broad range of scenarios, from small processors in satellite computers to large processors in HPC systems. Hopefully, it will provide an alternative yet effective solution to the growing reliability challenges facing large-scale VLSI designs.