Networks-on-chip from implementations to programming paradigms

This book provides a thorough and bottom-up exploration of the whole NoC design space in a coherent and uniform fashion, from low-level router, buffer and topology implementations, to routing and flow control schemes, to co-optimizations of NoC and high-level programming paradigms. Its coherent and...

Bibliographic Details
Main Authors: Ma, Sheng, Huang, Libo (Author), Lai, Mingche (Author), Shi, Wei (Author)
Format: eBook
Language: English
Published: Waltham, MA: Morgan Kaufmann, 2015
Edition: First edition
Collection: O'Reilly - Collection details see MPG.ReNa
Table of Contents:
  • 5.3.1 Insufficient information
  • 5.3.2 Intraregion interference
  • 5.3.3 Inter-region interference
  • 5.4 Destination-based adaptive routing
  • 5.4.1 Destination-based selection strategy
  • 5.4.1.1 Congestion information propagation network
  • 5.4.1.2 DBSS router microarchitecture
  • 5.4.2 Routing function design
  • 5.4.2.1 Offered path diversity
  • 5.4.2.2 VC reallocation scheme
  • 5.5 Evaluation
  • 5.5.1 Evaluation of routing functions
  • 5.5.2 Single-region performance
  • 5.5.2.1 Synthetic traffic results
  • 5.5.2.2 Application results
  • 5.5.3 Multiple-region performance
  • 5.5.3.1 Results for a small regular region
  • 5.5.3.2 Irregular-region results
  • 5.5.3.3 Summary
  • 5.5.4 CMesh evaluation
  • 5.5.4.1 Configuration
  • 5.5.4.2 Performance
  • 5.5.5 Hardware overhead
  • 5.5.5.1 Wiring overhead
  • 5.5.5.2 Router overhead
  • 5.5.5.3 Power consumption
  • 5.6 Analysis and discussion
  • 5.6.1 In-depth analysis of interference
  • 5.6.2 Design space exploration
  • 5.6.2.1 Number of propagation wires
  • 5.6.2.2 DBSS scalability
  • 5.6.2.3 Congestion propagation delay
  • 5.7 Chapter summary
  • References
  • Chapter 6: Flow control for fully adaptive routing
  • 6.1 Introduction
  • 6.2 Background
  • 6.2.1 Deadlock avoidance theories
  • 6.2.2 Fully adaptive routing algorithms
  • 6.3 Motivation
  • 6.3.1 VC reallocation
  • 6.3.2 Routing flexibility
  • 6.4 Flow control and routing designs
  • 6.4.1 Whole packet forwarding
  • 6.4.2 Aggressive VC reallocation for EVCs
  • 6.4.3 Maintain routing flexibility
  • 6.4.4 Router microarchitecture
  • 6.5 Evaluation on synthetic traffic
  • 6.5.1 Performance of synthetic workloads
  • 6.5.2 Buffer utilization of routing algorithms
  • 6.5.3 Sensitivity to network design
  • 6.5.3.1 SFP ratio
  • 6.5.3.2 VC depth
  • 6.5.3.3 VC count
  • 6.5.3.4 Network size
  • 6.6 Evaluation of PARSEC workloads
  • 3.2.2 Congestion avoidance scheme
  • 3.3 Multiple-port shared buffer with congestion awareness
  • 3.3.1 DVC scheme among multiple ports
  • 3.3.2 Congestion avoidance scheme
  • 3.4 DVC router microarchitecture
  • 3.4.1 VC control module
  • 3.4.2 Metric aggregation and congestion avoidance
  • 3.4.3 VC allocation module
  • 3.5 HiBB router microarchitecture
  • 3.5.1 VC control module
  • 3.5.2 VC allocation and output port allocation
  • 3.5.3 VC regulation
  • 3.6 Evaluation
  • 3.6.1 DVC router evaluation
  • 3.6.2 HiBB router evaluation
  • 3.7 Chapter summary
  • References
  • Chapter 4: Virtual bus structure-based network-on-chip topologies
  • 4.1 Introduction
  • 4.2 Background
  • 4.3 Motivation
  • 4.3.1 Baseline on-chip communication networks
  • 4.3.1.1 Transaction-based bus
  • 4.3.1.2 Packet-based NoC
  • 4.3.2 Analysis of NoC problems
  • 4.3.2.1 Multihop problem
  • 4.3.2.2 Multicast problem
  • 4.3.3 Advantages of a transaction-based bus
  • 4.4 The VBON
  • 4.4.1 Interconnect structures
  • 4.4.1.1 Wire delay consideration
  • 4.4.2 The VB mechanism
  • 4.4.2.1 The VB construction
  • 4.4.2.2 VB arbitration
  • 4.4.2.3 Packet format
  • 4.4.2.4 VB operation
  • 4.4.2.5 A simple example for VB communication
  • 4.4.3 Starvation and deadlock avoidance
  • 4.4.4 The VBON router microarchitecture
  • 4.5 Evaluation
  • 4.5.1 Simulation infrastructures
  • 4.5.1.1 Router choices for comparison
  • 4.5.1.2 Network configuration
  • 4.5.1.3 Traffic generation
  • 4.5.2 Synthetic traffic evaluations
  • 4.5.2.1 Single-level 4 × 4 VBON
  • 4.5.2.2 Hierarchical 8 × 8 VBON
  • 4.5.3 Real application evaluations
  • 4.5.4 Power consumption analysis
  • 4.5.5 Overhead analysis
  • 4.6 Chapter summary
  • References
  • Part III: Routing and flow control
  • Chapter 5: Routing algorithms for workload consolidation
  • 5.1 Introduction
  • 5.2 Background
  • 5.3 Motivation
  • 6.6.1 Methodology and configuration
  • 6.6.2 Performance
  • 6.7 Detailed analysis of flow control
  • 6.7.1 The detailed buffer utilization
  • 6.7.1.1 Allowable EVCs
  • 6.7.1.2 Performance analysis
  • 6.7.2 The effect of flow control on fairness
  • 6.8 Further discussion
  • 6.8.1 Packet length
  • 6.8.2 Dynamically allocated multiqueue and hybrid flow controls
  • 6.9 Chapter summary
  • Appendix: Logical Equivalence of Alg and Alg + WPF
  • References
  • Chapter 7: Deadlock-free flow control for torus networks-on-chip
  • 7.1 Introduction
  • 7.2 Limitations of existing designs
  • 7.2.1 Dateline
  • 7.2.2 Localized bubble scheme
  • 7.2.3 Critical bubble scheme
  • 7.2.4 Inefficiency with variable-size packets
  • 7.3 Flit bubble flow control
  • 7.3.1 Theoretical description
  • 7.3.2 FBFC-localized
  • 7.3.3 FBFC-critical
  • 7.3.4 Starvation
  • 7.4 Router microarchitecture
  • 7.4.1 FBFC routers
  • 7.4.2 VCT routers
  • 7.5 Methodology
  • 7.6 Evaluation on 1D tori (rings)
  • 7.6.1 Performance
  • 7.6.2 Buffer utilization
  • 7.6.3 Latency of short and long packets
  • 7.7 Evaluation on 2D tori
  • 7.7.1 Performance for a 4 × 4 torus
  • 7.7.2 Sensitivity to SFP ratios
  • 7.7.3 Sensitivity to buffer size
  • 7.7.4 Scalability for an 8 × 8 torus
  • 7.7.5 Effect of starvation
  • 7.7.6 Real application performance
  • 7.7.7 Large-scale systems and message passing
  • 7.8 Overheads: Power and area
  • 7.8.1 Methodology
  • 7.8.2 Power efficiency
  • 7.8.3 Area
  • 7.8.4 Comparison with meshes
  • 7.9 Discussion and related work
  • 7.9.1 Discussion
  • 7.9.2 Related work
  • 7.10 Chapter summary
  • References
  • Part IV: Programming paradigms
  • Chapter 8: Supporting cache-coherent collective communications
  • 8.1 Introduction
  • 8.2 Message combination framework
  • 8.2.1 MCT format
  • 8.2.2 Message combination example
  • 8.2.3 Insufficient MCT entries
  • 8.3 BAM routing
  • Includes bibliographical references and index
  • Front Cover
  • Networks-on-Chip: From Implementations to Programming Paradigms
  • Copyright
  • Contents in Brief
  • Contents
  • Preface
  • About the Editor-in-Chief and Authors
  • Editor-in-Chief
  • Authors
  • Part I: Prologue
  • Chapter 1: Introduction
  • 1.1 The dawn of the many-core era
  • 1.2 Communication-centric cross-layer optimizations
  • 1.3 A baseline design space exploration of NoCs
  • 1.3.1 Topology
  • 1.3.2 Routing algorithm
  • 1.3.3 Flow control
  • 1.3.4 Router microarchitecture
  • 1.3.5 Performance metric
  • 1.4 Review of NoC research
  • 1.4.1 Research on topologies
  • 1.4.2 Research on unicast routing
  • 1.4.3 Research on supporting collective communications
  • 1.4.4 Research on flow control
  • 1.4.5 Research on router microarchitecture
  • 1.5 Trends of real processors
  • 1.5.1 The MIT Raw processor
  • 1.5.2 The Tilera TILE64 processor
  • 1.5.3 The Sony/Toshiba/IBM Cell processor
  • 1.5.4 The U.T. Austin TRIPS processor
  • 1.5.5 The Intel Teraflops processor
  • 1.5.6 The Intel SCC processor
  • 1.5.7 The Intel Larrabee processor
  • 1.5.8 The Intel Knights Corner processor
  • 1.5.9 Summary of real processors
  • 1.6 Overview of the book
  • References
  • Part II: Logic implementations
  • Chapter 2: A single-cycle router with wing channels
  • 2.1 Introduction
  • 2.2 The router architecture
  • 2.2.1 The overall architecture
  • 2.2.2 Wing channels
  • 2.3 Microarchitecture designs
  • 2.3.1 Channel dispensers
  • 2.3.2 Fast arbiter components
  • 2.3.3 SIG managers and SIG controllers
  • 2.4 Experimental results
  • 2.4.1 Simulation infrastructures
  • 2.4.2 Pipeline delay analysis
  • 2.4.3 Latency and throughput
  • 2.4.4 Area and power consumption
  • 2.5 Chapter summary
  • References
  • Chapter 3: Dynamic virtual channel routers with congestion awareness
  • 3.1 Introduction
  • 3.2 DVC with congestion awareness
  • 3.2.1 DVC scheme
  • 8.4 Router pipeline and microarchitecture
  • 8.5 Evaluation
  • 8.5.1 Performance
  • 8.5.1.1 Overall network performance
  • 8.5.1.2 Multicast transaction performance
  • 8.5.1.3 Real application performance
  • 8.5.2 Comparing multicast VN configurations
  • 8.5.2.1 Unicast performance
  • 8.5.2.2 Multicast performance
  • 8.5.3 MCT size
  • 8.5.4 Sensitivity to network design
  • 8.5.4.1 VC count
  • 8.5.4.2 Multicast ratio
  • 8.5.4.3 Destinations per multicast
  • 8.5.4.4 Network size
  • 8.6 Power analysis
  • 8.7 Related work
  • 8.7.1 Message combination
  • 8.7.2 NoC multicast routing
  • 8.8 Chapter summary
  • References
  • Chapter 9: Network-on-chip customizations for message passing interface primitives
  • 9.1 Introduction
  • 9.2 Background
  • 9.3 Motivation
  • 9.3.1 MPI adaption in NoC designs
  • 9.3.2 Optimizations of MPI functions
  • 9.4 Communication customization architectures
  • 9.4.1 Architecture overview
  • 9.4.2 The customized NoC design: VBON
  • 9.4.3 The MPI primitive implementation: MU
  • 9.4.3.1 The architecture of the MU
  • 9.4.3.2 MPI processing unit
  • 9.4.3.3 The collective operation implementation
  • 9.4.3.4 Communication protocols
  • 9.5 Evaluation
  • 9.5.1 Methodology
  • 9.5.2 Experimental results
  • 9.5.2.1 The effect of point-to-point communication: Bandwidth
  • 9.5.2.2 The effect of collective communication: Broadcast operations
  • 9.5.2.3 The effect of collective communication: Barrier operations
  • 9.5.2.4 The effect of collective communication: Reduce operation
  • 9.5.2.5 The effect of application communication: Performance
  • 9.5.2.6 The effect of application communication: Power and scalability
  • 9.5.2.7 Implementation overheads
  • 9.6 Chapter summary
  • References
  • Chapter 10: Message passing interface communication protocol optimizations
  • 10.1 Introduction
  • 10.2 Background
  • 10.2.1 Communication protocols in MPI