This document covers the multiprocessor system interconnects that provide fast communication among processors, shared memory, I/O, and peripheral devices. It introduces the IPMN, PION, and IPCN interconnection networks, surveys network characteristics such as timing, switching, and control, and discusses the design of hierarchical bus systems, crossbar networks, and multistage networks. It also examines the limitations of crossbar and multiport memory organizations, routing in Omega and Butterfly networks, approaches to the hot-spot problem such as request combining, and message-passing mechanisms including store-and-forward and wormhole routing.
Multiprocessor System Interconnects
• Allow fast communication among processors, shared memory, I/O, and peripheral devices
• IPMN: processors to shared memory
• PION: processors to I/O and peripheral devices
• IPCN: processors to processors
Network Characteristics
• Timing: synchronous (global clock) or asynchronous (handshaking)
• Switching: circuit (path granted) or packet (compete for path)
• Control: centralized (global controller) or distributed (local devices)
• Topology: bus system / crossbar / multistage
Hierarchical Bus System
• Consists of a hierarchy of buses connecting various components
• Each bus formed with signal, control, and power lines
• Different buses perform different interconnection functions
Local Bus
• Implemented on printed-circuit boards
• Provides common communication path among components on the board
• Memory bus on memory board
• Data bus on I/O board
• Consists of signal and utility lines
Backplane and I/O Bus
• Backplane: printed circuit on which many connectors are used to plug in functional boards
• System bus provides a common communication path among all plug-in boards
• I/O bus: made of coaxial cables with taps connecting disks, printers, and tape units to a processor through an I/O controller
Hierarchical Cache/Bus Architecture
• Leaf nodes are processors and their private caches
• Processors divided into clusters, each on a cluster bus
• Intercluster bus connects the clusters
• Second-level caches used between the cluster and intercluster buses
• Each cluster operates as a single-bus system
• Most memory requests satisfied at the lower-level caches
• Second-level caches used to extend consistency from each cluster to the upper level
• Upper-level caches form another level of shared memory
• Bridges between clusters allow transactions initiated on a local bus to be completed on a remote bus
Network Stages
• Single stage: recirculating network; cheaper, but more passes may be needed; crossbar switch and multiport memory organizations are examples
• Multistage: more than one stage of switch boxes; should connect any input to any output; may have the same connection pattern at each stage; Omega, Flip, and Baseline networks are examples
Blocking vs. Nonblocking
• Blocking: simultaneous connections of some multiple input/output pairs may result in conflicts; true of most multistage networks (e.g., Omega, Baseline, Banyan); may need multiple passes
• Nonblocking: can perform all possible connections by rearranging its connections, so a connection path can always be established; may require more stages (e.g., Benes and Clos)
Crossbar Networks
• Every input port can be connected to a free output port without blocking
• Single-stage network with unary switches at the crosspoints
• Requires n × m crosspoint switches
• If n = m, can implement all n! permutations without blocking
Crosspoint Switch Design
• Only one switch per column can be connected at a time; extra hardware needed to resolve conflicts (a simple arbitration scheme is sketched below)
• Each crosspoint has the complexity of a bus
• Requires extensive hardware, limiting practical crossbars to about n ≤ 16
• Can connect multiple switches per row simultaneously
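The arbitration requirement in the first bullet can be illustrated with a small sketch: each memory module (column) closes at most one crosspoint per cycle. The request matrix, the fixed-priority policy, and all names below are illustrative assumptions, not part of the original slides.

    /* Per-column crosspoint arbitration for an N x M crossbar (toy model). */
    #include <stdio.h>

    #define N 4   /* processors (rows)        */
    #define M 4   /* memory modules (columns) */

    /* grant[m] = processor granted module m this cycle, or -1 if idle.
       A fixed-priority policy picks the lowest-numbered requester. */
    void arbitrate(int req[N][M], int grant[M]) {
        for (int m = 0; m < M; m++) {
            grant[m] = -1;
            for (int p = 0; p < N; p++) {
                if (req[p][m]) { grant[m] = p; break; }  /* one switch per column */
            }
        }
    }

    int main(void) {
        int req[N][M] = {0};
        int grant[M];
        req[0][2] = 1;   /* P0 requests M2 */
        req[1][2] = 1;   /* P1 requests M2: loses to P0 this cycle */
        req[3][0] = 1;   /* P3 requests M0 */
        arbitrate(req, grant);
        for (int m = 0; m < M; m++) {
            if (grant[m] < 0) printf("M%d: idle\n", m);
            else              printf("M%d: granted to P%d\n", m, grant[m]);
        }
        return 0;
    }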
Crossbar Limitations
• Can deliver at most n words to at most n processors in each memory cycle
• Memory modules can be n-way interleaved to allow overlapped access
• Offers the highest bandwidth of n data transfers per cycle
• Cost-effective only for small multiprocessors with a few processors accessing a few memory modules
Multiport Memory
• Moves all crosspoint arbitration and switching functions to the memory controller
• Makes each memory module more expensive
• Only one of the n processor requests is honored at a time
• A compromise between the low-cost, low-performance bus system and the high-cost, high-performance crossbar
• A contention bus is time-shared; multiport memory must resolve conflicts among processors itself
Multiport Limitations
• Expensive when m and n become large
• Typically n = 4 processors and m = 16 memory modules
• Not scalable
• Needs a large number of interconnection cables and connectors when the configuration becomes large
Routing in Omega Networks
• An n-input Omega network built from 2 × 2 switches has log2 n stages
• Routes by destination code (destination-tag routing, sketched below)
• If the ith high-order bit of the destination is 0, take the upper output at stage i; otherwise take the lower output
• Conflicts can occur, so it is a blocking network; may need several passes
• Can implement n^(n/2) permutations in one pass, out of n! possible permutations
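A minimal sketch of the destination-tag rule above, assuming an n-input Omega network of 2 × 2 switches with n a power of two; the function name and printed output are illustrative, not from the original slides.

    /* Destination-tag routing through an Omega network of 2x2 switches. */
    #include <stdio.h>

    void omega_route(unsigned dest, unsigned stages) {
        for (unsigned i = 0; i < stages; i++) {
            unsigned bit = (dest >> (stages - 1 - i)) & 1u;  /* ith high-order bit */
            printf("stage %u: take the %s output\n", i, bit ? "lower" : "upper");
        }
    }

    int main(void) {
        /* 8-input network: log2(8) = 3 stages; route to destination 5 (binary 101). */
        omega_route(5, 3);
        return 0;
    }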
Routing in Butterfly Networks
• Constructed with crossbar switches
• With m × m crossbar switches, number of stages = log_m n and number of switches per stage = n/m (e.g., a 64-input network of 8 × 8 crossbars has 2 stages of 8 switches each)
• No broadcast connections allowed
• Larger Butterfly networks can be constructed modularly by adding more stages
Hot-Spot Problem
• Occurs when network traffic is nonuniform
• e.g., a memory module accessed excessively by many processors at the same time
• Degrades network performance
• A combining mechanism can be used to combine multiple requests headed for the same destination
• The atomic read-modify-write primitive Fetch&Add(x, e) performs parallel memory updates using the combining network
Fetch&Add(x, e)
• Implements an N-way synchronization with a complexity independent of N
• x is an integer variable in shared memory
• e is an integer increment
• Executed by a single processor:
  Fetch&Add(x, e) { temp ← x; x ← temp + e; return temp }
• With N processors attempting Fetch&Add(x, e) simultaneously, memory is updated only once, following a serialization principle
• The sum of the N increments is produced in some arbitrary serialization of the requests
• The values returned to the N requests are all unique
• The net result is similar to a sequential execution of the N Fetch&Adds (see the sketch below)
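A minimal sketch of the combining idea, assuming just two requests that meet at one combining switch; the variable names and the decombining step shown here are an illustrative model of the serialization principle, not the actual hardware design.

    /* Two Fetch&Add requests combined into one memory update. */
    #include <stdio.h>

    static int x = 10;                 /* shared variable in memory */

    /* Memory performs a single Fetch&Add with the combined increment. */
    static int memory_fetch_add(int e) {
        int old = x;
        x = old + e;
        return old;
    }

    int main(void) {
        int e1 = 3, e2 = 5;

        /* Combining switch merges the two requests into Fetch&Add(x, e1 + e2). */
        int base = memory_fetch_add(e1 + e2);

        /* Decombining: unique return values, as if serialized e1 then e2. */
        int r1 = base;                 /* returned to requester 1 */
        int r2 = base + e1;            /* returned to requester 2 */

        printf("r1 = %d, r2 = %d, x = %d\n", r1, r2, x);   /* r1 = 10, r2 = 13, x = 18 */
        return 0;
    }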
Message-Passing Mechanisms
• Store-and-forward routing
• Wormhole routing
• Virtual channels
• Deadlock situations
• Deterministic and adaptive routing algorithms
Message Formats
• Message: logical unit for internode communication
• Packet: basic unit containing the destination address for routing
• Packets carry a sequence number for reassembly
• Flits: flow control digits, the subunits of a packet
• Store-and-forward routing operates on packets; wormhole routing operates on flits
Packets and Flits
• Header flits contain routing information and the sequence number
• Flit length is affected by network size
• Packet length is determined by the routing scheme and network implementation
• Both lengths also depend on channel bandwidth, router design, network traffic, etc.
Message Format (figure)
Store-and-Forward Routing
• Packets are the basic unit
• Each node has a packet buffer
• When a packet reaches an intermediate node, it is first stored in the buffer, then sent when the output channel and the next buffer are ready
• Latency is directly proportional to the distance between source and destination
Wormhole Routing
• Flits are the basic unit
• Transmission proceeds through a sequence of routers
• All flits of the same packet are transmitted in a pipelined fashion
• All data flits follow the header flit
• Transmission of different packets can be interleaved, but their flits cannot be mixed
• Latency is almost independent of distance
Asynchronous Pipelining
• Pipelining of flits is asynchronous
• A 1-bit ready/request (R/A) line is used between adjacent routers
• When the receiver D is ready to receive a flit, it sets R/A = 0
• When the sender S is ready, it sets R/A = 1 and transmits flit i
• While the flit is being received, R/A stays high
• The cycle repeats for the remaining flits (a toy simulation follows the figure below)
Handshaking Protocol (figure)
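A toy sequential walk-through of the R/A handshake described above; the flit count and print statements are illustrative assumptions, not part of the original slides.

    /* Sequential walk-through of the 1-bit R/A flit handshake. */
    #include <stdio.h>

    int main(void) {
        int ra = 0;             /* R/A line: 0 = receiver ready, 1 = flit in flight */
        const int nflits = 4;   /* assumed number of flits in the packet */

        for (int i = 0; i < nflits; i++) {
            printf("flit %d: D ready (R/A = %d)\n", i, ra);
            ra = 1;             /* S ready: raise R/A and transmit flit i        */
            printf("flit %d: S transmits (R/A = %d)\n", i, ra);
            ra = 0;             /* flit received: D pulls R/A low for next flit  */
        }
        return 0;
    }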
Latency Analysis
• L = packet length, W = channel bandwidth (bits/s)
• D = distance, F = flit length
• Store-and-forward latency: T_SF = (L/W)(D + 1)
• Wormhole latency: T_WH = L/W + (F/W) × D
• Store-and-forward routing is controlled by software; wormhole routing by hardware (a numeric example follows)
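A small worked example of the two formulas above, with assumed (not original) values for L, F, W, and D, showing that T_SF grows linearly with distance while T_WH is nearly distance-independent when F is much smaller than L.

    /* Store-and-forward vs. wormhole latency for assumed parameter values. */
    #include <stdio.h>

    int main(void) {
        double L = 1024.0;   /* packet length, bits       */
        double F = 32.0;     /* flit length, bits         */
        double W = 1e6;      /* channel bandwidth, bits/s */
        double D = 7.0;      /* distance, hops            */

        double t_sf = (L / W) * (D + 1.0);   /* T_SF = (L/W)(D + 1)     */
        double t_wh = L / W + (F / W) * D;   /* T_WH = L/W + (F/W) x D  */

        printf("T_SF = %.6f s\n", t_sf);     /* 0.008192 s: grows with D            */
        printf("T_WH = %.6f s\n", t_wh);     /* 0.001248 s: nearly independent of D */
        return 0;
    }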
Virtual Channels
• A logical link between two nodes, formed by a flit buffer in the source node, a physical channel between them, and a flit buffer in the receiver node
• The physical channel is time-shared by its virtual channels
• Sharing of the physical channel by a set of virtual channels is conducted by time-multiplexing on a flit-by-flit basis (sketched below)
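A minimal sketch of flit-by-flit time-multiplexing, assuming a simple round-robin schedule over three virtual channels; the buffer contents and the scheduling policy are illustrative assumptions.

    /* Round-robin, flit-by-flit sharing of one physical channel by 3 VCs. */
    #include <stdio.h>

    #define NVC 3   /* virtual channels sharing the physical channel */

    int main(void) {
        int pending[NVC] = {2, 4, 1};   /* flits waiting in each virtual channel's buffer */
        int remaining = 2 + 4 + 1;

        /* Each cycle the physical channel carries at most one flit from one VC. */
        for (int cycle = 0; remaining > 0; cycle++) {
            int vc = cycle % NVC;       /* round-robin selection */
            if (pending[vc] > 0) {
                pending[vc]--;
                remaining--;
                printf("cycle %d: flit from VC%d on the physical channel\n", cycle, vc);
            } else {
                printf("cycle %d: VC%d empty, channel idle this slot\n", cycle, vc);
            }
        }
        return 0;
    }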
Deadlock Avoidance
• Channels may be unidirectional or bidirectional
• Combining two unidirectional channels into one bidirectional channel increases the utilization rate and doubles the channel bandwidth
• Arbitration is more complex for bidirectional channels
• High-speed multiplexing is required to implement a large number of virtual channels
Packet Collision Resolution
• To move a flit between adjacent nodes, three things are needed:
• the source buffer holding the flit
• the channel being allocated
• the receiver buffer accepting the flit
• Arbitration decisions: which packet will be allocated the channel, and what to do with the rejected packet
Buffering with Virtual Cut-Through Routing
• A rejected packet is temporarily stored in a buffer
• Requires a large buffer to hold the entire packet
• Does not waste the resources already allocated
• Best case behaves like wormhole routing; worst case like store-and-forward
Blocking and Detour Policies
• Blocking: the rejected packet is blocked in place, not abandoned; economical, but resources sit idle
• Discard: the blocked packet is dropped; wastes resources
• Detour: the blocked packet is misrouted to a detour channel; flexible, but wastes channel resources and may cause a cycle of livelock
Dimension-Order Routing
• Deterministic: the path is completely determined by source and destination
• Adaptive: the path depends on network conditions
• Dimension-order routing requires the selection of successive channels to follow a specific order based on the dimensions of the network
• Examples: X-Y routing (2D mesh) and E-cube routing (hypercube); X-Y routing is sketched below
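A minimal sketch of X-Y routing on a 2D mesh: the packet is routed fully along the X dimension first, then along Y. The node coordinates and the step-by-step printout are illustrative assumptions.

    /* X-Y (dimension-order) routing on a 2D mesh: X first, then Y. */
    #include <stdio.h>

    static void xy_route(int sx, int sy, int dx, int dy) {
        int x = sx, y = sy;
        printf("start at (%d,%d)\n", x, y);
        while (x != dx) {                      /* phase 1: correct the X dimension */
            x += (dx > x) ? 1 : -1;
            printf("X step -> (%d,%d)\n", x, y);
        }
        while (y != dy) {                      /* phase 2: correct the Y dimension */
            y += (dy > y) ? 1 : -1;
            printf("Y step -> (%d,%d)\n", x, y);
        }
        printf("arrived at (%d,%d)\n", x, y);
    }

    int main(void) {
        xy_route(0, 0, 3, 2);   /* route a packet from node (0,0) to node (3,2) */
        return 0;
    }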