This document covers the multiprocessor system interconnects that provide fast communication among processors, shared memory, I/O, and peripheral devices. It introduces the IPMN, PION, and IPCN interconnection networks, surveys network characteristics such as timing, switching, and control, and discusses the design of hierarchical bus systems, crossbar networks, and multistage networks. It also examines the limitations of crossbar and multiport memory organizations, routing in Omega and Butterfly networks, approaches to the hot-spot problem such as request combining, and message-passing mechanisms including store-and-forward and wormhole routing.
Multiprocessor System Interconnects
• Allow fast communication among processors, shared memory, I/O, and peripheral devices
• IPMN: processors to shared memory
• PION: processors to I/O and peripheral devices
• IPCN: processors to processors
Network Characteristics
• Timing: synchronous (global clock) or asynchronous (handshaking)
• Switching: circuit (path granted) or packet (compete for path)
• Control: centralized (global controller) or distributed (local devices)
• Topology: bus system / crossbar / multistage
Hierarchical Bus System
• Consists of a hierarchy of buses connecting various components
• Each bus formed with signal, control, and power lines
• Different buses perform different interconnection functions
Local Bus
• Implemented on printed-circuit boards
• Provides common communication path among components on the board
• Memory bus on memory board
• Data bus on I/O board
• Consists of signal and utility lines
Backplane and I/O Bus
• Backplane: printed circuit on which many connectors are used to plug in functional boards
• System bus provides a common communication path among all plug-in boards
• I/O bus: made of coaxial cables with taps connecting disks, printers, and tape units to a processor through an I/O controller
Hierarchical Cache/Bus Architecture
• Leaf nodes are processors and their private caches
• Processors divided into clusters, each on a cluster bus
• Intercluster bus connects the clusters
• Second-level caches used between the cluster and intercluster buses
• Each cluster operates as a single-bus system
• Most memory requests satisfied at the lower-level caches
• Second-level caches used to extend consistency from each cluster to the upper level
• Upper-level caches form another level of shared memory
• Bridges between clusters allow transactions initiated on a local bus to be completed on a remote bus
Network Stages
• Single stage: recirculating network; cheaper, but more passes may be needed; crossbar switch and multiport memory organizations are examples
• Multistage: more than one stage of switch boxes; should connect any input to any output; may have the same connection pattern at each stage; Omega, Flip, and Baseline networks are examples
Blocking vs. Nonblocking
• Blocking: simultaneous connections of some multiple input/output pairs may result in conflicts; true of most multistage networks (e.g., Omega, Baseline, Banyan); may need multiple passes
• Nonblocking: can perform all possible connections by rearranging its connections, so a connection path can always be established; may require more stages (e.g., Benes and Clos)
Crossbar Networks
• Every input port can be connected to a free output port without blocking
• Single-stage network with unary switches at the crosspoints
• Requires n × m crosspoint switches
• If n = m, can implement all n! permutations without blocking
Crosspoint Switch Design
• Only one switch per column can be connected at a time; extra hardware needed to resolve conflicts (a simple arbitration scheme is sketched below)
• Each crosspoint has the complexity of a bus
• Requires extensive hardware, limiting practical crossbars to about n ≤ 16
• Can connect multiple switches per row simultaneously
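The arbitration requirement in the first bullet can be illustrated with a small sketch: each memory module (column) closes at most one crosspoint per cycle. The request matrix, the fixed-priority policy, and all names below are illustrative assumptions, not part of the original slides.

    /* Per-column crosspoint arbitration for an N x M crossbar (toy model). */
    #include <stdio.h>

    #define N 4   /* processors (rows)        */
    #define M 4   /* memory modules (columns) */

    /* grant[m] = processor granted module m this cycle, or -1 if idle.
       A fixed-priority policy picks the lowest-numbered requester. */
    void arbitrate(int req[N][M], int grant[M]) {
        for (int m = 0; m < M; m++) {
            grant[m] = -1;
            for (int p = 0; p < N; p++) {
                if (req[p][m]) { grant[m] = p; break; }  /* one switch per column */
            }
        }
    }

    int main(void) {
        int req[N][M] = {0};
        int grant[M];
        req[0][2] = 1;   /* P0 requests M2 */
        req[1][2] = 1;   /* P1 requests M2: loses to P0 this cycle */
        req[3][0] = 1;   /* P3 requests M0 */
        arbitrate(req, grant);
        for (int m = 0; m < M; m++) {
            if (grant[m] < 0) printf("M%d: idle\n", m);
            else              printf("M%d: granted to P%d\n", m, grant[m]);
        }
        return 0;
    }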
Crossbar Limitations
• Can deliver at most n words to at most n processors in each memory cycle
• Memory modules can be n-way interleaved to allow overlapped access
• Offers the highest bandwidth of n data transfers per cycle
• Cost-effective only for small multiprocessors with a few processors accessing a few memory modules
Multiport Memory
• Moves all crosspoint arbitration and switching functions to the memory controller
• Makes each memory module more expensive
• Only one of the n processor requests is honored at a time
• A compromise between the low-cost, low-performance bus system and the high-cost, high-performance crossbar
• A contention bus is time-shared; multiport memory must resolve conflicts among processors itself
Multiport Limitations
• Expensive when m and n become large
• Typically n = 4 processors and m = 16 memory modules
• Not scalable
• Needs a large number of interconnection cables and connectors when the configuration becomes large
Routing in Omega Networks
• An n-input Omega network built from 2 × 2 switches has log2 n stages
• Routes by destination code (destination-tag routing, sketched below)
• If the ith high-order bit of the destination is 0, take the upper output at stage i; otherwise take the lower output
• Conflicts can occur, so it is a blocking network; may need several passes
• Can implement n^(n/2) permutations in one pass, out of n! possible permutations
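A minimal sketch of the destination-tag rule above, assuming an n-input Omega network of 2 × 2 switches with n a power of two; the function name and printed output are illustrative, not from the original slides.

    /* Destination-tag routing through an Omega network of 2x2 switches. */
    #include <stdio.h>

    void omega_route(unsigned dest, unsigned stages) {
        for (unsigned i = 0; i < stages; i++) {
            unsigned bit = (dest >> (stages - 1 - i)) & 1u;  /* ith high-order bit */
            printf("stage %u: take the %s output\n", i, bit ? "lower" : "upper");
        }
    }

    int main(void) {
        /* 8-input network: log2(8) = 3 stages; route to destination 5 (binary 101). */
        omega_route(5, 3);
        return 0;
    }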
Routing in Butterfly Networks
• Constructed with crossbar switches
• With m × m crossbar switches, number of stages = log_m n and number of switches per stage = n/m (e.g., a 64-input network of 8 × 8 crossbars has 2 stages of 8 switches each)
• No broadcast connections allowed
• Larger Butterfly networks can be constructed modularly by adding more stages
Hot-Spot Problem
• Occurs when network traffic is nonuniform
• e.g., a memory module accessed excessively by many processors at the same time
• Degrades network performance
• A combining mechanism can be used to combine multiple requests headed for the same destination
• The atomic read-modify-write primitive Fetch&Add(x, e) performs parallel memory updates using the combining network
Fetch&Add(x, e)
• Implements an N-way synchronization with a complexity independent of N
• x is an integer variable in shared memory
• e is an integer increment
• Executed by a single processor:
  Fetch&Add(x, e) { temp ← x; x ← temp + e; return temp }
• With N processors attempting Fetch&Add(x, e) simultaneously, memory is updated only once, following a serialization principle
• The sum of the N increments is produced in some arbitrary serialization of the requests
• The values returned to the N requests are all unique
• The net result is similar to a sequential execution of the N Fetch&Adds (see the sketch below)
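A minimal sketch of the combining idea, assuming just two requests that meet at one combining switch; the variable names and the decombining step shown here are an illustrative model of the serialization principle, not the actual hardware design.

    /* Two Fetch&Add requests combined into one memory update. */
    #include <stdio.h>

    static int x = 10;                 /* shared variable in memory */

    /* Memory performs a single Fetch&Add with the combined increment. */
    static int memory_fetch_add(int e) {
        int old = x;
        x = old + e;
        return old;
    }

    int main(void) {
        int e1 = 3, e2 = 5;

        /* Combining switch merges the two requests into Fetch&Add(x, e1 + e2). */
        int base = memory_fetch_add(e1 + e2);

        /* Decombining: unique return values, as if serialized e1 then e2. */
        int r1 = base;                 /* returned to requester 1 */
        int r2 = base + e1;            /* returned to requester 2 */

        printf("r1 = %d, r2 = %d, x = %d\n", r1, r2, x);   /* r1 = 10, r2 = 13, x = 18 */
        return 0;
    }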
Message-Passing Mechanisms
• Store-and-forward routing
• Wormhole routing
• Virtual channels
• Deadlock situations
• Deterministic and adaptive routing algorithms
Message Formats
• Message: logical unit for internode communication
• Packet: basic unit containing the destination address for routing
• Packets carry a sequence number for reassembly
• Flits: flow control digits, the subunits of a packet
• Store-and-forward routing operates on packets; wormhole routing operates on flits
Packets and Flits
• Header flits contain routing information and the sequence number
• Flit length is affected by network size
• Packet length is determined by the routing scheme and network implementation
• Both lengths also depend on channel bandwidth, router design, network traffic, etc.
Message Format (figure)
Store-and-Forward Routing
• Packets are the basic unit
• Each node has a packet buffer
• When a packet reaches an intermediate node, it is first stored in the buffer, then sent when the output channel and the next buffer are ready
• Latency is directly proportional to the distance between source and destination
Wormhole Routing
• Flits are the basic unit
• Transmission proceeds through a sequence of routers
• All flits of the same packet are transmitted in a pipelined fashion
• All data flits follow the header flit
• Transmission of different packets can be interleaved, but their flits cannot be mixed
• Latency is almost independent of distance
Asynchronous Pipelining
• Pipelining of flits is asynchronous
• A 1-bit ready/request (R/A) line is used between adjacent routers
• When the receiver D is ready to receive a flit, it sets R/A = 0
• When the sender S is ready, it sets R/A = 1 and transmits flit i
• While the flit is being received, R/A stays high
• The cycle repeats for the remaining flits (a toy simulation follows the figure below)
Handshaking Protocol (figure)
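A toy sequential walk-through of the R/A handshake described above; the flit count and print statements are illustrative assumptions, not part of the original slides.

    /* Sequential walk-through of the 1-bit R/A flit handshake. */
    #include <stdio.h>

    int main(void) {
        int ra = 0;             /* R/A line: 0 = receiver ready, 1 = flit in flight */
        const int nflits = 4;   /* assumed number of flits in the packet */

        for (int i = 0; i < nflits; i++) {
            printf("flit %d: D ready (R/A = %d)\n", i, ra);
            ra = 1;             /* S ready: raise R/A and transmit flit i        */
            printf("flit %d: S transmits (R/A = %d)\n", i, ra);
            ra = 0;             /* flit received: D pulls R/A low for next flit  */
        }
        return 0;
    }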
Latency Analysis
• L = packet length, W = channel bandwidth (bits/s)
• D = distance, F = flit length
• Store-and-forward latency: T_SF = (L/W)(D + 1)
• Wormhole latency: T_WH = L/W + (F/W) × D
• Store-and-forward routing is controlled by software; wormhole routing by hardware (a numeric example follows)
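A small worked example of the two formulas above, with assumed (not original) values for L, F, W, and D, showing that T_SF grows linearly with distance while T_WH is nearly distance-independent when F is much smaller than L.

    /* Store-and-forward vs. wormhole latency for assumed parameter values. */
    #include <stdio.h>

    int main(void) {
        double L = 1024.0;   /* packet length, bits       */
        double F = 32.0;     /* flit length, bits         */
        double W = 1e6;      /* channel bandwidth, bits/s */
        double D = 7.0;      /* distance, hops            */

        double t_sf = (L / W) * (D + 1.0);   /* T_SF = (L/W)(D + 1)     */
        double t_wh = L / W + (F / W) * D;   /* T_WH = L/W + (F/W) x D  */

        printf("T_SF = %.6f s\n", t_sf);     /* 0.008192 s: grows with D            */
        printf("T_WH = %.6f s\n", t_wh);     /* 0.001248 s: nearly independent of D */
        return 0;
    }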
Virtual Channels
• A logical link between two nodes, formed by a flit buffer in the source node, a physical channel between them, and a flit buffer in the receiver node
• The physical channel is time-shared by its virtual channels
• Sharing of the physical channel by a set of virtual channels is conducted by time-multiplexing on a flit-by-flit basis (sketched below)
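A minimal sketch of flit-by-flit time-multiplexing, assuming a simple round-robin schedule over three virtual channels; the buffer contents and the scheduling policy are illustrative assumptions.

    /* Round-robin, flit-by-flit sharing of one physical channel by 3 VCs. */
    #include <stdio.h>

    #define NVC 3   /* virtual channels sharing the physical channel */

    int main(void) {
        int pending[NVC] = {2, 4, 1};   /* flits waiting in each virtual channel's buffer */
        int remaining = 2 + 4 + 1;

        /* Each cycle the physical channel carries at most one flit from one VC. */
        for (int cycle = 0; remaining > 0; cycle++) {
            int vc = cycle % NVC;       /* round-robin selection */
            if (pending[vc] > 0) {
                pending[vc]--;
                remaining--;
                printf("cycle %d: flit from VC%d on the physical channel\n", cycle, vc);
            } else {
                printf("cycle %d: VC%d empty, channel idle this slot\n", cycle, vc);
            }
        }
        return 0;
    }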
Deadlock Avoidance
• Channels may be unidirectional or bidirectional
• Combining two unidirectional channels into one bidirectional channel increases the utilization rate and doubles the channel bandwidth
• Arbitration is more complex for bidirectional channels
• High-speed multiplexing is required to implement a large number of virtual channels
Packet Collision Resolution
• To move a flit between adjacent nodes, three things are needed:
• the source buffer holding the flit
• the channel being allocated
• the receiver buffer accepting the flit
• Arbitration decisions: which packet will be allocated the channel, and what to do with the rejected packet
Buffering with Virtual Cut-Through Routing
• A rejected packet is temporarily stored in a buffer
• Requires a large buffer to hold the entire packet
• Does not waste the resources already allocated
• Best case behaves like wormhole routing; worst case like store-and-forward
Blocking and Detour Policies
• Blocking: the rejected packet is blocked in place, not abandoned; economical, but resources sit idle
• Discard: the blocked packet is dropped; wastes resources
• Detour: the blocked packet is misrouted to a detour channel; flexible, but wastes channel resources and may cause a cycle of livelock
Dimension-Order Routing
• Deterministic: the path is completely determined by source and destination
• Adaptive: the path depends on network conditions
• Dimension-order routing requires the selection of successive channels to follow a specific order based on the dimensions of the network
• Examples: X-Y routing (2D mesh) and E-cube routing (hypercube); X-Y routing is sketched below
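A minimal sketch of X-Y routing on a 2D mesh: the packet is routed fully along the X dimension first, then along Y. The node coordinates and the step-by-step printout are illustrative assumptions.

    /* X-Y (dimension-order) routing on a 2D mesh: X first, then Y. */
    #include <stdio.h>

    static void xy_route(int sx, int sy, int dx, int dy) {
        int x = sx, y = sy;
        printf("start at (%d,%d)\n", x, y);
        while (x != dx) {                      /* phase 1: correct the X dimension */
            x += (dx > x) ? 1 : -1;
            printf("X step -> (%d,%d)\n", x, y);
        }
        while (y != dy) {                      /* phase 2: correct the Y dimension */
            y += (dy > y) ? 1 : -1;
            printf("Y step -> (%d,%d)\n", x, y);
        }
        printf("arrived at (%d,%d)\n", x, y);
    }

    int main(void) {
        xy_route(0, 0, 3, 2);   /* route a packet from node (0,0) to node (3,2) */
        return 0;
    }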