Dataflow machines represent a paradigm shift from conventional control-driven programming models. In dataflow architectures, instructions execute based on the availability of data rather than a predetermined sequence. This allows for greater parallelism by enabling multiple instructions to fire simultaneously as their inputs become available. Experimental machines such as the Manchester machine (Gurd & Watson) and MIT's Monsoon exemplify these concepts. The model also bears on the design of efficient parallel algorithms, which depend on message passing and on the topology of the interconnection network.
Computer Architecture Dataflow Machines
Data Flow • Conventional programming models are control driven • Instruction sequence is precisely specified • Sequence specifies control • which instruction the CPU will execute next • Execution rule: • Execute an instruction when its predecessor has completed • s1: r = a*b; s2: s = c*d; s3: y = r + s; • s2 executes when s1 is complete • s3 executes when s2 is complete
Data Flow • Consider the calculation • y = a*b + c*d • Represent it by a graph • Nodes represent computations • Data flows along arcs • Execution rule: • Execute an instruction when its data is available • Data driven rule [Figure: dataflow graph; a, b and c, d feed two multiply (x) nodes, whose results feed an add (+) node producing y]
Data Flow • Dataflow firing rule • An instruction fires (executes) when its data is available • Exposes all possible parallelism • Either multiplication can fire as soon as data arrives • Addition must wait • Data dependence analysis! • Instruction issue units: • Fire (issue) each instruction when its operands (registers) have been written [Figure: dataflow graph for y = a*b + c*d]
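The firing rule can be sketched in a few lines of Python. This is only an illustration of the rule for y = a*b + c*d, not a model of any real machine; the `GRAPH` encoding and the `run_dataflow` helper are hypothetical names introduced here.

```python
import operator

# Hypothetical graph encoding: node name -> (operation, names of its inputs).
GRAPH = {
    "r": (operator.mul, ("a", "b")),
    "s": (operator.mul, ("c", "d")),
    "y": (operator.add, ("r", "s")),
}

def run_dataflow(graph, inputs):
    """Fire every node whose operands are all available, step by step."""
    values = dict(inputs)    # data that has arrived so far
    pending = dict(graph)    # nodes that have not fired yet
    firing_order = []        # one list per step: nodes that fired together
    while pending:
        ready = [name for name, (_, srcs) in pending.items()
                 if all(src in values for src in srcs)]
        if not ready:
            raise ValueError("some operands never arrive")
        # All ready nodes are independent, so they could fire in parallel.
        results = {name: pending[name][0](*(values[src] for src in pending[name][1]))
                   for name in ready}
        values.update(results)
        for name in ready:
            del pending[name]
        firing_order.append(sorted(ready))
    return values, firing_order

values, order = run_dataflow(GRAPH, {"a": 2, "b": 3, "c": 4, "d": 5})
print(values["y"])  # 26
print(order)        # [['r', 's'], ['y']]
```

Both multiplications fire in the first step because their data is present; the addition waits for both results.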
Data Flow - Realisations • Several Experimental Machines built • Manchester (Gurd & Watson) • Tagged Token (Arvind, MIT) • Sigma (ETL, Tsukuba) • EMC-4 (ETL, Tsukuba) • Monsoon (Arvind, MIT) • EMX (ETL, Tsukuba) • RAPID (Osaka/Sharp/Mitsubishi; asynchronous!) • Naiad (Tasmania) • and some others
Data Flow - Realisations • Manchester
Data Flow - Program • Program word • Matching Store Entry • When both Presence Flags are Y, this packet is despatched to a PE (any PE!) [Figure labels: Operation (+, -, *, / etc); Destination Address; Destination Left or Right; Left, Right Operands; Presence Flags]
Data Flow - Matching Store • Special purpose memory • Limited processing capability • Detects full slots • Despatches operation packets to any idle PE [Figure labels: Operation (+, -, *, / etc); Destination Address; Destination Left or Right; Left, Right Operands; Presence Flags]
Data Flow - Processing Elements • Receive operation packets • Generate result • Form result packet • Despatch to matching store
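The loop between matching store and processing elements can be sketched as follows. This is a simplified single-threaded illustration, not the Manchester design: the `PROGRAM` layout, the token format `(address, side, value)` and the `execute` function are assumptions made for this sketch, and the "PE" is just an inline function call.

```python
import operator
from collections import deque

# Hypothetical program for y = a*b + c*d: address -> (operation, destinations).
# A destination is (address, side); None marks the program's output.
PROGRAM = {
    0: (operator.mul, [(2, "L")]),
    1: (operator.mul, [(2, "R")]),
    2: (operator.add, [None]),
}

def execute(program, initial_tokens):
    store = {}                      # matching store: address -> {"L": v, "R": v}
    tokens = deque(initial_tokens)  # result packets in flight
    outputs = []
    while tokens:
        addr, side, value = tokens.popleft()
        slot = store.setdefault(addr, {})
        slot[side] = value                   # filling a side sets its presence flag
        if "L" in slot and "R" in slot:      # both presence flags are Y: slot is full
            left, right = slot["L"], slot["R"]
            del store[addr]                  # free the matching store entry
            op, dests = program[addr]
            result = op(left, right)         # stands in for despatch to any idle PE
            for dest in dests:               # PE forms result packets
                if dest is None:
                    outputs.append(result)
                else:
                    tokens.append((dest[0], dest[1], result))
    return outputs

print(execute(PROGRAM, [(0, "L", 2), (0, "R", 3), (1, "L", 4), (1, "R", 5)]))  # [26]
```

Because a full slot carries everything needed to compute the result, any PE can process it, which is why the slides stress "any PE!".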
Data Flow - EM4 • Architects • Yamaguchi, Sakai, Kodama, Sato et al • Electrotechnical Laboratory, Tsukuba, Japan • PE (EM-Y) • CMOS Gate Array • 80k gates / 1.0 µm • f = 20MHz • ~1992
Data Flow - Monsoon • Architects • Papadopoulos, Culler et al • MIT, Cambridge • PE • f = 10MHz • ~1990 • I-Structure Processor
Data Flow - I-Structures • Memory with a presence bit • Tag each memory location with a bit indicating its validity • Valid bit set -> normal read (no wait) • Data not yet written (valid bit not set) • Wait • Read requests queued • Data driven execution • Operations proceed when data is available [Figure: memory locations, each with a valid bit alongside its data]
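The presence-bit behaviour can be sketched in Python as below. The `IStructure` class and its callback-style `read` are hypothetical, chosen only to show queued reads being released by a write. Rejecting a second write to the same location is an assumption of this sketch (the slides do not say what happens in that case).

```python
class IStructure:
    """Memory in which every location carries a valid (presence) bit."""

    def __init__(self, size):
        self._valid = [False] * size
        self._data = [None] * size
        self._waiting = [[] for _ in range(size)]  # queued read requests

    def read(self, index, consumer):
        """Deliver the value to `consumer` now, or queue the request."""
        if self._valid[index]:
            consumer(self._data[index])            # valid bit set: normal read
        else:
            self._waiting[index].append(consumer)  # not yet written: wait

    def write(self, index, value):
        """Store the value, set the valid bit, release queued reads."""
        if self._valid[index]:
            raise ValueError("location already written")  # sketch assumption
        self._data[index] = value
        self._valid[index] = True
        for consumer in self._waiting[index]:
            consumer(value)
        self._waiting[index].clear()

seen = []
mem = IStructure(4)
mem.read(1, seen.append)   # data not yet written: request is queued
mem.write(1, 42)           # queued read now proceeds
mem.read(1, seen.append)   # valid bit set: read completes immediately
print(seen)                # [42, 42]
```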
Data Flow - Monsoon Pipeline • 8 stage pipeline • “Presence bits” checks operand availability • Frame (coarse grain) basis
Data Flow - Summary • Fine-Grain Dataflow • Suffered from comms network overload! • Coarse-Grain Dataflow • Monsoon ... • Overtaken by commercial technology!! • A sad “fact-of-life” • It’s almost impossible to generate the funds for non-“mainstream” computer architecture research • $n x 10^8 required ☹ • Non-mainstream = interesting!
Data Flow - Summary • As a software model … • Functional languages • Dataflow in a different guise! • Theoretically • important • Practically? • Inefficient (= slow!!) • … Ask your CS colleagues! • Cilk - based on C • Used on CIIPS Myrmidons • Uses a dataflow model • Threads become ready for execution when their data is generated • Message passing efficiency • Without explicit data transfer & synchronisation!
Networks • Network Topology (or shape) • Vital to efficient parallel algorithms • Communication is the limiting factor! • Ideal • Cross-bar • Any-to-any • Non-blocking • Except two sources to same receiver • Realisable • But only for limited order (number of ports)
Networks • Cross-bars • Achilles • 8 x 8 • Full duplex • Simultaneous Input and Output at each port • 32 bit data-path • Target: 1Gbyte / second total throughput • but we needed the 3-D arrangement to achieve • bandwidth • high order
Networks • Cross-bars • Achilles • Hardware almost trivial! • Single FPGA on each level • Programmable • VHDL Models • Several topologies • Just by changing the software!
Networks - More than 8 PEs • Simple • Use two 8x8 routers! • but … the single link joining the two routers gets a lot of traffic!
Networks - Fat tree • Problem: • High-traffic links between PEs can become a bottleneck • Solution: Fat-tree • Links higher up the tree are “fatter” • Sustainable bandwidth between all PEs is the same
Networks - Performance Metrics • Metrics for comparing network topologies • Diameter • Maximum distance between any pair of nodes • Determines latency • Bisection Bandwidth • Aggregate bandwidth over any “cut” which divides the network in half • Determines throughput • Crossbar • Diameter: 1 • Every PE is directly connected to the router, so a single “hop” suffices • Bisection Bandwidth: b bytes/sec • b is the bandwidth of a single link
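Networks - Performance Metrics • Metrics for comparing network topologies • To connect n PEs with m x m crossbars • Single link bandwidth b bytes/s • Simple: n = 14 (2 switches) • Diameter: 3 • Bisection Bandwidth: b [Figure: two switches joined by one link, with the three hops between PEs on different switches numbered 1, 2, 3]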
Networks - Performance Metrics • Fat-tree • Diameter: $2\log_m n$ • Height is $\log_m n$ • Worst case distance - up and down • Bisection Bandwidth: $b\,n/2$ bytes/sec • Links are fatter higher up the tree [Figure: fat tree of height $\log_m n$]
Networks - Performance Metrics • Mesh • Diameter: $2\sqrt{n} - 2$ • Bisection Bandwidth: $b\sqrt{n}$ bytes/sec • Order: 4
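To compare the fat-tree and mesh figures for a concrete size, the slide formulas can be evaluated directly. The sketch below simply encodes those formulas; the function names are hypothetical, the fat-tree height is taken as the smallest h with m^h >= n, and the mesh is assumed to be square (n a perfect square).

```python
import math

def tree_height(n, m):
    """Smallest height h such that m**h >= n, i.e. log_m n rounded up."""
    height, reach = 0, 1
    while reach < n:
        reach *= m
        height += 1
    return height

def fat_tree_metrics(n, m, b):
    """Slide formulas: diameter 2*log_m(n), bisection bandwidth b*n/2."""
    return {"diameter": 2 * tree_height(n, m), "bisection": b * n / 2}

def mesh_metrics(n, b):
    """Slide formulas: diameter 2*sqrt(n) - 2, bisection bandwidth b*sqrt(n)."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("this sketch assumes a square mesh")
    return {"diameter": 2 * side - 2, "bisection": b * side}

# 64 PEs, 8 x 8 crossbars in the fat tree, unit link bandwidth.
print(fat_tree_metrics(64, 8, 1))  # {'diameter': 4, 'bisection': 32.0}
print(mesh_metrics(64, 1))         # {'diameter': 14, 'bisection': 8}
```

For 64 PEs the fat tree has both the shorter worst-case path and four times the bisection bandwidth, at the cost of fatter links near the root.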
Networks - Performance Metrics • Hypercube • Hypercube of order m • Link two order $m-1$ hypercubes with $2^{m-1}$ links • Number of PEs: $n = 2^m$ • Order: $\log_2 n = m$ [Figure: two order-2 hypercubes linked to form an order-3 hypercube]
Networks - Hypercubes • Embedding property • In an n PE hypercube, we have hypercubes of size n/2, n/4, … • Number PEs with binary numbers • 000, 001, 010, 011, 100, … • Joining two hypercubes • add one binary digit to the numbering • Each PE is connected to every PE whose index differs in only one bit
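The "differs in only one bit" rule maps directly onto an exclusive-or with a single set bit. A minimal sketch, with a hypothetical function name:

```python
def hypercube_neighbours(pe, order):
    """PEs whose binary index differs from `pe` in exactly one bit."""
    return [pe ^ (1 << bit) for bit in range(order)]

# Order 3 hypercube: 8 PEs numbered 000 .. 111, each with 3 neighbours.
print(hypercube_neighbours(0b000, 3))  # [1, 2, 4]
print(hypercube_neighbours(0b101, 3))  # [4, 7, 1]
```

Each PE has as many neighbours as the order of the hypercube, one per bit position of its index.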
Networks - Hypercubes • Embedding property • Partitioning tasks • Allocate to sub-cubes • Sub-tasks allocated to sub-cubes of that cube, etc
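One way to picture this partitioning is to describe a sub-cube by the PE index bits its members share. The sketch below assumes sub-cubes are formed by fixing the high-order bits, so each sub-cube is a contiguous, aligned range of PE numbers; fixing other bits would work equally well. The `(base, order)` representation and both function names are hypothetical.

```python
def subcubes(base, order, depth):
    """Yield (base, order) for the sub-cubes left after `depth` halvings."""
    if depth > order:
        raise ValueError("cannot halve a hypercube more times than its order")
    if depth == 0:
        yield (base, order)
        return
    half = 1 << (order - 1)  # each half is a hypercube of order - 1
    yield from subcubes(base, order - 1, depth - 1)
    yield from subcubes(base + half, order - 1, depth - 1)

def members(base, order):
    """PE indices belonging to the sub-cube (base, order)."""
    return list(range(base, base + (1 << order)))

# Split an order 3 hypercube (8 PEs) between two tasks, then four sub-tasks.
print(list(subcubes(0, 3, 1)))  # [(0, 2), (4, 2)]
print(list(subcubes(0, 3, 2)))  # [(0, 1), (2, 1), (4, 1), (6, 1)]
print(members(4, 2))            # [4, 5, 6, 7]
```

Each task's sub-tasks land on sub-cubes of its own sub-cube, so communication between them stays inside that part of the network.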