
Communication Overhead Estimation on Multicores

S. M. Farhad, The University of Sydney. Joint work with Yousun Ko, Bernd Burgstaller, and Bernhard Scholz. Outline: motivation, multicore trend, stream programming, profiling communication overhead, related works.


Presentation Transcript


  1. Communication Overhead Estimation on Multicores. S. M. Farhad, The University of Sydney. Joint work with Yousun Ko, Bernd Burgstaller, and Bernhard Scholz.

  2. Outline • Motivation • Multicore trend • Stream programming • Profiling communication overhead • Related works

  3. Motivation [Figure: number of cores per chip (1 to 512) plotted against year (1975 to 2010) for unicore, homogeneous multicore, and heterogeneous multicore processors, ranging from the 4004 through Pentium, Core2Quad, Cell, Niagara, and NVIDIA G80 up to PicoChip and AMBRIC. A panel labeled "Stream Programming" lists C/C++/Java, CUDA, X10, Peakstream, Fortress, Accelerator, Ct, C T M, Rstream, and Rapidmind. Courtesy: Scott ’08.]

  4. Stream Programming Paradigm • Programs expressed as stream graphs • Streams: infinite sequences of data elements • Actors: functions applied to streams [Figure: Stream → Actor → Stream]

  5. Properties of Stream Programs • Regular and repeating computation • Independent actors with explicit communication • Producer/consumer dependencies [Figure: example stream graph with actors AtoD, FMDemod, Splitter, LPF1 to LPF3, HPF1 to HPF3, Joiner, Adder, Speaker]

  6. StreamIt Language • An implementation of stream programming • Hierarchical structure • Each construct has a single input/output stream [Figure: the constructs filter; pipeline (each stage may be any StreamIt language construct); splitjoin (parallel computation between a splitter and a joiner); feedback loop (with a splitter and a joiner)]

  7. How to Estimate the Communication Overhead?

  8. Problems in Measuring Communication Overhead • Reasons: • Multicores are not communication-exposed architectures • Complex cache hierarchy • Cache coherence protocols • Consequence: • The communication cost cannot be measured directly • Estimate the communication cost by measuring the execution time of actors

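  9. Measuring the Communication Overhead of an Edge [Figure: actors i and k joined by an edge. With both actors on Processor 1 there is no communication cost; with the actors on different processors (Processor 1 and Processor 2) there is a communication cost.]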
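One way to read this slide is as a subtraction: the overhead of edge (i, k) is approximated by the difference between the measured execution time with i and k on different processors and the time with both on one processor. The sketch below assumes that model. The function name and the sample timings are hypothetical and do not come from the talk.

```python
def estimate_edge_overhead(time_split, time_colocated):
    """Estimate the communication overhead of one edge from two timing runs.

    time_split: measured time with the two actors on different processors.
    time_colocated: measured time with both actors on the same processor.
    Assumption (not from the talk): the whole difference is communication.
    """
    overhead = time_split - time_colocated
    # Measurement noise can make the difference slightly negative; clamp to zero.
    return max(overhead, 0.0)


# Hypothetical timings in milliseconds, for illustration only.
print(estimate_edge_overhead(12.5, 10.0))  # 2.5
```

Clamping at zero is a design choice for noisy measurements; averaging several runs before subtracting would be a reasonable alternative.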

  10. How to Minimize the Required Number of Experiments • Graph coloring • A pipeline requires 2+1 experiments [Figure: pipeline A to F with edges numbered 1 to 5, mapped onto Processor 1 and Processor 2 in two ways: even edges across the partition, and odd edges across the partition]
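The two pipeline placements on this slide can be generated mechanically. The sketch below is an illustration, not the authors' tool: it assumes two processors, numbers edges from 1 as on the slide, and uses a hypothetical helper name `pipeline_partitions`. Reading "2+1" as two partitioned runs plus one baseline run is also an assumption.

```python
def pipeline_partitions(actors):
    """Return two actor-to-processor maps for a pipeline of actors.

    Edge j (1-based) connects actors[j-1] and actors[j]. In the first map
    every odd-numbered edge crosses the processor boundary; in the second
    map every even-numbered edge does.
    """
    def assign(crossing_parity):
        placement = {actors[0]: 1}
        for j in range(1, len(actors)):
            previous = placement[actors[j - 1]]
            crosses = (j % 2 == crossing_parity)
            # Switch between processor 1 and 2 only when edge j must cross.
            placement[actors[j]] = (3 - previous) if crosses else previous
        return placement

    return assign(1), assign(0)


odd_cut, even_cut = pipeline_partitions(["A", "B", "C", "D", "E", "F"])
print(odd_cut)   # odd edges 1, 3, 5 cross the partition
print(even_cut)  # even edges 2, 4 cross the partition
```

In each placement no two crossing edges share an actor, so each crossing edge can be attributed its own change in execution time.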

  11. Obs. 1: There is no loop of three actors in a stream graph [Figure: actors i, k, and l placed on Processor 1 and Processor 2]

  12. Obs. 2: There is no interference of adjacent nodes between edges [Figure: actors A to F placed on processors P-1 to P-4, shown for the blue-colored edges]

  13. Remove Interference • Convert to a line graph • Add interference edges • Use a vertex coloring algorithm [Figure: stream graph with actors A to F beside its line graph with nodes AB, BC, BD, CE, DE, EF]
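The three steps on this slide can be sketched as follows. The slide does not define "interference edge"; the code assumes two stream edges interfere when some third stream edge joins one endpoint of each, which is an interpretation rather than the authors' definition. The greedy coloring is a stand-in for whatever vertex coloring algorithm the authors used, and `color_edges` is a hypothetical name.

```python
from itertools import combinations


def color_edges(stream_edges):
    """Greedy coloring of stream-graph edges via the line graph.

    Nodes of the line graph are stream edges. Two nodes conflict if the
    edges share an actor (ordinary line-graph edge) or if a third stream
    edge joins their endpoints (assumed meaning of "interference edge").
    """
    edges = [tuple(e) for e in stream_edges]
    linked = {frozenset(e) for e in edges}
    conflicts = {e: set() for e in edges}
    for e1, e2 in combinations(edges, 2):
        shares_actor = bool(set(e1) & set(e2))
        interferes = any(frozenset((u, v)) in linked for u in e1 for v in e2)
        if shares_actor or interferes:
            conflicts[e1].add(e2)
            conflicts[e2].add(e1)

    colors = {}
    for e in edges:
        # Pick the smallest color not used by an already-colored conflict.
        used = {colors[n] for n in conflicts[e] if n in colors}
        colors[e] = next(c for c in range(len(edges)) if c not in used)
    return colors


# Stream graph taken from the line-graph node names on the slide.
stream_graph = [("A", "B"), ("B", "C"), ("B", "D"),
                ("C", "E"), ("D", "E"), ("E", "F")]
coloring = color_edges(stream_graph)
print(coloring)
```

Under this assumption edges AB and EF receive the same color, so they could be measured in the same experiment, while the other four edges each need their own.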

  14. Processor Leveling Graph [Figure: stream graph with actors A to F beside its processor leveling graph for the blue-colored edge, with nodes A; B, C, D, E; F]

  15. Coloring the Processor Leveling Graph [Figure: nodes A; B, C, D, E; F assigned to Processor 1 and Processor 2]

  16. Measuring the Communication Cost [Figure: for the blue-colored edge, actors A to F and nodes A; B, C, D, E; F mapped onto Processor 1 and Processor 2]
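The figures for the processor leveling graph suggest the following procedure: merge all actors joined by edges that are not being measured, then place the merged groups so that each measured edge crosses between processors. The sketch below follows that reading, which is an inference from the figure labels. It assumes two processors and uses a hypothetical function name.

```python
from collections import deque


def level_and_place(stream_edges, measured_edges):
    """Contract unmeasured edges, then two-color the merged groups."""
    measured = {frozenset(e) for e in measured_edges}
    actors = sorted({a for e in stream_edges for a in e})
    parent = {a: a for a in actors}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Contract every edge that is not measured in this experiment.
    for u, v in stream_edges:
        if frozenset((u, v)) not in measured:
            parent[find(u)] = find(v)

    # Merged groups are adjacent when a measured edge joins them.
    adjacent = {find(a): set() for a in actors}
    for u, v in measured_edges:
        adjacent[find(u)].add(find(v))
        adjacent[find(v)].add(find(u))

    # Breadth-first two-coloring: adjacent groups alternate processors.
    # No check is made that a valid two-processor placement exists.
    processor = {}
    for start in adjacent:
        if start in processor:
            continue
        processor[start] = 1
        queue = deque([start])
        while queue:
            group = queue.popleft()
            for other in adjacent[group]:
                if other not in processor:
                    processor[other] = 3 - processor[group]
                    queue.append(other)
    return {a: processor[find(a)] for a in actors}


stream_graph = [("A", "B"), ("B", "C"), ("B", "D"),
                ("C", "E"), ("D", "E"), ("E", "F")]
placement = level_and_place(stream_graph, [("A", "B"), ("E", "F")])
print(placement)
```

For the measured edges AB and EF this places A and F on processor 1 and B, C, D, E on processor 2, matching the grouping A; B, C, D, E; F shown in the figures.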

  17. Profiling Performance

  18. Related Works [1] Static Scheduling of SDF Programs for DSP [Lee ’87] [2] StreamIt: A Language for Streaming Applications [Thies ’02] [3] Phased Scheduling of Stream Programs [Thies ’03] [4] Exploiting Coarse Grained Task, Data, and Pipeline Parallelism in Stream Programs [Thies ’06] [5] Orchestrating the Execution of Stream Programs on Cell [Scott ’08] [6] Software Pipelined Execution of Stream Programs on GPUs [Udupa ’09] [7] Synergistic Execution of Stream Programs on Multicores with Accelerators [Udupa ’09] [8] Orchestration by Approximation [Farhad ’11]

  19. Questions?
