This paper presents CPAR-Cluster, a runtime system for heterogeneous clusters that combine monoprocessor and multiprocessor (SMP) nodes. It addresses programmability, performance, and the management of shared variables. The CPAR language supports parallel programming through macrotasks and microtasks and provides an efficient synchronization mechanism. The paper reports test results for matrix multiplication and the Travelling Salesman Problem (TSP), demonstrating the system's effectiveness and its flexibility in exploiting idle machines and SMP nodes.
CPAR-Cluster: A Runtime System for Heterogeneous Clusters with Mono and Multiprocessor Nodes
Gisele S. Craveiro, PhD
Profa. Liria M. Sato, PhD
CCGrid 2004 - Chicago
Outline
• Introduction
• CPAR Parallel Programming Language
• CPAR-Cluster
• Tests and Results
• Conclusions
Introduction
• Commodity clusters:
  • idle machines, SMP nodes
  • heterogeneity, programmability, and good performance
• Hybrid models:
  • message-passing model + shared-memory model
Introduction
• CPAR:
  • parallel programming language
  • shared-memory programming model
• CPAR-Cluster:
  • runtime system
  • transparent access to shared variables across heterogeneous clusters
  • scheduling at execution time
CPAR Parallel Programming Language
• Parallel blocks
• Macrotasks
• Microtasks
• Shared variables (global and local scopes)
• Synchronization mechanisms
CPAR Parallelism Grains
• Parallel block → cluster
• Macrotask → node
• Microtask → processor
CPAR-Cluster Runtime System
• DSM implemented at the compiler/library level
• Consistency maintained per shared variable
• Eager release consistency model
• Write-update coherence protocol (sketched below)
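To make the coherence machinery concrete, here is a minimal C sketch of a write-update protocol under eager release consistency, assuming a per-variable dirty bit and a generic send primitive; the names (shared_var, send_update, dsm_lock_release) are illustrative assumptions, not CPAR-Cluster's actual API.

/* Hypothetical sketch: eager release consistency with write-update.
 * All names here are assumptions for illustration. */
#include <stdio.h>
#include <stddef.h>

struct shared_var {
    const char *name;   /* shared variable registered with the DSM */
    void       *addr;   /* local copy of the variable */
    size_t      size;
    int         dirty;  /* written since the last release? */
};

#define MAX_SHARED 64
static struct shared_var table[MAX_SHARED];
static int num_shared;

/* Stand-in transport primitive (a socket send in a real runtime). */
static void send_update(int node, const struct shared_var *v) {
    printf("update '%s' (%zu bytes) -> node %d\n", v->name, v->size, node);
}

/* Eager release: before the synchronization point completes, every
 * modified shared variable is pushed to the peer nodes (write-update),
 * so no node has to fetch a stale value on demand. */
void dsm_lock_release(int num_nodes, int my_node) {
    for (int v = 0; v < num_shared; v++) {
        if (!table[v].dirty)
            continue;
        for (int n = 0; n < num_nodes; n++)
            if (n != my_node)
                send_update(n, &table[v]);
        table[v].dirty = 0;
    }
}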
CPAR-Cluster Runtime System
• Update distribution criteria:
  • total: all nodes receive the update
  • central (master): only the master node receives it
• Macrotask scheduling
• Microtask scheduling (loop scheduling): static or dynamic (see the sketch below)
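The two loop-scheduling policies can be pictured with a short C sketch; the ceiling-division chunking and the shared atomic counter are assumptions about how static and dynamic microtask scheduling might be realized, not CPAR-Cluster internals.

/* Hypothetical sketch of static vs. dynamic loop scheduling. */
#include <stdatomic.h>

/* Static: the iteration space is divided once at loop entry and
 * processor p always owns the same fixed block (empty when *lo >= *hi). */
void static_range(int n_iters, int n_procs, int p, int *lo, int *hi) {
    int chunk = (n_iters + n_procs - 1) / n_procs;  /* ceiling division */
    *lo = p * chunk;
    *hi = (*lo + chunk < n_iters) ? *lo + chunk : n_iters;
}

/* Dynamic: workers repeatedly grab the next chunk from a shared counter,
 * which balances load across mono and multiprocessor nodes of different
 * speeds. Returns 0 when no iterations are left. */
static atomic_int next_iter;

int dynamic_next(int n_iters, int chunk, int *lo, int *hi) {
    int start = atomic_fetch_add(&next_iter, chunk);
    if (start >= n_iters)
        return 0;
    *lo = start;
    *hi = (start + chunk < n_iters) ? start + chunk : n_iters;
    return 1;
}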
CPAR-Cluster Execution Model
[Diagram: a master node coordinating slave nodes 1 through N]
Execution Model - Master Node
[Diagram: master node components: executor, shared variables, communicator (Comm.), sender]
Execution Model - Slave Node
[Diagram: slave node components: sender, communicator (Comm.), task queue, executor; a sketch of how they interact follows]
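A rough, self-contained C sketch of how a slave node's communicator, task queue, and executor could interact is given below; the message tags, the fixed-size queue, and the in-memory message stream are hypothetical stand-ins for the actual CPAR-Cluster protocol, and in the real runtime these components would run as concurrent threads, as the diagrams suggest.

/* Hypothetical slave-node loop. All names and message formats are
 * illustrative assumptions, not the real CPAR-Cluster protocol. */
#include <stdio.h>

enum msg_tag { TASK_MSG, SHUTDOWN_MSG };
struct message { enum msg_tag tag; int task_id; };

/* Stand-in for the master's message stream (a socket in reality). */
static struct message stream[] = {
    { TASK_MSG, 1 }, { TASK_MSG, 2 }, { SHUTDOWN_MSG, 0 }
};
static int stream_pos;
static struct message recv_from_master(void) { return stream[stream_pos++]; }

/* Task queue component. */
static int queue[16], q_head, q_tail;
static void enqueue_task(int id) { queue[q_tail++ % 16] = id; }
static int  dequeue_task(void)   { return q_head == q_tail ? -1 : queue[q_head++ % 16]; }

/* Executor component: would run the macrotask body, spawning microtasks
 * across the node's processors on an SMP. */
static void run_macrotask(int id) { printf("running macrotask %d\n", id); }

int main(void) {
    for (;;) {
        struct message m = recv_from_master();   /* communicator */
        if (m.tag == SHUTDOWN_MSG)
            break;
        enqueue_task(m.task_id);
        int t;
        while ((t = dequeue_task()) != -1)       /* executor drains the queue */
            run_macrotask(t);
    }
    return 0;
}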
Input Files
• Hardware platform configuration file
• CPAR program file
• User task assignment file (optional)
Nodes Configuration File

#comment line
#master node
sun cpu=4
#slave nodes
moon cpu=4
onix cpu=4
leo
taurus1
taurus2
taurus3
orion
Task Pre-Scheduling File

#nodes suggestion
init_A onix, leo, moon;
#architecture suggestion
Calc_B SMP;
#node imposition
multiply onix!;
#architecture imposition, node suggestion
tsp SMP! onix;
Macrotask & Microtask Execution
[Diagram: CPAR parallel macrotask execution and synchronization. Sequential sections are run by the parent slave task alone; inside the forall, the parent and child slave tasks (children 1-3) execute the microtask together, coordinated by a synchronization task.]

task body hello(){
    printf("Only parent");     /* sequential section: parent only */
    forall i=1 to 4{
        printf("Everybody");   /* microtask: parent + children in parallel */
    }
    printf("Again,parent");    /* sequential again: parent only */
}
Tests - Hardware Platform
• 1 Intel Pentium II quad-processor node
• 16 Intel Celeron nodes
• 8 AMD Athlon dual-processor nodes
• Fast Ethernet interconnect
Tests Performed
• Matrix multiply:
  • shared variables with global scope (total update strategy)
  • shared variables with global scope (centralized update strategy)
  • without shared variables (no update overhead)
• Travelling Salesman Problem
Results – MM (size 2000)
[Two charts: execution time (s) vs. number of nodes]
Results – TSP 23 Cities
[Chart: execution time (s) vs. number of nodes]
Results – MM: Omni+SCore vs. CPAR+CPAR-Cluster
[Chart: execution time (s) vs. number of nodes for both systems]
Conclusions
• CPAR-Cluster:
  • implemented at the library level, with no kernel modifications or special-purpose hardware
  • shared-variable update strategies showed suitable behavior
  • data distribution criteria
  • scheduling and load balancing
  • exploits the computational power of interconnected mono and multiprocessor nodes
Questions?
gisele.craveiro@poli.usp.br
gisele.scraveiro@sp.senac.br