
Parallel Performance Wizard: Introduction

Parallel Performance Wizard: Introduction. Professor Alan D. George, Principal Investigator Mr. Hung-Hsun Su, Sr. Research Assistant Mr. Adam Leko, Sr. Research Assistant Mr. Bryan Golden, Research Assistant Mr. Hans Sherburne, Research Assistant Mr. Max Billingsley, Research Assistant




Presentation Transcript


  1. Parallel Performance Wizard: Introduction
     Professor Alan D. George, Principal Investigator
     Mr. Hung-Hsun Su, Sr. Research Assistant
     Mr. Adam Leko, Sr. Research Assistant
     Mr. Bryan Golden, Research Assistant
     Mr. Hans Sherburne, Research Assistant
     Mr. Max Billingsley, Research Assistant
     Mr. Josh Hartman, Undergraduate Volunteer
     HCS Research Laboratory
     University of Florida

  2. Outline
     • Motivations and Objectives
     • Background
     • Framework & Key Features
     • Phase II Tasks & Schedules
     • Today’s Schedule

  3. Motivations and Objectives
     • Motivations
       • A UPC/SHMEM program does not yield the expected performance. Why?
       • Due to the complexity of parallel computing, the cause is difficult to determine without tools for performance analysis and optimization
       • Discouraging for users, new & old; few options for shared-memory computing in the UPC and SHMEM communities
     • Objectives
       • Research topics relating to performance analysis
       • Develop a framework for a performance analysis tool
       • Design with both performance and user productivity in mind
       • Develop a performance analysis tool for UPC and SHMEM

  4. Need for Performance Analysis
     • Performance analysis of sequential applications can be challenging
     • Performance analysis of explicitly communicating parallel applications is significantly more difficult
       • Mainly due to the increase in the number of processing nodes
     • Performance analysis of implicitly communicating parallel applications is even more difficult
       • Non-blocking, one-sided communication is tricky to track and analyze accurately

  5. Background - SHMEM
     • SHared MEMory library
     • Based on SPMD model
     • Available for C / Fortran
     • Available for servers and clusters
     • Easier to program than MPI
     • Hybrid programming model
       • Traits of message passing
         • Explicit communication, replication and synchronization
         • Need to give remote data location (processing element ID)
       • Traits of shared memory
         • Provides logically shared memory system view
         • Non-blocking, one-sided communication
     • Lower latency, higher bandwidth communication
     • PSHMEM available for some implementations

  6. Background - UPC
     • Unified Parallel C (UPC)
     • Partitioned global address space (GAS) parallel programming language
     • Common and familiar syntax and semantics for parallel C with simple extensions to ANSI C
     • Many implementations
       • Open source: Berkeley UPC, Michigan UPC, GCC-UPC
       • Proprietary: HP-UPC, Cray-UPC
     • Easier to program than MPI, software more scalable
     • With hand-tuning, UPC performance compares favorably with MPI

  7. Background – Performance Analysis
     • Three general performance analysis approaches
       • Analytical modeling
         • Mostly predictive methods
         • Could also be used in conjunction with experimental performance measurement
         • Pros: easy to use, fast, can be performed without running the program
         • Cons: usually not very accurate
       • Simulation
         • Pros: allows performance estimation of a program on various system architectures
         • Cons: slow, not generally applicable for regular UPC/SHMEM users
       • Experimental performance measurement
         • Strategy used by most modern performance analysis tools (PATs)
         • Uses actual event measurement to perform analysis
         • Pros: most accurate
         • Cons: can be time-consuming
     PAT = Performance Analysis Tool
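
To illustrate why analytical modeling is fast but coarse, here is a minimal sketch using the common latency/bandwidth ("alpha-beta") cost model. The model choice, the function names, and every parameter value are assumptions for illustration; none of them come from the presentation.

```python
def predicted_transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Alpha-beta model: fixed startup latency plus a size-proportional cost."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def predicted_total_time(compute_s, num_messages, message_bytes,
                         latency_s, bandwidth_bytes_per_s):
    """Predicted run time: computation plus non-overlapped communication."""
    per_message = predicted_transfer_time(
        message_bytes, latency_s, bandwidth_bytes_per_s)
    return compute_s + num_messages * per_message


if __name__ == "__main__":
    # Hypothetical interconnect parameters (assumed, not measured).
    latency_s = 5e-6             # 5 microseconds per message
    bandwidth = 1e9              # 1 GB/s

    for size in (8, 1024, 1024 * 1024):
        t = predicted_transfer_time(size, latency_s, bandwidth)
        print(f"{size:>8} bytes -> {t * 1e6:10.2f} us")
```

The prediction needs no program run, which matches the "Pros" above; it also ignores contention, overlap, and caching, which is one reason such models are usually not very accurate.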

  8. Background - Experimental Performance Measurement Stages
     • Instrumentation – user-assisted or automatic insertion of instrumentation code
     • Measurement – actual measuring stage
     • Analysis – data analysis toward bottleneck detection & resolution
     • Presentation – display of analyzed data to user, deals directly with user
     • Optimization – process of finding and resolving bottlenecks
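
The first four stages can be sketched in a few lines of sequential Python. This is only an illustration of the idea, not the PPW design: the `instrument` decorator, the `TRACE` list, and the `compute` / `exchange` functions are hypothetical names invented for this example.

```python
import time
from collections import defaultdict
from functools import wraps

TRACE = []  # measurement output: (function name, start time, end time)


def instrument(func):
    """Instrumentation: wrap a function so each call is timed."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Measurement: record one trace event per call.
            TRACE.append((func.__name__, start, time.perf_counter()))
    return wrapper


def summarize(trace):
    """Analysis: reduce the event trace to a per-function profile."""
    profile = defaultdict(lambda: {"calls": 0, "total_s": 0.0})
    for name, start, end in trace:
        profile[name]["calls"] += 1
        profile[name]["total_s"] += end - start
    return dict(profile)


def report(profile):
    """Presentation: format the profile as a table, slowest first."""
    lines = [f"{'function':<12}{'calls':>6}{'total (s)':>12}"]
    ranked = sorted(profile.items(), key=lambda kv: -kv[1]["total_s"])
    for name, stats in ranked:
        lines.append(f"{name:<12}{stats['calls']:>6}{stats['total_s']:>12.6f}")
    return "\n".join(lines)


@instrument
def compute(n):
    """Stand-in for a computation phase (hypothetical)."""
    return sum(i * i for i in range(n))


@instrument
def exchange(values):
    """Stand-in for a communication phase (hypothetical)."""
    return list(reversed(values))


if __name__ == "__main__":
    compute(100_000)
    compute(200_000)
    exchange([1, 2, 3])
    print(report(summarize(TRACE)))
```

Optimization, the fifth stage, is what the user does with the report: the function at the top of the table is the first candidate to inspect.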

  9. Framework

  10. Key Features
      • Semi-automatic source-level instrumentation as default
      • Only P module and part of I module are visible to user
      • PAPI will be used
      • Tracing mode as default with profiling support
      • Post-mortem data processing and analysis
      • Analyses: load balancing, scalability, memory system
      • Visualizations: timeline display, speedup chart, call-tree graph, communication volume graph, memory access graph, profiling table
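
As a rough idea of what the load-balancing and scalability analyses might compute post-mortem, the sketch below uses textbook definitions (max-over-mean imbalance, speedup, parallel efficiency). The metric definitions, function names, and sample timings are assumptions for illustration and are not taken from the tool itself.

```python
def load_imbalance(busy_times):
    """Return max/mean - 1; 0.0 means every thread did equal work."""
    mean = sum(busy_times) / len(busy_times)
    return max(busy_times) / mean - 1.0


def speedup(serial_time, parallel_time):
    """Classic speedup: single-thread time over parallel time."""
    return serial_time / parallel_time


def parallel_efficiency(serial_time, parallel_time, num_threads):
    """Speedup per thread; 1.0 corresponds to ideal linear scaling."""
    return speedup(serial_time, parallel_time) / num_threads


if __name__ == "__main__":
    # Hypothetical per-thread busy times (seconds) from one traced run.
    busy = [1.0, 1.0, 2.0, 4.0]
    print(f"load imbalance: {load_imbalance(busy):.2f}")

    # Hypothetical wall-clock times (seconds) keyed by thread count.
    runs = {1: 8.0, 2: 4.5, 4: 2.5, 8: 2.0}
    for threads, elapsed in runs.items():
        s = speedup(runs[1], elapsed)
        e = parallel_efficiency(runs[1], elapsed, threads)
        print(f"{threads:>2} threads: speedup {s:.2f}, efficiency {e:.2f}")
```

The per-thread-count speedup values are the kind of data a speedup chart would plot.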

  11. Tasks & Schedule

  12. Discussion Topic: Target Platforms
      Our current platform list; changes needed?
      • Open
        • Quadrics SHMEM on Opterons + RHEL4 (qsnet)
        • Berkeley UPC on Opterons + RHEL4 (iba)
      • Proprietary
        • Cray UPC on X1E (src. inst)
        • Cray SHMEM on X1E

  13. Today’s Schedule
      09:00 – 09:30 AM  Project overview
      09:30 – 10:15 AM  Instrumentation (I) module presentation
      10:15 – 10:30 AM  BREAK
      10:30 – 11:15 AM  Measurement (M) module presentation
      11:15 – 11:45 AM  I&M-modules demo
      11:45 – 13:00 PM  LUNCH
      13:00 – 13:45 PM  Analysis (A) module presentation
      13:45 – 14:00 PM  A-module demo
      14:00 – 14:45 PM  Presentation (P) module presentation
      14:45 – 15:00 PM  P-module demo
      15:00 – 15:15 PM  BREAK
      15:15 – 16:00 PM  Wrap-up & planning discussion
