
UPC/SHMEM Language Analysis and Usability Study


Presentation Transcript


  1. PAT: UPC/SHMEM Language Analysis and Usability Study • Professor Alan D. George, Principal Investigator • Mr. Hung-Hsun Su, Sr. Research Assistant • Mr. Adam Leko, Sr. Research Assistant • Mr. Bryan Golden, Research Assistant • Mr. Hans Sherburne, Research Assistant • HCS Research Laboratory, University of Florida

  2. Language Study

  3. Purpose and Method • Purpose • Determine performance factors purely from the language’s perspective • Gain insight into how to best incorporate performance measurement with various implementations • Method • Create a complete and minimal factor list • Analyze UPC and SHMEM (Quadrics) specs • Analyze various UPC/SHMEM implementations, supplemented by discussions with developers

  4. Factor List Creation • Factor list developed based on observations from other studies (tools, analytical models, etc.) • Ensures factors are measurable • Provides insight into how they can be measured • Only basic events included to eliminate redundancy • Sufficient for time-based analysis and memory system analysis • Completion notification – calling thread waits for completion of a one-sided operation it initiated • Synchronization – multiple threads wait for each other to complete a single task • Local access – refers only to access of a local shared (global) variable
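
To make the last three factors concrete, here is a minimal sketch in SHMEM (OpenSHMEM-style names; Quadrics-era SHMEM uses start_pes()/my_pe() instead of shmem_init()/shmem_my_pe()). The mapping of calls to factors is only illustrative and assumes at least two PEs; it is not itself part of the factor list.

    #include <shmem.h>

    long src[8], dst[8];              /* symmetric (remotely accessible) arrays */

    int main(void)
    {
        shmem_init();                 /* assumes >= 2 PEs when run */
        int me = shmem_my_pe();

        for (int i = 0; i < 8; i++)
            src[i] = me;

        /* Small transfer: one-sided put of 8 64-bit words into PE 1's dst. */
        if (me == 0)
            shmem_put64(dst, src, 8, 1);

        /* Completion notification: the calling PE waits until the one-sided
           operations it initiated have completed.                           */
        shmem_quiet();

        /* Synchronization: all PEs wait for each other.                     */
        shmem_barrier_all();

        /* Local access: reading a local shared (symmetric) variable.        */
        long x = dst[0];
        (void)x;

        shmem_finalize();
        return 0;
    }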

  5. Factor List

  6. SHMEM Analysis • Performed on Quadrics SHMEM specification and GPSHMEM library • Great similarity between implementations • Factors for each construct involve execution plus: • Small transfer (put/get) • Synchronization (other constructs) • Variations between implementations are troublesome • A standard for the SHMEM/GPSHMEM function set is desirable • General: provides user with a uniform library set • PAT: reduces complexity of system (i.e., possibly only one wrapper library is sufficient) • Wrapper approach (ex: PSHMEM) fits very well • Can borrow many ideas from PMPI • However, analysis of data transfers needs special care to handle one-sided communication • See Language Analysis sub-report for construct-factor assignments
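
As a rough illustration of the wrapper approach, the sketch below intercepts shmem_putmem() the way a PMPI wrapper intercepts MPI calls. It assumes the library also exports a pshmem_putmem() entry point (analogous to PMPI's shifted names); pat_record_transfer() is a hypothetical PAT hook, not an existing API.

    #include <shmem.h>
    #include <sys/time.h>

    /* Hypothetical PAT hook: record one data-transfer event in the trace. */
    static void pat_record_transfer(const char *name, double start, double end,
                                    size_t nbytes, int target_pe)
    {
        /* ... append an event record to the trace buffer ... */
        (void)name; (void)start; (void)end; (void)nbytes; (void)target_pe;
    }

    static double now_seconds(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec * 1e-6;
    }

    /* Assumed profiling entry point, analogous to PMPI_Send for MPI_Send. */
    extern void pshmem_putmem(void *dest, const void *src, size_t nbytes, int pe);

    /* The wrapper library redefines shmem_putmem: the user's call is timed
       and then forwarded to the real implementation.                       */
    void shmem_putmem(void *dest, const void *src, size_t nbytes, int pe)
    {
        double t0 = now_seconds();
        pshmem_putmem(dest, src, nbytes, pe);   /* real one-sided transfer */
        double t1 = now_seconds();
        pat_record_transfer("shmem_putmem", t0, t1, nbytes, pe);
    }

Because the put is one-sided, the interval timed here covers only initiation on the calling PE; attributing actual completion (e.g., at shmem_quiet or a barrier) is the special care mentioned above.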

  7. UPC Analysis (1) • Performed on UPC spec. 1.1, Berkeley UPC, Michigan Tech UPC, and HP UPC (in progress) • See Language Analysis sub-report for construct-factor assignment • Specification analysis • Educated guesses, attempts to cover all aspects of the language • Too generic for PAT development • Implementations • Many similarities between implementations • Wrapper mentality works with UPC function constructs → PUPC proposal • Pre-processor needed to handle UPC non-function constructs
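
As an illustration of the pre-processor idea, the sketch below shows what a source-to-source pass might emit for two non-function constructs (upc_barrier and a shared-variable read). pat_enter()/pat_exit() are hypothetical hooks; the eventual PUPC/pre-processor interface may differ.

    #include <upc.h>
    #include <stdio.h>

    shared int counter;                        /* affinity to thread 0 */

    /* Hypothetical PAT hooks inserted by the pre-processor. */
    static void pat_enter(const char *construct) { (void)construct; }
    static void pat_exit(const char *construct)  { (void)construct; }

    int main(void)
    {
        int x;

        if (MYTHREAD == 0)
            counter = 42;

        /* Original source:  upc_barrier;                                */
        pat_enter("upc_barrier");              /* synchronization factor */
        upc_barrier;
        pat_exit("upc_barrier");

        /* Original source:  x = counter;  (shared read; remote unless
           MYTHREAD == 0, so it may imply a one-sided get)               */
        pat_enter("shared read");
        x = counter;
        pat_exit("shared read");

        printf("thread %d read %d\n", MYTHREAD, x);
        return 0;
    }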

  8. UPC Analysis (2) • Implementations (cont.) • HP UPC • Composed of UPC Compiler (compiler), Run-Time System (RTS), and (optional) Run-Time Environment (RTE) • UPC global variable access translates to HW shared-memory access → impacts time of instrumentation • Waiting for Brian at HP to send details on UPC functions to complete construct-factor assignment • GCC-UPC: will be studied after completion of HP UPC

  9. UPC Specification Construct-Factor Table (1)

  10. UPC Specification Construct-Factor Table (2)

  11. Berkeley UPC Analysis (1) • Based on version 2.0.1 • Analysis at UPC level with some consideration at communication level • Noteworthy implementation details • upc_all_alloc and upc_all_lock_alloc: use of all-to-all broadcast • upc_alloc and upc_global_alloc behave like upc_local_alloc: double size of heap when running out of space • Multiple mechanisms for implementing barrier • HW supported (ex: InfiniBand) • Custom barrier (ex: SHMEM/lapi) • Centralized (other, current) → logarithmic dissemination (other, future) • Impact on PAT • UPC-level-only instrumentation → 1 unit, less accurate • UPC + communication-level instrumentation → multiple units, more accurate
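
For reference, the logarithmic dissemination pattern mentioned above pairs threads as in the sketch below. This is the textbook algorithm, not Berkeley UPC's actual barrier code; the program only prints which partner each thread would notify and wait on in each round.

    #include <stdio.h>

    /* Dissemination barrier: each thread takes part in ceil(log2(N)) rounds.
       In the round with distance d, thread i notifies thread (i + d) % N and
       waits on a notification from (i - d + N) % N, so the barrier completes
       in logarithmic time.                                                   */
    int main(void)
    {
        const int nthreads = 8;               /* example thread count */

        for (int i = 0; i < nthreads; i++) {
            printf("thread %d:", i);
            for (int d = 1; d < nthreads; d <<= 1)
                printf("  d=%d -> notify %d, wait on %d;",
                       d, (i + d) % nthreads, (i - d + nthreads) % nthreads);
            printf("\n");
        }
        return 0;
    }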

  12. Berkeley UPC Analysis (2) • Noteworthy implementation details (cont.) • Three different translations for upc_forall • All tasks can be done by 1 thread → if statement followed by a regular for loop • Tasks are cyclically distributed → for loop with stride factor equal to number of threads • Tasks are block distributed → two-level for loops are used (outer level is same as in second case and inner loop is a regular for loop covering all elements in the block) • Impact on PAT – instrumentation needed before translation
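
The sketch below shows the shape of each translation for a loop such as upc_forall (i = 0; i < N; i++; &a[i]) a[i] += 1. It is our paraphrase of the three cases, not Berkeley UPC's generated code; each case corresponds to a different affinity expression/layout, and they are shown side by side only for illustration.

    #include <upc.h>
    #include <stdio.h>

    #define N 16
    #define B 4                      /* block size for case 3 */

    shared [B] int a[N];             /* blocked shared array   */

    int main(void)
    {
        int i, base, k = 0;

        /* Case 1: all iterations have affinity to one thread k
           -> if statement followed by a regular for loop        */
        if (MYTHREAD == k)
            for (i = 0; i < N; i++)
                a[i] += 1;
        upc_barrier;

        /* Case 2: cyclic distribution
           -> for loop with stride equal to THREADS              */
        for (i = MYTHREAD; i < N; i += THREADS)
            a[i] += 1;
        upc_barrier;

        /* Case 3: block distribution with block size B
           -> two-level loop: outer strides over this thread's
              blocks, inner covers the elements of one block     */
        for (base = MYTHREAD * B; base < N; base += THREADS * B)
            for (i = base; i < base + B && i < N; i++)
                a[i] += 1;
        upc_barrier;

        if (MYTHREAD == 0)
            printf("a[0] = %d\n", a[0]);
        return 0;
    }

Because this lowering happens inside the compiler, hooks placed at the UPC source level (i.e., before translation) still see the original upc_forall rather than these generated loops, which is the reason for the last bullet above.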

  13. Berkeley UPC Construct-Factor Table (1)

  14. Berkeley UPC Construct-Factor Table (2)

  15. Michigan Tech UPC Analysis • Based on version 1.1 • Noteworthy implementation details • Uses centralized control for most control processes (i.e., split and non-split barriers, collective array allocation, collective lock allocation, and global exit) • Based on a two-pthread system using a producer-consumer mechanism • Program thread (producer): adds entries to the appropriate send queues • Communication thread (consumer): sends and processes requests via MPI (no aggregation of data for optimization; a bulk transfer becomes multiple small transfers) • Impact on PAT – transfers, completions, and synchronization are much harder to track • Uses flat broadcast and tree broadcast • Caching capability complicates analysis
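
The hedged sketch below shows the general shape of such a producer-consumer handoff between a program thread and a communication thread. All names (put_req, enqueue_put, comm_thread) are hypothetical and MTU UPC's actual runtime data structures differ; it is only meant to show why transfer, completion, and synchronization become harder to attribute once requests are queued and sent asynchronously.

    #include <mpi.h>
    #include <pthread.h>
    #include <string.h>

    typedef struct {            /* one queued remote-put request            */
        int    target;          /* destination MPI rank                     */
        size_t nbytes;          /* payload size (assumed <= 256, "small")   */
        char   payload[256];
    } put_req;

    #define QDEPTH 64           /* fixed-size queue; no overflow handling   */

    static put_req         queue[QDEPTH];
    static int             head = 0, tail = 0, done = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

    /* Program thread (producer): append one request to the send queue. */
    void enqueue_put(int target, const void *buf, size_t nbytes)
    {
        pthread_mutex_lock(&lock);
        put_req *r = &queue[tail % QDEPTH];
        r->target = target;
        r->nbytes = nbytes;
        memcpy(r->payload, buf, nbytes);
        tail++;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
    }

    /* Communication thread (consumer): drain the queue via MPI.  `done` is
       set under the lock by the program thread at shutdown; only this
       thread calls MPI, so MPI_THREAD_SERIALIZED support is sufficient.   */
    void *comm_thread(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (head == tail && !done)
                pthread_cond_wait(&nonempty, &lock);
            if (head == tail && done) {     /* queue drained, shutting down */
                pthread_mutex_unlock(&lock);
                break;
            }
            put_req r = queue[head % QDEPTH];
            head++;
            pthread_mutex_unlock(&lock);

            /* Each request becomes one small MPI message; a bulk transfer
               is issued as several of these, with no aggregation.          */
            MPI_Send(r.payload, (int)r.nbytes, MPI_BYTE,
                     r.target, 0 /* tag */, MPI_COMM_WORLD);
        }
        return NULL;
    }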

  16. MTU UPC Construct-Factor Table (1)

  17. MTU UPC Construct-Factor Table (2)

  18. Summary • Factor list and construct-factor assignment provide basis for practical event tracing in UPC and SHMEM • SHMEM • Wrapper library approach appears ideal • Push for SHMEM standardization will simplify development • UPC • Hybrid pre-processor/wrapper library approach appears appropriate (compatible with GCC-UPC?) • Analysis provides insights on how to instrument UPC/SHMEM programs and raises awareness of possible difficulties

  19. Usability Study

  20. Usability: Purpose and Methods • Purpose • Determine factors affecting usability of performance tools • Determine how to incorporate knowledge about factors into PAT • Methods • Elicit user feedback through a Performance Tool Usability Survey (survey generated after some literature reviews) • Review and provide a concise summary of literature in area of usability for parallel performance tools • Outline • Discuss common problems seen in performance tools • Provide a discussion on factors influencing usability of performance tools • Outline how to incorporate user-centered design into PAT • Present guidelines to avoid common usability problems

  21. Performance Tool Usability: • General Performance Tool Problems • Difficult problem for tool developer • Inherently unstable execution environment • Monitoring behavior may disturb original behavior • Short lifetime of parallel computers • Users • Tools too difficult to use • Too complex • Unsuitable for real-world applications • Users skeptical about value of tools

  22. Discussion on Usability Factors* (1) • Ease-of-learning • Concern • Important for attracting new users • Tool’s interface shapes user’s understanding of its functionality • Inconsistency leads to confusion (e.g., providing defaults for some objects but not all) • Possible solutions • Strive for internally and externally consistent tool • Stick to established conventions • Provide uniform interface • Target as many platforms as necessary so user can amortize time invested over many uses • Usefulness • Concern: How directly tool helps user achieve their goal • Possible solution: Make common case simple even if that makes rare case complex * C. Pancake, “Improving the Usability of Numerical Software through User-Centered Design,” The Quality of Numerical Software: Assessment and Enhancement, ed. B. Ford and J. Rice, pp. 44-60, 1997.

  23. Discussion on Usability Factors (2) • Ease-of-use • Concern: Amount of effort required to accomplish work with tool too high to justify tool’s use • Possible solutions • Do not force user to memorize information about interface – use menus, mnemonics, and other mechanisms • Provide a simple interface • Make all user-required actions concrete and logical • Throughput • Concern: How does tool contribute to user productivity in general • Keep in mind that inherent goal of tool is to increase user productivity

  24. User-Centered Design • Concept that usability should be driving factor in tool development • Based on premise that usability will only be achieved if design process is user-driven • Four-step model to incorporate user feedback* (chronological) • Ensure initial functionality is based on user needs • Solicit input directly from users • MPI users (for information about existing tools) • UPC/SHMEM users • Sponsor • Analyze how users identify and correct performance problems • UPC/SHMEM users primarily • Gain better idea of how the tool will actually be used on real programs • Information from users then presented to sponsor for critique/feedback • Implement incrementally • Organize interface so that most useful features are best supported • User evaluation of preliminary/prototype designs • Maintain strong relationship with users to whom we have access • Have users evaluate every aspect of tool’s interface, structure, and behavior • Alpha/Beta testing • User tests should be performed at many points along the way • Feature-by-feature refinement in response to specific user feedback * S. Musil, G. Pigel, M. Tscheligi. “User Centered Monitoring of Parallel Programs with InHouse.” HCI ’94 Ancillary Proceedings, 1994.

  25. Performance Tool Usability: Guidelines • Issues for Performance Tools and Solutions • Many tools begin by presenting windows with detailed info on a performance metric • Users prefer broader perspective on application behavior • Some tools provide multiple views of program behavior • Good idea, but need support for comparing different metrics • For example, seeing that the L1 cache miss rate rises in the same place where CPU utilization drops • Also essential to provide source-code correlation to be useful • User does not want info that cannot be used to fix code

  26. Performance Tool Usability: Summary • Summary • Tool will not gain user acceptance until useable in real-world environment • Need to identify successful user strategies from existing tools for real applications • Devise ways to apply successful strategies to tool in an intuitive manner • Use this functionality in development of new tool

  27. Presentation Methodology: Introduction • Why use visualizations? • To facilitate user comprehension • To convey complexity and intricacy of performance data • Help bridge gap between raw performance data and performance improvements • When to use visualizations? • On-line: visualization while application is running (can slow down execution significantly) • Post mortem: after execution (usually based on trace data gathered at runtime) • What to visualize? • Interactive displays to guide the user • Default visualizations should provide high-level views • Low-level information should be easily accessible

  28. General Approaches to Performance Visualization • General Categories • System/Application-independent: depict performance data for variety of systems and applications – most tools use this approach • Meta-tools: facilitate development of custom visualization tools • Other Categories • On-line: visualization during execution • Can be intrusive • Volume of information may be too large to interpret without playback functionality • Allows user to observe only interesting parts of execution without waiting • Post mortem: visualization after execution • Have to wait to see visualizations • Easier to implement • Less intrusion on application behavior

  29. Useful Visualization Techniques • Animation • Has been employed by various tools to provide program execution replay • Most commonly animated events are communication operations • Viewing data dynamically may illuminate bottlenecks more efficiently • However, animation is usually very cumbersome in practice • Program graphs • Generalized picture of entire system • Gantt charts • De facto standard for displaying inter-process communication • Data access displays • Each cell of a 2D display is devoted to an element of the array • Color distinguishes between local/remote and read/write • Critical path analysis • Concerned with identifying program regions that contribute most to program execution time • Graph depicts synchronization and communication dependencies among processes in program

  30. Summary of Visualizations

  31. Guidelines and Interface Evaluation • General Guidelines* • Visualization should guide, not rationalize • Scalability is crucial • Color should inform, not entertain • Visualization should be interactive • Visualizations should provide meaningful labels • Default visualization should provide useful information • Avoid showing too much detail • Visualization controls should be simple • Goals, Operators, Methods, and Selection Rules (GOMS) • Formal user interface evaluation technique • Way to characterize a set of design decisions from point of view of user • Description of what user must learn; may be basis for reference documentation • May be able to use GOMS analysis in design of PAT • Knowledge described in a form that can actually be executed (there have been several fairly successful attempts to implement GOMS analysis in software, e.g. GLEAN) • Various incarnations of GOMS with different assumptions useful for more specific analyses (KLM, CMN-GOMS, NGOMSL, CPM-GOMS, etc.) * B. Miller. “What to Draw? When to Draw?: an essay on parallel program visualization,” Journal of Parallel and Distributed Computing, 18:2, pp. 265-269, 1993.

  32. Simple GOMS Example: OS X • GOMS model for OS X • Method for goal: delete a file • Step 1. Think of file name and retain as first filespec (file specifier) • Step 2. Accomplish goal: drag file to trash • Step 3. Return with goal accomplished • Method for goal: move a file • Step 1. Think of file name and retain as first filespec • Step 2. Think of destination directory name and retain as second filespec • Step 3. Accomplish goal: drag file to destination • Step 4. Return with goal accomplished

  33. Simple GOMS Example: UNIX • GOMS model for UNIX • Method for goal: delete a file • Step 1. Recall that command verb is rm -f • Step 2. Think of file name and retain as first filespec • Step 3. Accomplish goal: enter and execute a command • Step 4. Return with goal accomplished • Method for goal: copy a file • Step 1. Recall that command verb is cp • Step 2. Think of file name and retain as first filespec • Step 3. Think of destination directory name and retain as second filespec • Step 4. Accomplish goal: enter and execute a command • Step 5. Return with goal accomplished

  34. Summary • Plan for development • Develop a preliminary interface that provides functionality required by user while conforming to visualization guidelines • After preliminary design is complete, elicit user feedback • During periods where user contact is unavailable, may be able to use GOMS analysis or another formal interface evaluation technique
