1 / 34

A Performance Estimator for Parallel Hierarchical Memory Systems -- PetaSIM

A Performance Estimator for Parallel Hierarchical Memory Systems -- PetaSIM. Yuhong Wen and Geoffrey C. Fox Northeast Parallel Architecture Center (NPAC) Syracuse University {wen,gcf}@npac.syr.edu. Outlines. Performance Estimation Process PetaSIM Motivation and Idea

joesphq
Télécharger la présentation

A Performance Estimator for Parallel Hierarchical Memory Systems -- PetaSIM

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Performance Estimator for Parallel Hierarchical Memory Systems -- PetaSIM Yuhong Wen and Geoffrey C. Fox Northeast Parallel Architecture Center (NPAC) Syracuse University {wen,gcf}@npac.syr.edu PDCS'99 Fort Lauderdale, Florida

  2. Outlines • Performance Estimation Process • PetaSIM Motivation and Idea • Help to design computer architecture and applications • Java applet friendly user interface • Performance Specification Language Features • Design and Implementation of PetaSIM • Experiment Results PDCS'99 Fort Lauderdale, Florida

  3. Why Performance Prediction? • Application complexities • A lot of processors required • Large amount of data involved • Time-consuming processing • Performance prediction to fasten • new parallel computer architecture design • application model design PDCS'99 Fort Lauderdale, Florida

  4. Performance Prediction Approaches • Concept design level performance prediction • aim to provide a quick and roughly correct performance prediction at the early stage of model design • PetaSIM based on this level • Detailed performance prediction • to provide detailed information of a given application running on specific computer system -- Simulation PDCS'99 Fort Lauderdale, Florida

  5. PetaSIM Motivation • PetaSIM was designed to allow “qualitative” performance estimates” where in particular the design of machine is particularly easy to change • Applications are to be derived “by hand” or by automatic generation from Maryland Application Emulators • Special attention to support of hierarchical memory machines and data intensive applications • Support simulation of “pure data-parallel” and composition of linked modules PDCS'99 Fort Lauderdale, Florida

  6. Peta-Computing Hierarchy Full Heterogeneous MetaProblem Module Module Module Components Components Module Module Task Parallelism Aggregate Aggregate Data Parallelism Simulate Loosely Synchronize Computation Splitting into Lower Level Memory Hierarchy PetaSIM Real Computing PDCS'99 Fort Lauderdale, Florida

  7. Performance Prediction Model Application Domain Software / Operating System Domain Hardware Domain Multi Domain model PDCS'99 Fort Lauderdale, Florida

  8. Three Domain Performance Prediction • Application Domain • to extract the data aggregates • to give abstract data movement and computation behavior • Software / Operating System Domain • to provide the methods for task process and memory management, communication and parallel file access • Hardware Domain • to provide the model of processor and memory components, includes cache as well PDCS'99 Fort Lauderdale, Florida

  9. Performance Specification Language Because of the complexities of performance prediction • Various different kinds of applications • Different kinds of parallel architectures It’s very important to design a general performance specification language (PSL) to represent all the features of the different aspects in the performance process. PetaSIM shows an initial step to suggest that characteristics of such a Performance Specification Language (PSL). PDCS'99 Fort Lauderdale, Florida

  10. Performance Specification Language Application Domain • The size of each data block • the number of data blocks • the amount of data operations in the data block • data distribution model • data processing sequence / flow of the data blocks -- the application algorithm PDCS'99 Fort Lauderdale, Florida

  11. Performance Specification Language • The memory management approach • the cache management approach • parallel task schedule method • parallel file access pattern • computing, communication overlap approach Software / Operating System Domain PDCS'99 Fort Lauderdale, Florida

  12. Performance Specification Language Hardware Domain • Computing capability of each processor, which include the CPU speed and the bandwidth • memory size and cache size • architecture of each processing node • inter-communication topology of the parallel machines, which is to provide the information of communication between the processors PDCS'99 Fort Lauderdale, Florida

  13. Petasim Estimator & Emulator Applications Hand Code Applications UMD Emulators Execution Script Dataset Distribution Nodeset Linkset PetaSIM Performance Estimation PDCS'99 Fort Lauderdale, Florida

  14. Emulators • Extract the application’s computational and data access patterns • A simplified version of the real application, contains all the necessary communication, computation and I/O characteristics • less accurate than full application, but more robust • fast performance prediction for rapid prototyping PDCS'99 Fort Lauderdale, Florida

  15. PetaSIM Design • We define an object structure for computer (including network) and data • Architecture Description • nodeset & linkset • (describe the architecture memory hierarchy) • Data Description • dataset & distribution • Application Description • execution script • System / Software Description PDCS'99 Fort Lauderdale, Florida

  16. Architecture Description • A nodeset is a collection of entities with types liked • memory with cache & disks • CPU where results can be calculated • pathway such as bus, switch or network • A linkset connects nodesets together in various ways PDCS'99 Fort Lauderdale, Florida

  17. Application Description • An application consists of dataset objects • dataset implementation is controled by the distribution objects • application behavior is represented by execution script • a set of command statements • data movements • data computation • synchronization PDCS'99 Fort Lauderdale, Florida

  18. Nodeset Object Structure • Name: one per nodeset object • type: choose from memory, cache, disk, CPU, pathway • number: number of members of this nodeset in the architecture • grainsize: size in bytes of each member of this nodeset (for memory, cache, disk) • bandwidth: maximum bandwidth allowed in any one member of this nodeset • floatspeed: CPU’s float calculating speed • calculate(): method used by CPU nodeset to perform computation • cacherule: controls persistence of data in a memory or cache • portcount: number of ports on each member of nodeset • portname[]: ports connected to linkset • portlink[]: name of linkset connecting to this port • nodeset_member_list: list of nodeset members in this nodeset (for nodeset member identification) PDCS'99 Fort Lauderdale, Florida

  19. Linkset Object Structure • Name: one per linkset object • type: choose from updown, across • nodesetbegin: name of initial nodeset joined by this linkset • nodesetend: name of final nodeset joined buy this linkset • topology: used for across networks to specify linkage between members of a single nodeset • duplex: choose from full or half • number: number of members of this linkset in the architecture • latency: time to send zero length message across any member of linkset • bandwidth: maximum bandwidth allowed in any link of this linkset • send(): method that calculates cost of sending a message across the linkset • distribution: name of geometric distribution controlling this linkset • linkset_member_list: list of linkset members in this linkset ( for linkset member identification ) PDCS'99 Fort Lauderdale, Florida

  20. Dataset Object Structure • Name: one per dataset object • choose from grid1dim, grid2dim, grid3dim, specifies type of dataset • bytesperunit: number of bytes in each unit • floatsperunit: update cost as a floating point arithmetic count • operationsperunit: operations in each unit • update(): method that updates given dataset which is contained in a CPU nodeset and a grainsize controlled by last memory nodeset visited • transmit(): method that calculates cost of transmission of dataset between memory levels either communication or movement up and down hierarchy • Methods can use other parameters or be custom PDCS'99 Fort Lauderdale, Florida

  21. Execution Script • Currently a few primitives which stress (unlike most languages) movement of data through memory hierarchies • send DATAFAMILY from MEM-LEVEL-L to MEM-LEVEL-K • These reference object names for data and memory nodesets • move DATAFAMILY from MEM-LEVEL-L to MEM-LEVEL-K • Use distribution DISTRIBUTION from MEM-LEVEL-L to MEM-LEVEL-K • compute DATAFAMILY-A, DATAFAMILY-B,… on MEM-LEVEL-L • synchronize (synchronizes all processors --- loosely synchronous barrier) • loop operation PDCS'99 Fort Lauderdale, Florida

  22. PetaSIM Estimation Schedule • Each nodeset member has its usage control to record when dataset arrives and when to send out to next nodeset member • Each linkset member has its usage control to record at what time the linkset member is free or occupied • Data Driven model: Ready ---> Go (First come, First Service) • Support Both data parallel mode and individual operation on each nodeset, linkset member mode PDCS'99 Fort Lauderdale, Florida

  23. Architecture of PetaSIM C++ Simulator Multi-User Java Server StandardJava AppletClient StandardJava AppletClient PDCS'99 Fort Lauderdale, Florida

  24. PetaSIM Experiments Typical SP2 configuration of nodeset and linkset components PDCS'99 Fort Lauderdale, Florida

  25. Pathfinder Performance Estimation Results PDCS'99 Fort Lauderdale, Florida

  26. Pathfinder Estimation Results II PDCS'99 Fort Lauderdale, Florida

  27. Titan Estimation Results (Fixed) PDCS'99 Fort Lauderdale, Florida

  28. VMScope Estimation Results PDCS'99 Fort Lauderdale, Florida

  29. PetaSIM Features • Accurate estimation • Friendly user interface • Easy to modify the architecture design • Easy to monitor the effect of the design change • Fast Estimation • Detail performance estimation • Provide detail usage of each individual nodeset and linkset member in the memory hierarchy PDCS'99 Fort Lauderdale, Florida

  30. Compare with some other Simulators • Different Simulation Approach • PetaSIM not real run the application, estimate the execution script (operation abstraction) • PetaSIM running on single processor • Similar performance estimation results • PetaSIM can easily deal with different kinds of computer architecture • PetaSIM can get detailed information of any part of the architecture PDCS'99 Fort Lauderdale, Florida

  31. PetaSIM Current Progress Summary • Architecture Description (nodeset & linkset) • Application Description (dataset & execution script) • Link to Application Emulators • Jacobi hand-written example • Pathfinder, Titan, VMScope real applications (Generated by UMD’s Emulator) • Easy modified Architecture and Application description • Fast and relatively Accurate performance estimation (PetaSIM running on single processor) • Java applet based user Interface • Data Parallel Model & Individual Control PDCS'99 Fort Lauderdale, Florida

  32. Possible Future Work • Richer set of applications using standard benchmarks and DoD MSTAR • Relate object model to those used in “seamless interfaces” / metacomputing i.e. to efforts to establish (distributed) object model for computation • Review very simple execution script -- should we add more complex primitives or regard “application emulators” as this complex script • Binary format (“compiled PetaSIM”) of architecture and application description ( ASCII format will make execution script very large) • Translation tool from ASCII format to binary format (to retain the friendly user interface) • Upgrade performance evaluation model • Run performance simulation in parallel (i.e. PetaSIM running on multi-processors) PDCS'99 Fort Lauderdale, Florida

  33. PetaSIM Web-Site URL http://kopernik.npac.syr.edu:4096/petasim/V1.0/PetaSIM.html -- PetaSIM Java Applet front user interface and demo -- Related PetaSIM documents PDCS'99 Fort Lauderdale, Florida

  34. Interface of PetaSIM Client PDCS'99 Fort Lauderdale, Florida

More Related