The HPC Challenge Benchmark Suite (HPCC) evaluates high-performance computing systems with a set of benchmarks that probe the memory hierarchy, effective bandwidth, and computational performance, including HPL (High-Performance LINPACK), STREAM, and FFT. By spanning the range of spatial and temporal locality found in real applications, the suite supports the HPCS program goals of improving real application performance and easing programming. Results, which may be produced in multiple programming languages, are published online with several visualization options.
The HPC Challenge (HPCC) Benchmark Suite
Piotr Luszczek
http://icl.cs.utk.edu/hpcc/
HPCC Components
• HPL (High Performance LINPACK): solves a dense linear system Ax = b
• STREAM: sustainable memory bandwidth over four vector kernels (see the C sketch below):
-------------------------------------------------------
name    kernel                  bytes/iter  FLOPS/iter
-------------------------------------------------------
COPY:   a(i) = b(i)                 16          0
SCALE:  a(i) = q*b(i)               16          1
SUM:    a(i) = b(i) + c(i)          24          1
TRIAD:  a(i) = b(i) + q*c(i)        24          2
-------------------------------------------------------
• PTRANS: parallel matrix transpose, A ← A^T + B
• RandomAccess: random 64-bit updates, T[k] ← T[k] XOR a_i
• FFT: 1D Fourier transform, z_k = Σ_j x_j exp(-2πi·jk/n)
• Matrix-matrix multiply (DGEMM): C ← s*C + t*A*B
• b_eff: effective bandwidth/latency (ping-pong)
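As a concrete reading of the STREAM table above, a minimal C sketch of the four kernels follows. The array length n, the initialization values, and the scalar q are illustrative assumptions; the official benchmark times each loop separately over arrays sized to exceed cache and reports the sustained bandwidth.
-------------------------------------------------------
/* Sketch of the four STREAM kernels; not the official harness. */
#include <stdlib.h>

int main(void)
{
    const size_t n = 1 << 24;   /* assumption: large enough to defeat caches */
    const double q = 3.0;       /* arbitrary scalar */
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (!a || !b || !c) return 1;

    for (size_t i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }

    for (size_t i = 0; i < n; i++) a[i] = b[i];            /* COPY:  16 bytes, 0 FLOPS */
    for (size_t i = 0; i < n; i++) a[i] = q * b[i];        /* SCALE: 16 bytes, 1 FLOP  */
    for (size_t i = 0; i < n; i++) a[i] = b[i] + c[i];     /* SUM:   24 bytes, 1 FLOP  */
    for (size_t i = 0; i < n; i++) a[i] = b[i] + q * c[i]; /* TRIAD: 24 bytes, 2 FLOPS */

    free(a); free(b); free(c);
    return 0;
}
-------------------------------------------------------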
HPCC: Motivation and Measurement
[Figure: HPC Challenge benchmarks and selected mission-partner applications placed on a plane of temporal locality (vertical axis) versus spatial locality (horizontal axis); HPL and DGEMM sit at high temporal and high spatial locality, FFT at high temporal locality, STREAM and PTRANS at high spatial locality, and RandomAccess at low locality on both axes. Plot generated by PMaC @ SDSC.]
Spatial and temporal data locality here is for one node/processor, i.e., locally or "in the small".
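To make the two locality axes concrete, here is an illustration in C; the function names and workloads are hypothetical, not HPCC code. A streaming sum has high spatial but low temporal locality, repeated passes over a small working set have high temporal locality, and index-driven gathers have little of either, which is why the corresponding benchmarks land in different corners of the plane above.
-------------------------------------------------------
/* Hypothetical illustration of the locality plane; not HPCC code. */
#include <stddef.h>

/* High spatial, low temporal locality: consecutive addresses,
 * each touched once (STREAM-like). */
double spatial_sum(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i];
    return s;
}

/* High temporal locality: a small working set reused many
 * times (HPL/DGEMM-like). */
double temporal_sum(const double *x, size_t n, int reps)
{
    double s = 0.0;
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++) s += x[i] * x[i];
    return s;
}

/* Low on both axes: scattered addresses, no reuse (RandomAccess-like). */
double gather_sum(const double *x, const size_t *idx, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[idx[i]];
    return s;
}
-------------------------------------------------------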
HPCC: Scope and Naming Conventions
[Figure: the three benchmark scopes illustrated over a cluster of processors (P) with local memories (M) connected by a network.]
• Single (S): one processor runs the benchmark
• Embarrassingly Parallel (EP): every processor runs the benchmark concurrently, with no interprocessor communication
• Global (G): all processors cooperate on a single benchmark instance, communicating over the network
• Computational resources: CPU core(s) and threads, memory, interconnect, network, system
• Software modules: vectorization and OpenMP within a node, MPI across the network
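The naming prefixes map directly onto how an MPI program exercises the machine. A minimal sketch, assuming MPI and a placeholder workload local_kernel() (hypothetical, not part of HPCC):
-------------------------------------------------------
/* Sketch of the Single / EP / Global scopes using MPI. */
#include <mpi.h>
#include <stdio.h>

double local_kernel(void) { return 1.0; }  /* hypothetical stand-in workload */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Single (S): one process runs the kernel, the others idle. */
    if (rank == 0) local_kernel();

    /* Embarrassingly Parallel (EP): every process runs the kernel
     * independently; no communication during the computation. */
    double local = local_kernel();

    /* Global (G): all processes cooperate; a reduction stands in here
     * for the interconnect traffic of G-HPL, G-PTRANS, G-FFT, ... */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("global result: %f (%d ranks)\n", global, size);

    MPI_Finalize();
    return 0;
}
-------------------------------------------------------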
HPCC: Hardware Probes
[Figure: the memory hierarchy from registers through instruction operands, cache blocks, local memory (bandwidth and latency), messages, remote memory, and pages down to disk, with each HPC Challenge benchmark annotated at the level it stresses.]
• Top500/HPL: solves a system Ax = b; HPCS performance target: 2 Petaflops (8x improvement)
• STREAM: vector operations A = B + s*C; HPCS target: 6.5 Petabyte/s (40x)
• FFT: 1D Fast Fourier Transform Z = FFT(X); HPCS target: 0.5 Petaflops (200x)
• RandomAccess: random updates T(i) = XOR(T(i), r), sketched below; HPCS target: 64,000 GUPS (2000x)
• HPCS program has developed a new suite of benchmarks (HPC Challenge)
• Each benchmark focuses on a different part of the memory hierarchy
• HPCS program performance targets will flatten the memory hierarchy, improve real application performance, and make programming easier
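At the latency end of the hierarchy, the RandomAccess update loop T(i) = XOR(T(i), r) can be sketched in a few lines of C. The table size is an assumption here, and the LCG below is a stand-in for the official polynomial-based random stream; the real benchmark performs four updates per table word.
-------------------------------------------------------
/* Sketch of the RandomAccess (GUPS) update loop; the LCG random
 * stream is a stand-in for the official polynomial generator. */
#include <stdint.h>
#include <stdlib.h>

int main(void)
{
    const size_t n = (size_t)1 << 20;  /* assumption: 2^20-word table */
    const size_t updates = 4 * n;      /* four updates per table word */
    uint64_t *T = calloc(n, sizeof *T);
    if (!T) return 1;

    uint64_t r = 1;
    for (size_t i = 0; i < updates; i++) {
        r = r * 6364136223846793005ULL + 1442695040888963407ULL; /* stand-in PRNG */
        T[r & (n - 1)] ^= r;           /* random 64-bit XOR update */
    }

    free(T);
    return 0;
}
-------------------------------------------------------
Because each update touches an effectively random word of a table far larger than cache, performance is governed by memory latency (and, in the Global variant, network latency) rather than by FLOPS.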
HPCC: Official Submission Process
• Base run (required): download, install, run, upload results, confirm via email
• Optimized run (optional): tune, run, upload results, confirm via email
• Prerequisites: C compiler, BLAS, MPI
• Submitters must provide details of the installation and execution environment
• Optimization rules: only some routines can be replaced, the data layout must be preserved, and multiple languages can be used
• Results are immediately available on the web site as interactive HTML, XML, MS Excel, and Kiviat charts (radar plots)
HPCC Submissions over Time
[Figure: four time series of submissions: HPL [Tflop/s], STREAM [GB/s], FFT [Gflop/s], and RandomAccess [GUPS], each plotted as the sum over all submissions and as the #1 entry.]
HPCC: Comparing 3 Interconnects
[Figure: Kiviat chart (radar plot) of the HPCC results for the three clusters.]
• 3 AMD Opteron clusters: 2.2 GHz clock, 64 processors each
• Interconnect types: vendor, commodity, GigE
• The systems cannot be differentiated based on G-HPL or matrix-matrix multiply alone; the Kiviat chart over the full HPCC suite reveals the differences
• Charts available on the HPCC website: http://icl.cs.utk.edu/hpcc/
HPCC: Sample Results Analysis
[Figure: effective bandwidth in words/second (log scale from Kilo to Peta) across the memory hierarchy, for systems in Top500 order; the hierarchy spans roughly 10^6 for clusters, 10^4 for HPC systems, and 10^2 for HPCS designs.]
• All results in words/second
• Highlights the memory hierarchy
• Clusters: hierarchy steepens
• HPC systems: hierarchy stays constant
• HPCS goals: hierarchy flattens, making systems easier to program
HPCC Augmenting the June '07 TOP500
[Figure: TOP500 ratings augmented with data provided by the HPCC database.]
Contacts
Piotr Luszczek
http://icl.cs.utk.edu/hpcc/