This presentation explores the performance benchmarking of Unified Parallel C (UPC) across various platforms, as discussed at the 4th PMEO-PDS Workshop in Denver, Colorado. It covers background on UPC, its implementations, previous performance studies, and results from experiments conducted using synthetic and application benchmarks. Notable findings include performance comparisons of the MuPC, HP UPC, and Berkeley UPC compilers across platforms like Cray, SGI, and Linux clusters. The analysis highlights the impact of memory access patterns on performance, emphasizing the need for optimization in shared memory references.
Presentation at the 4th PMEO-PDS Workshop
Benchmark Measurements of Current UPC Platforms
Zhang Zhang and Steve Seidel, Michigan Technological University
Denver, Colorado, 3/22/2005
Presentation Outline
• Background
  • Unified Parallel C, implementations and users.
  • Previous UPC performance studies.
• Experiments
  • Available UPC platforms
  • Benchmarks
• Performance measurements
• Conclusions
UPC Overview
• UPC is an extension of C for partitioned shared memory parallel programming.
  • A special case of the shared memory programming model.
  • Similar languages: Co-Array Fortran, Titanium.
  • UPC homepage: http://www.upc.gwu.edu
• Platforms supported:
  • Cray X1, Cray T3E, SGI Origin, HP AlphaServer, HP-UX, Linux clusters, IBM SP.
• UPC compilers:
  • Open source: MuPC, Berkeley UPC, Intrepid UPC
  • Commercial: HP UPC, Cray UPC
• Users:
  • LBNL, IDA, AHPCRC, …
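To make the partitioned shared memory model concrete, here is a minimal UPC sketch (a hypothetical example, not code from the talk): a shared array is distributed cyclically across threads, and `upc_forall` assigns each iteration to the thread whose partition holds that element.

```c
/* Minimal UPC sketch (hypothetical, not from the talk).
   Compile with a UPC compiler, e.g. Berkeley upcc. */
#include <upc_relaxed.h>
#include <stdio.h>

#define N 1024

shared double a[N];   /* distributed cyclically across THREADS */

int main(void)
{
    int i;

    /* Each thread executes only the iterations whose element has
       affinity to it (selected by the &a[i] affinity clause). */
    upc_forall (i = 0; i < N; i++; &a[i])
        a[i] = 2.0 * i;

    upc_barrier;      /* all threads synchronize here */

    if (MYTHREAD == 0)
        printf("threads = %d, a[10] = %g\n", THREADS, a[10]);
    return 0;
}
```

The affinity clause is what makes locality explicit: whether a reference like `a[i]` is local or remote depends on the array's layout, which is exactly the distinction the benchmarks in this talk measure.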
Related UPC Performance Studies
• Performance benchmark suites
  • UPC_Bench (GWU)
    • Synthetic microbenchmark based on the STREAM benchmark.
    • Application benchmarks: Sobel edge detection, matrix multiplication, N-Queens problem
  • UPC NAS Parallel Benchmarks (GWU)
• Performance monitoring
  • Performance analysis for the HP UPC compiler (GWU)
  • Performance of Berkeley UPC on HP AlphaServer (Berkeley)
  • Performance of Intrepid UPC on SGI Origin (GWU)
Benchmarking UPC Systems
• Extended shared memory bandwidth microbenchmarks to cover various reference patterns:
  • Scalar references: 11 access patterns
  • Block memory operations: 9 access patterns
• Benchmarked six combinations of available UPC compilers and platforms using both the UPC STREAM (MTU code) and the UPC NAS Parallel Benchmarks (GWU code).
  • Compilers: MuPC, HP UPC, Berkeley UPC and Intrepid UPC
  • Platforms: Myrinet Linux cluster, HP AlphaServer SC, and Cray T3E
• The first comparison of performance for currently available UPC implementations.
• The first report on MuPC performance.
Benchmarks
• Synthetic benchmarks:
  • The STREAM microbenchmark was rewritten in UPC with a wider variety of shared memory access patterns:
    • Local shared read / write
    • Unit-stride shared read / write / copy
    • Random shared read / write / copy
    • Stride-n shared read / write / copy
  • Block transfers with variations of source and sink affinities.
• NAS Parallel Benchmark Suite v2.4
  • The UPC version was developed at GWU.
  • Five kernels: CG, EP, FT, IS and MG.
  • Two variations: naïve version and hand-tuned version.
  • Input size: Class A workload.
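The scalar access-pattern categories above can be sketched as simple kernels. The following is a hypothetical illustration of three of them (illustrative only; the actual MTU STREAM code differs): the distinction being measured is whether the referenced shared elements are owned locally, consecutive (unit stride), or strided through shared space.

```c
/* Hypothetical sketches of three scalar access patterns in the
   extended STREAM microbenchmark (not the actual MTU code). */
#include <upc_relaxed.h>

#define N (1 << 20)

shared double src[N], dst[N];

/* Local shared read: each thread touches only elements it owns,
   so no network traffic is generated. */
double local_shared_read(void)
{
    int i;
    double s = 0.0;
    upc_forall (i = 0; i < N; i++; &src[i])
        s += src[i];
    return s;
}

/* Unit-stride shared read: consecutive elements; under the default
   cyclic layout most references are remote when THREADS > 1. */
double unit_stride_read(void)
{
    int i;
    double s = 0.0;
    for (i = 0; i < N; i++)
        s += src[i];
    return s;
}

/* Stride-n shared copy: every stride-th element is copied, which
   defeats remote-reference caching for large strides. */
void stride_n_copy(int stride)
{
    int i;
    for (i = 0; i < N; i += stride)
        dst[i] = src[i];
}
```

Timing loops like these against the element count and size yields the bandwidth figures reported in the following slides.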
Local Shared References
• Intrepid UPC: performance is poor on local shared accesses.
• HP UPC: cache state has significant effects on local shared accesses.
Remote Shared References
• HP UPC and MuPC: caches help unit-stride remote shared accesses.
• Intrepid UPC performs best on remote shared accesses.
Block Memory Operations
• HP UPC: performance is poor on certain string functions.
• Intrepid UPC: low performance in all categories.
NPB – CG
• The only case that scales well: Berkeley UPC + optimized code.
NPB – FT
• HP, Berkeley and MuPC: performance is comparable.
NPB – IS
• HP, Berkeley and MuPC: performance is comparable.
NPB – MG
• MG performance is very inconsistent.
Conclusions
• STREAM benchmarking:
  • UPC language overhead reduces the performance of local shared references.
  • Remote reference caching helps stride-1 accesses.
  • Copying between two locations with the same affinity to a remote thread needs optimization.
• NPB benchmarking:
  • Some implementations failed on some benchmarks; more stable and reliable implementations are needed.
  • Hand-tuning techniques (e.g. prefetching) are critical to performance.
  • Berkeley UPC is the best at handling unstructured, fine-grained references.
  • MuPC experience shows that optimizing remote shared references is more rewarding than improving network interconnects.
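The prefetching point can be made concrete. A common UPC hand-tuning idiom (sketched here hypothetically, not taken from the GWU NPB code) replaces many fine-grained remote reads with a single bulk `upc_memget` into private memory:

```c
/* Hypothetical hand-tuning sketch: fetch a remote block in one bulk
   transfer instead of reading it element by element. */
#include <upc.h>

#define B 4096

/* Blocked layout: thread t owns the contiguous block
   data[t*B .. t*B + B-1]. */
shared [B] double data[B * THREADS];

double sum_remote_block(int t)
{
    double buf[B];   /* private (local) buffer */
    double s = 0.0;
    int i;

    /* One bulk transfer replaces B fine-grained remote reads. */
    upc_memget(buf, &data[t * B], B * sizeof(double));

    for (i = 0; i < B; i++)
        s += buf[i];
    return s;
}
```

Whether a compiler can perform this transformation automatically is exactly the gap between the naïve and hand-tuned NPB variants measured above.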
Thank you! For more information: http://www.upc.mtu.edu