This document outlines extensive research experiences on computer performance optimization, highlighting the critical need for high performance in computing. It discusses methodologies for parallel performance from the University of Michigan, including performance characterization and optimization techniques applied in various commercial applications. Key areas include database, network stack, and web server optimization, alongside insights into parallel computing challenges and advancements. It emphasizes the importance of load balancing, scheduling, and communication minimization in optimizing parallel applications.
My Research Experiences on Computer Performance Optimization
Shih-Hao Hung, Ph.D.
Sun Microsystems Inc.
Confidential Information – Do Not Forward
Computer Performance
• Demands for high performance:
  • Getting jobs done faster
  • Getting more jobs done at the same time
  • Getting complex jobs done in time
• Price-performance trade-offs:
  • Getting jobs done efficiently
  • Getting jobs done with limited resources
  • Capacity planning
Performance Optimization
My Research Background
• High-Performance/Technical Computing: Parallel Performance Project, University of Michigan, 1993-2000
  • Parallelization for high-performance applications
  • Performance characterization tools
  • Performance optimization methodologies
• Commercial Applications Optimization: Performance and Availability Engineering, Sun Microsystems, Inc., 2000-present
  • Database server optimization
  • Network stack optimization
  • Webserver optimization
  • Security infrastructure optimization
Parallel Performance Project
• Started by Prof. Edward S. Davidson of the University of Michigan in 1990; funded by NSF, Ford Motor Co., the UM Center of Parallel Computing, IBM, DoD, etc.
• Produced 11 Ph.D.s in 10 years.
• Worked on state-of-the-art parallel supercomputers and realistic applications.
• Covered many aspects of computer architecture, from CPU pipelines to clustered systems.
• Optimization by all means: instruction scheduling, memory locality, parallelization, etc., via compiler techniques and hand-tuning.
Parallel Computing
• Very hot in the 1990s:
  • People rushed to build large MPPs.
  • Looked good in theory, but lacked practical tools and experience.
  • Most existing applications are difficult to parallelize.
  • Failed to keep pace with Moore's Law: the R&D cycle was too expensive and too long to catch up with CPU clock rates climbing from MHz to GHz.
• Looking ahead:
  • Throughput computing and commercial workloads drive MP.
  • Chip density and area favor SMT and CMP designs.
  • Vendors are struggling to sustain the same growth in GHz.
  • Multi-core processors and multiprocessor systems will become the norm in the coming years.
Optimizing Parallel Applications
• Very complex, difficult problems:
  • Program parallelization
  • Load balancing
  • Scheduling
  • Minimizing interprocessor communication
  • Architecture-dependent optimization
• Today:
  • Still lots of open problems.
  • Parallelizing compilers are far from automatic solutions.
• Tomorrow:
  • Further research and practical solutions will be in high demand as MP systems become popular at all levels.
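As a rough illustration of why load balance appears in the list above, the sketch below quantifies imbalance from per-processor busy times in one parallel region. The function names and the sample timings are hypothetical and are not taken from the talk; the metric (longest time over mean time, minus one) is one common convention, not necessarily the one used in the author's tools.

```python
from statistics import mean


def load_imbalance(busy_times):
    """Relative imbalance: how much longer the slowest processor runs
    than the average one (0.0 means perfectly balanced)."""
    return max(busy_times) / mean(busy_times) - 1.0


def parallel_efficiency(busy_times):
    """Fraction of processor time doing useful work, assuming every
    processor waits at a barrier until the slowest one finishes."""
    return mean(busy_times) / max(busy_times)


# Hypothetical busy times (seconds) for four processors in one region.
times = [4.0, 4.0, 4.0, 8.0]
imbalance = load_imbalance(times)        # slowest is 60% above the mean
efficiency = parallel_efficiency(times)  # 5.0 / 8.0
```

Under these assumed numbers, one overloaded processor leaves the other three idle for half of the region, which is the kind of loss that a better domain decomposition or schedule aims to recover.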
Hierarchical Performance Bounds
Example: FCRASH
• Vehicle crash simulation at Ford.
• Finite-element code contains over 10,000 lines of Fortran and 14 parallel loops.
• Profiled on a NUMA system (HP/Convex SPP-1000).
• P-gap: imperfect parallelization
• C’-gap: inter-cluster communications
• L- and M’-gaps: load-balancing issues
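The slide does not define how the P-gap is computed. As a stand-in illustration of how imperfect parallelization limits speedup, the sketch below uses Amdahl's law, which is a textbook bound and not the talk's hierarchical bounds model. The function name and the 90% parallel fraction are assumptions chosen only for the example.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup when only `parallel_fraction` of the
    serial run time can be spread across `processors` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical case: 90% of the run time sits inside parallel loops.
bound_on_8 = amdahl_speedup(0.9, 8)    # roughly 4.7x, not 8x
bound_on_64 = amdahl_speedup(0.9, 64)  # still below 10x
```

Even with no communication or imbalance at all, the serial remainder caps the achievable speedup, which is why code left outside the parallel loops shows up as its own gap.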
Goal-directed Optimization
Performance Tuning
Modeling a Parallel Application
Model-Driven Simulation
Performance Tuning Results
• SP - initial parallel version
• SD - changing domain decomposition to reduce load imbalance (L-gap) and communications (C’-gap)
• SD2 - SD + array padding to reduce false-sharing communications (Unmodeled-gap)
• SD3 - SD2 + eliminating thread migration to reduce communications (Unmodeled-gap)
• SD4 - SD3 + eliminating unnecessary synchronization barriers (S’-gap)
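The array padding step (SD2) can be pictured with a little address arithmetic. The sketch below is not the FCRASH code: it assumes a 64-byte cache line (the talk does not state the line size of the profiled machine), 8-byte elements, and hypothetical helper names. It shows that unpadded per-thread elements land on the same cache line, while padding each element out to a full line gives every thread its own line.

```python
def padded_stride(elem_bytes, line_bytes=64):
    """Round the element size up to a whole number of cache lines."""
    return -(-elem_bytes // line_bytes) * line_bytes


def cache_lines_touched(num_threads, stride_bytes, line_bytes=64):
    """Cache-line index of each thread's private element, assuming the
    array starts on a line boundary and thread i owns element i."""
    return [(i * stride_bytes) // line_bytes for i in range(num_threads)]


ELEM_BYTES = 8  # assumed: one double-precision value per thread

# Unpadded: four threads update neighbors that share one cache line.
unpadded = cache_lines_touched(4, ELEM_BYTES)

# Padded: each thread's element occupies its own cache line.
padded = cache_lines_touched(4, padded_stride(ELEM_BYTES))
```

The cost of the padded layout is memory: under these assumptions each 8-byte value occupies 64 bytes, so padding is usually applied only to data that several threads write concurrently.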
Sun Microsystems
• Proud of its vision and innovative technologies.
• Faces fierce competition in the server business:
  • OS: Microsoft, Linux
  • CPU: Intel, IBM
  • High-end market: IBM, HP
  • Low-end market: Dell and other x86 vendors
• Still going for the next big thing:
  • Network computing (Java, JDS, JES, GridEngine)
  • Throughput computing (Niagara 1 & 2, Rock)
  • Solaris 10 & x86 support
Performance Engineering
• Performance problems everywhere…
• Deal with important commercial applications:
  • Database
  • Network infrastructure & applications
  • Throughput computing
  • Security infrastructure
• Solve problems by:
  • Identifying issues
  • Improving products
  • Influencing future development
Networking Infrastructure
• Gigabit Ethernet driver optimization
• TCP/IP stack optimization
• Multi-data transmission and jumbo frames
• TCP Offload Engine (TOE)
• InfiniBand vs. 10GE
• On-chip high-speed Ethernet support
Networking Applications
• Optimizing SunOne servers:
  • Webserver
  • Directory server
  • Application server
  • Portal server
• Tweaking benchmarks:
  • SPECweb99 & 2004
  • SPECweb99_SSL
  • TPC-W (W = Web commerce)
Security Infrastructure
• Crypto accelerators
• On-chip crypto support
• Secure Sockets Layer (SSL) & HTTPS acceleration
• IPsec & VPN acceleration
• Crypto optimization
• Solaris Cryptographic Framework
Crypto Acceleration
HTTP/SSL Performance
• HTTP, 100% Keep-Alive (chart components: http)
• HTTP, 0% Keep-Alive (chart components: http, tcp)
• HTTPS, 100% Keep-Alive, no encryption, SHA1 hashing (chart components: http, sha1)
• HTTPS, 100% Keep-Alive, RC4 encryption, SHA1 hashing (chart components: http, sha1, rc4)
• HTTPS, 0% Keep-Alive, 100% session creation (RSA), RC4, SHA1 (chart components: http, tcp, sha1, rc4, rsa)
• HTTPS, 0% Keep-Alive, 100% session resumption (RSA-reuse), RC4, SHA1 (chart components: http, tcp, sha1, rc4, rsa_reuse)
IPsec Performance
Solaris Cryptographic Framework
Throughput Computing - Niagara
Niagara-2 4-Core Server Competition – Nov. 2007
Rock
Conclusion
• We will see radical changes in computer systems in the near future, and system-wide hardware-software co-optimization is key to unlocking their potential:
  • High-density chips
  • Multi-core CPUs
  • Highly scalable systems
  • Enormous network & I/O capacity
  • Built-in security support
• Performance is an expertise best acquired through experience.
• Methodology and collaboration are our formulas for success.