This executive briefing explores the emergence of multicore processors and their impact on SaaS applications. Key topics include the challenges developers face, such as data races and application performance optimization. It shows how multicore technology can improve response times, increase throughput, and support demanding applications in quantitative finance and engineering simulation. It also offers guidance on debugging, testing, and finding parallel-programming talent when going multicore.
Cilk++, Cilk, Cilkscreen, and Cilk Arts are trademarks of Cilk Arts, Inc.
Executive Briefing: Multicore-Enabling SaaS Applications
September 3, 2008
www.cilk.com
Agenda
• Emergence of multicore processors
• Key challenges facing developers
• When can multicore help?
• Data races: a new type of bug
• Questions to ask when going multicore
• Programming tools & techniques
About Cilk Arts
Mission: To provide the easiest, quickest, and most reliable way to optimize application performance on multicore processors.
• Launched in March 2007.
• Headquartered in Burlington, MA.
• Funded by Stata Venture Partners, software industry executives, founders, and grants from the NSF and DARPA.
• First product is Cilk++, based on 15 years of research at MIT.
Moore’s Law
Transistor count is still rising, … but clock speed is bounded at ~5 GHz.
[Figure: Intel CPU introductions]
Source: Herb Sutter, “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software,” Dr. Dobb's Journal, 30(3), March 2005.
Power Density
[Figure: power density chart]
Source: Patrick Gelsinger, Intel Developer Forum, Intel Corporation, 2004.
Vendor Solution
[Figure: Intel 45nm quad-core processor]
• To scale performance, put many processor cores on a chip.
• Intel predicts 80+ cores by 2011!
SaaS Opportunity
• Increase throughput
  - Quantitative finance: increase the volume of portfolios analyzed overnight
• Reduce response time
  - Engineering simulation: accelerate structural analysis of assemblies
• Improve user experience
  - Multiplayer games: support larger galaxy sizes
• Reduce data center power consumption
Multicore and SaaS
[Figure: timeline of user work and Computer Operations 1 and 2 on processors P1 through P8]
• Application response time?
• Processor utilization?
Multicore and SaaS
[Figure: the same timeline with Computer Operations 1 and 2 each split into parallel pieces across processors P1 through P8]
• For CPU-constrained applications, multithreading improves response time and boosts utilization.
Multicore Challenges
Application Performance
• How can you minimize response time?
• Will your solution scale as the number of processor cores increases?
• Can you identify performance bottlenecks?
Development Time
• How will you get your product out in time?
• Where will you find enough parallel-programming talent?
• Will you be forced to redesign your application?
Software Reliability
• Can you debug your parallel application?
• How will you test it effectively before release?
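Work & Span
• Work (T1): total amount of time spent in all the instructions
• Span (T∞): length of the critical path, the longest chain of instructions that must run one after another
• Parallelism: ratio of work to span, T1/T∞
• In this example:
  - Work = 18
  - Span = 9
  - Parallelism = 18/9 = 2
  - i.e., little gain beyond 2 processors
[Figure: computation DAG of 18 unit-time instructions, numbered 1 through 18]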
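To make the definitions concrete, here is a minimal C++ sketch that computes work and span for an arbitrary computation DAG. It assumes each instruction carries a cost and a list of successors; the `Dag` struct and the `work` and `span` functions are illustrative names, not part of any Cilk Arts tool. Work is the sum of all costs; span is the costliest path, found in one pass over a topological order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical DAG representation: node i is one instruction with cost
// cost[i]; succ[i] lists the instructions that must wait for node i.
struct Dag {
    std::vector<double> cost;
    std::vector<std::vector<std::size_t>> succ;
};

// Work (T1): total time spent in all the instructions.
double work(const Dag& g) {
    double total = 0.0;
    for (double c : g.cost) total += c;
    return total;
}

// Span (T_inf): cost of the critical path, computed as a longest-path
// pass over a topological order (Kahn's algorithm).
double span(const Dag& g) {
    const std::size_t n = g.cost.size();
    std::vector<std::size_t> indeg(n, 0);
    for (const auto& out : g.succ)
        for (std::size_t v : out) ++indeg[v];

    std::queue<std::size_t> ready;
    std::vector<double> finish(n, 0.0);  // costliest path ending at each node
    for (std::size_t v = 0; v < n; ++v)
        if (indeg[v] == 0) { ready.push(v); finish[v] = g.cost[v]; }

    double longest = 0.0;
    while (!ready.empty()) {
        std::size_t u = ready.front();
        ready.pop();
        longest = std::max(longest, finish[u]);
        for (std::size_t v : g.succ[u]) {
            // All predecessors of v are finished before v becomes ready.
            finish[v] = std::max(finish[v], finish[u] + g.cost[v]);
            if (--indeg[v] == 0) ready.push(v);
        }
    }
    return longest;
}
```

For example, a hypothetical four-instruction diamond (instruction 1 before 2 and 3, both before 4, unit costs) has work 4, span 3, and parallelism 4/3. The slide's 18-instruction DAG could be fed in the same way, but its edges are not recoverable from this text.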
Can Multicore Help?
• The more parallelism is available in an application, the more a multicore processor can help.
• Work: T1 = 58
• Span: T∞ = 9 (same as the previous example)
• Parallelism: T1/T∞ = 58/9 ≈ 6.44
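The link between parallelism and useful core count follows from two standard lower bounds on the running time T_P on P processors. The first is the work law, T_P ≥ T1/P, and the second is the span law, T_P ≥ T∞. The sketch below (hypothetical function names) turns them into an upper bound on speedup, min(P, T1/T∞).

```cpp
#include <algorithm>
#include <cassert>

// Lower bound on running time with P processors, from the
// work law (T_P >= T1 / P) and the span law (T_P >= T_inf).
double min_time(double work, double span, int processors) {
    return std::max(work / processors, span);
}

// Corresponding upper bound on speedup T1 / T_P, i.e. min(P, T1 / T_inf).
double max_speedup(double work, double span, int processors) {
    return work / min_time(work, span, processors);
}
```

With these bounds, the first example (T1 = 18, T∞ = 9) can never exceed a speedup of 2 no matter how many cores are added. The second example (T1 = 58, T∞ = 9) is capped at about 6.44 on 8 cores rather than 8.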
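Race Bugs
Definition. A determinacy race occurs when two logically parallel instructions access the same memory location and at least one of the instructions performs a write.
Example: instruction A (int x = 0;) is followed by two logically parallel increments, B (x++;) and C (x++;), which are both followed by D (assert(x == 2);).
Because each x++ executes as a load, an increment, and a store, the two increments can interleave as follows (steps numbered in execution order):
    1  x = 0;
    2  r1 = x;
    3  r1++;
    4  r2 = x;
    5  r2++;
    6  x = r2;
    7  x = r1;
    8  assert(x == 2);
Both loads read 0, so both stores write 1, and the assertion fails.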
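The interleaving can be replayed deterministically in ordinary C++. This sketch is illustrative only. `replay_racy_interleaving` executes the eight steps in the order listed. `parallel_increments_atomic` is one race-free alternative, not taken from the slide, that uses `std::atomic` so each increment is an indivisible read-modify-write. Both function names are hypothetical. A real program in which two threads perform a plain `x++` on a shared `int` has undefined behavior in C++, so it is not shown as runnable code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Replays, one step at a time, the interleaving of two x++ operations:
// each x++ is a load into a register, an increment, and a store.
int replay_racy_interleaving() {
    int x = 0;   // step 1
    int r1 = x;  // step 2: first branch loads 0
    r1++;        // step 3
    int r2 = x;  // step 4: second branch also loads 0
    r2++;        // step 5
    x = r2;      // step 6: x becomes 1
    x = r1;      // step 7: x is overwritten with 1, not 2
    return x;    // step 8: assert(x == 2) would fail here
}

// Race-free alternative: two threads increment an atomic counter,
// so no update can be lost.
int parallel_increments_atomic() {
    std::atomic<int> x{0};
    std::thread b([&x] { x.fetch_add(1); });
    std::thread c([&x] { x.fetch_add(1); });
    b.join();
    c.join();
    return x.load();
}
```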
Coping with Race Bugs
• Although locking can “solve” race bugs, lock contention can destroy all parallelism.
• Making local copies of the nonlocal variables can remove contention, but at the cost of restructuring program logic.
• Cilk++ provides hyperobjects to mitigate data races on nonlocal variables without the need for locks or code restructuring.
IDEA: Different parallel branches may see different views of the hyperobject.
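The first two bullets can be contrasted in standard C++. This sketch uses hypothetical function names and sums 1..n over a fixed number of `std::thread` workers. `sum_with_lock` shares one total behind a `std::mutex`, so every addition contends for the lock. `sum_with_views` gives each thread a private partial sum and combines the partial sums after all threads join. The second function hand-codes the "different views" idea; it is not the Cilk++ hyperobject API, which creates and combines the views automatically.

```cpp
#include <cassert>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Locking: correct, but all workers serialize on the same mutex.
long long sum_with_lock(long long n, unsigned workers) {
    long long total = 0;
    std::mutex m;
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        threads.emplace_back([&, w] {
            for (long long i = 1 + w; i <= n; i += workers) {
                std::lock_guard<std::mutex> guard(m);
                total += i;
            }
        });
    }
    for (auto& t : threads) t.join();
    return total;
}

// Local views: each worker writes only its own slot, so no lock is needed;
// the views are combined once, after all workers have finished.
long long sum_with_views(long long n, unsigned workers) {
    std::vector<long long> views(workers, 0);
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        threads.emplace_back([&views, n, workers, w] {
            for (long long i = 1 + w; i <= n; i += workers) views[w] += i;
        });
    }
    for (auto& t : threads) t.join();
    return std::accumulate(views.begin(), views.end(), 0LL);
}
```

Both versions return 500500 for n = 1000. Only the second avoids contention, at the cost of changing how the loop body updates the total, which is the restructuring the slide refers to.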
20 Questions to Ask
http://www.cilk.com/resource-library/going-multicore-20-questions-to-ask/
Development Time
• To multicore-enable my application, how much logical restructuring of my application must I do?
• Can I easily train programmers to use the multicore software platform?
• Can I maintain just one code base, or must I maintain serial and parallel versions?
• Can I avoid rewriting my application every time a new processor generation increases the core count?
• Can I easily multicore-enable ill-structured and irregular code, or is the multicore software platform limited to data-parallel applications?
• Does the multicore software platform properly support modern programming paradigms, such as objects, templates, and exceptions?
• What does it take to handle global variables in my application?
Application Performance
• How can I tell if my application exhibits enough parallelism to exploit multiple processors?
• Does the multicore software platform address response-time bottlenecks, or just offer more throughput?
• Does application performance scale up linearly as cores are added, or does it quickly reach diminishing returns?
• Is my multicore-enabled code just as fast as my original serial code when run on a single processor?
• Does the multicore software platform's scheduler load-balance irregular applications efficiently to achieve full utilization?
• Will my application "play nicely" with other jobs on the system, or do multiple jobs cause thrashing of resources?
• What tools are available for detecting multicore performance bottlenecks?
Software Reliability
• How much harder is it to debug my multicore-enabled application than to debug my original application?
• Can I use my standard, familiar debugging tools?
• Are there effective debugging tools to identify and localize parallel-programming errors, such as data-race bugs?
• Must I use a parallel debugger even if I make an ordinary serial programming error?
• What changes must I make to my release-engineering processes to ensure that my delivered software is reliable?
• Can I use my existing unit tests and regression tests?
Parallel C++ Options
Pthreads & WinAPI threads
• An API for creating and manipulating O/S threads.
• Programmer writes thread-interaction protocols.
Intel’s Threading Building Blocks
• A C++ template library with automatic scheduling of tasks.
• Programmer writes explicit “continuations.”
OpenMP
• An open standard of compiler directives and runtime routines for C, C++, and Fortran.
• Programmer inserts pragmas into code.
Cilk++
• Faithful extension of C++.
• Programmer inserts keywords that do not destroy the program's serial semantics.
• Provably good scheduler and a race-detection tool.
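As an illustration of the OpenMP style, the sketch below (hypothetical function `omp_sum`) adds a single `#pragma omp parallel for` with a `reduction(+ : sum)` clause to an ordinary loop. When the code is compiled with OpenMP enabled (for example `-fopenmp` on GCC or Clang), the iterations are divided among threads and the per-thread partial sums are combined. When the code is compiled without OpenMP, the compiler ignores the pragma and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Sums a vector; the pragma asks OpenMP to split the loop across threads
// and to combine per-thread partial sums into `sum` (a reduction).
long long omp_sum(const std::vector<int>& v) {
    long long sum = 0;
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < static_cast<long>(v.size()); ++i) {
        sum += v[i];
    }
    return sum;
}
```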
Cilk++
Cilk++ is a remarkably simple set of extensions for C++ and a powerful runtime system for multicore applications.
Cilk++ provides a smooth evolution from serial programming to parallel programming.
Cilk Arts Solution
Application Performance
• Best-in-class performance
• Linear scaling as cores are added
• Minimal overhead on a single core
Development Time
• Minimal application changes
• Can be learned in days by programmers without multithreading expertise
• Seamless path forward (and backward)
Software Reliability
• Multithreaded version as reliable as the original
• No fundamental change to release engineering
Cilk Arts Solution
[Figure: Cilk++ development workflow]
Cilk++ source:
    int fib (int n) {
      if (n < 2) return (n);
      else {
        int x, y;
        x = cilk_spawn fib(n-1);
        y = fib(n-2);
        cilk_sync;
        return (x+y);
      }
    }
• Parallel path: the Cilk++ compiler and the Cilk++ hyperobject library, together with the linker and the Cilk++ runtime system, produce the binary. The Cilk++ race detector and parallel regression tests then check it, yielding reliable multi-threaded code with exceptional performance.
• Serial path: removing cilk_spawn and cilk_sync gives ordinary serial code, which a conventional compiler builds and conventional regression tests check, yielding reliable single-threaded code:
    int fib (int n) {
      if (n < 2) return (n);
      else {
        int x, y;
        x = fib(n-1);
        y = fib(n-2);
        return (x+y);
      }
    }
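For comparison with a standard-library approach (not Cilk++), the same spawn/sync structure can be sketched with `std::async` and `std::future`. The `std::async` call stands in for `cilk_spawn`, and `future::get()` waits for the spawned call, much as `cilk_sync` does. The function names and the `depth` cutoff are assumptions added here. The cutoff is needed because `std::launch::async` runs each call as if on a new thread, whereas the Cilk++ runtime schedules spawned work onto a fixed pool of worker threads.

```cpp
#include <cassert>
#include <future>

// Serial version: the fib example with the parallel keywords removed.
int fib_serial(int n) {
    if (n < 2) return n;
    return fib_serial(n - 1) + fib_serial(n - 2);
}

// Spawn/sync analogue: run fib(n-1) asynchronously, compute fib(n-2) in
// the current thread, then wait for the spawned result. Below `depth`
// levels, fall back to the serial code to bound the number of threads.
int fib_async(int n, int depth = 4) {
    if (n < 2) return n;
    if (depth <= 0) return fib_serial(n);
    std::future<int> x =
        std::async(std::launch::async, fib_async, n - 1, depth - 1);  // "spawn"
    int y = fib_async(n - 2, depth - 1);
    return x.get() + y;  // "sync": wait for the spawned call
}
```

As with the Cilk++ version, deleting the parallel constructs (here, calling `fib_serial` directly) leaves a correct serial program that computes the same result.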
Thank You!
• Free e-Book: www.cilk.com/multicore-e-book/
• We are currently accepting applications for our Early Visibility program.
• For more info about Cilk++ and resources for multicoders:
  - duncan@cilk.com
  - www.cilk.com