Performance Optimization in Apache 2.0 Development: How we made Apache faster, and what we learned from the experience
Brian.Pane@cnet.com
O’Reilly Open Source Convention, San Diego, CA, July 24, 2002
Agenda • Introductions • Performance optimization approach • Specific optimizations in Apache 2.0 • General strategy for open-source software performance improvement • Results and Next Steps
Goals for Apache 2.0 Performance • Make the httpd faster • But what does that mean? • How will we measure speed? • What are we willing to sacrifice for speed? • And why does performance matter?
Optimization Strategy: Part 1 Know your project’s priorities: • Metrics that matter • Rules of the game
Performance Guidelines • Metrics that matter for Apache: • Throughput • HTTP requests per second • Resource utilization • CPU, memory • Rules of the game for Apache: • Keep the server portable, reliable, configurable, maintainable, and compatible
Making Strategic Tradeoffs • Use these metrics and rules to make effective tradeoffs • Example: Table data structures • Slow, O(n)-time lookups; a significant bottleneck • But 3rd party code depended upon the array-based implementation (wasn’t well abstracted) • Solution: keep the O(n) design, but optimize it heavily (improve the throughput metric, but maintain compatibility)
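As a rough sketch of what "optimize the O(n) design heavily" can mean in practice, the fragment below keeps a linear scan over an array of key/value pairs but rejects most non-matching entries with a one-byte comparison before paying for a full case-insensitive string compare. The layout and names are simplified illustrations, not the actual apr_table_t code.

```c
#include <strings.h>   /* strcasecmp */
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, simplified table entry; the real apr_table_t differs. */
typedef struct {
    const char   *key;
    const char   *val;
    unsigned char key_first;   /* cached, case-folded first byte of key */
} table_entry;

typedef struct {
    table_entry *elts;
    int          nelts;
} table;

/* O(n) lookup, kept for API compatibility, but the common case of a
 * mismatch is rejected with one byte comparison instead of a full
 * case-insensitive string compare. */
const char *table_get(const table *t, const char *key)
{
    unsigned char first = (unsigned char)tolower((unsigned char)key[0]);
    int i;

    for (i = 0; i < t->nelts; i++) {
        if (t->elts[i].key_first == first &&
            strcasecmp(t->elts[i].key, key) == 0) {
            return t->elts[i].val;
        }
    }
    return NULL;
}
```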
Optimization Strategy: Part 2 Profile early, profile often
Profiling Tools • We used traditional code profiling tools to find the slow functions and basic blocks • gprof • Quantify • OProfile • Plus tracing tools to profile system calls • truss • strace • And occasional custom instrumentation
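Where the standard profilers were not enough, "custom instrumentation" can be as simple as a wall-clock timer dropped around a suspect code path. The sketch below is illustrative only; the function names are hypothetical and not taken from the httpd source.

```c
#include <stdio.h>
#include <sys/time.h>

/* Return the current wall-clock time in microseconds. */
static double now_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

/* Hypothetical caller: wrap a suspect code path with a timer and log
 * the elapsed microseconds, so its cost is visible even when a
 * sampling profiler misses it. */
void handle_request(void)
{
    double start = now_usec();

    /* ... code path under investigation ... */

    fprintf(stderr, "parse took %.0f usec\n", now_usec() - start);
}
```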
Profile-Driven Optimization • Profiling helps to create an informal roadmap: • Small problems: fix the code now • Medium problems: phase in API changes & faster code • Large problems: rearchitect
Profile-Driven Optimization • Apache 2.0 optimizations due to profiling, spanning the entire request processing flow (Accept Connection → Read Request → Create Request Data Structures → Map URL to File → Determine Content-Type → Open File → Stream Output Through Filters → Send Response to Client → Log Request): • Faster accept(2) serialization • Less buffer copying • More scalable, multi-threaded memory allocator • Less string manipulation • Faster MIME-type mapper and config merge • Timestamp caching in access logger • Platform-specific socket I/O speedups • Complete rewrite of server-side-include parser
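One of the items above, timestamp caching in the access logger, is worth a small illustration: a Common Log Format timestamp only changes once per second, so the formatted string can be reused until the second rolls over. This is a simplified, single-threaded sketch, not the actual mod_log_config code.

```c
#include <stdio.h>
#include <time.h>

/* Re-format the log timestamp only when the clock has advanced to a
 * new second; otherwise reuse the cached string.  The real Apache
 * code must also make this thread-safe. */
static const char *cached_log_time(void)
{
    static time_t last = (time_t)-1;
    static char   buf[64];
    time_t now = time(NULL);

    if (now != last) {
        struct tm tm;
        localtime_r(&now, &tm);
        strftime(buf, sizeof(buf), "%d/%b/%Y:%H:%M:%S %z", &tm);
        last = now;
    }
    return buf;
}
```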
Optimization Strategy: Part 3 Take advantage of improvements in the platform
Platform Optimizations • 2.0 uses fast platform features if available: • sendfile(2) • unserialized or pthread-mutex-serialized accept(2) • Atomic operations
Platform Optimizations • Apache Portable Runtime (APR) library abstracts the OS specifics • “Greatest common denominator” approach • Write your application code to use efficient OS features • On platforms where those features are not available, APR will emulate them • In 2.0, the concurrency model is a plug-in • We can add better threading models for specific platforms
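As a rough illustration of this "fast path where available, emulation elsewhere" idea, here is a sketch written directly against the Linux sendfile(2) system call rather than APR's own wrappers; the function name and buffer size are illustrative, not from the Apache or APR sources.

```c
#include <sys/types.h>
#include <unistd.h>

#ifdef __linux__
#include <sys/sendfile.h>
#endif

/* Send 'count' bytes of 'in_fd' (a file) to 'out_fd' (a socket).
 * Uses zero-copy sendfile(2) where the platform provides it, and
 * falls back to a read/write loop elsewhere -- the same shape of
 * emulation APR performs behind its portable API. */
static ssize_t send_file_portable(int out_fd, int in_fd,
                                  off_t offset, size_t count)
{
#ifdef __linux__
    return sendfile(out_fd, in_fd, &offset, count);
#else
    char   buf[8192];
    size_t total = 0;

    if (lseek(in_fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    while (total < count) {
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n <= 0)
            return n < 0 ? -1 : (ssize_t)total;
        if (write(out_fd, buf, (size_t)n) != n)
            return -1;
        total += (size_t)n;
    }
    return (ssize_t)total;
#endif
}
```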
Optimization Strategy: Part 4 Use the power of distributed development
Distributed Development • Just like open source debugging, open-source performance tuning scales well as more people work on a problem • “Redundant” coding has worked well: • Multiple people implementing different approaches to the same problem • Share ideas, compare results, pick the best implementation
Distributed Optimization Example: SSI Parser
From: Brian Pane  Date: 2001-09-05 3:00:35  Subject: remaining CPU bottlenecks in 2.0
…Here are the top 30 functions, ranked according to their CPU utilization:
  function               CPU time (% of total)
  ---------------------  ---------------------
  find_start_sequence    23.9
  …
* find_start_sequence() is the main scanning function within mod_include. …
Distributed Optimization Example: SSI Parser
From: Justin Erenkrantz  Date: 2001-09-05 8:42:46  Subject: [PATCH] Potential replacement for find_start_sequence
…Basically, replace the inner search with a Rabin-Karp search…
From: Sander Striker  Date: 2001-09-05 8:47:59  Subject: Re: [PATCH] Potential replacement for find_start_sequence
…Rabin-Karp introduces a lot of * and %. I'll try Boyer-Moore with precalced tables for '<!--#' and '--->'…
From: Sascha Schumann  Date: 2001-09-05 10:51:53  Subject: Re: [PATCH] Potential replacement for find_start_sequence
…I'd suggest looking at BNDM which combines the advantages of bit-parallelism (shift-and/-or algorithms) and suffix automata…
From: Ian Holsman  Date: 2001-09-05 16:18:11  Subject: [PATCH] Potential replacement for find_start_sequence..--skip5
…I can post my code to the skip5 implementation. It isn't optimized yet, but in my tests I see a lower CPU utilization than the standard mod-includes parser…
Distributed Optimization Example: SSI Parser
From: Justin Erenkrantz  Date: 2001-09-05 19:08:31  Subject: [PATCH] Round 2 of mod_include/find_start_sequence...
…Replaced Rabin-Karp with the bndm algorithm as implemented by Sascha. Seems to work. Can people please test/review?…
• SSI parser performance improvement: • Before: 23.9% of total user CPU time • After: 4.8% • Greater than 4x improvement in 48 hours
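For readers curious what a skip-table search over the SSI start token looks like, below is a small Boyer-Moore-Horspool-style sketch for the fixed pattern "<!--#". It is illustrative only; as the thread above notes, the implementation that was actually committed used Sascha's BNDM algorithm.

```c
#include <string.h>
#include <stddef.h>

#define PATLEN 5   /* strlen("<!--#") */

/* Horspool-style scan for the SSI start token "<!--#".  For every byte
 * value, precompute how far the search window may skip when that byte
 * appears under the last position of the window.  Illustrative only. */
static const char *find_ssi_start(const char *buf, size_t len)
{
    static const char pat[] = "<!--#";
    size_t skip[256];
    size_t i, pos;

    /* Build the skip table: bytes not in the pattern allow a full-length
     * skip; bytes in pat[0..PATLEN-2] allow a shorter, safe skip. */
    for (i = 0; i < 256; i++)
        skip[i] = PATLEN;
    for (i = 0; i < PATLEN - 1; i++)
        skip[(unsigned char)pat[i]] = PATLEN - 1 - i;

    pos = 0;
    while (pos + PATLEN <= len) {
        if (memcmp(buf + pos, pat, PATLEN) == 0)
            return buf + pos;
        pos += skip[(unsigned char)buf[pos + PATLEN - 1]];
    }
    return NULL;
}
```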
Results Performance on a simple file delivery test: Test case description: • Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM • 20 concurrent client connections requesting 10KB non-parsed file over 100Mb/s switched network
Results Performance on a server-parsed (.shtml) file test: Test case description: • Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM • 20 concurrent client connections over 100Mb/s switched network • .shtml file with virtual includes of five 2KB files
Conclusion Next steps for Apache: • Continue incremental performance improvements • Explore highly scalable concurrency models (multiple connections per thread)
Conclusion Recommendations for other projects: • Know your project’s priorities: • Metrics that matter • Rules of the game • Profile early, profile often • Take advantage of platform improvements • Use the power of distributed development