CSC 2720 Building Web Applications

Improving Web Application Performance

Outline
• Introduction
• Locating the bottleneck
• Improvement Methods

Introduction
• Improving the performance of a web application could mean
  • Reducing latency, i.e., the time delay between sending a request and receiving the corresponding response, or between sending a request and receiving all the page components, such as images, that are needed to render the page
  • Serving as many concurrent requests as possible without failing or exceeding a response time limit
    • Response time – the time a server takes to serve a request

PHP Performance Optimization
• Obtaining good performance is not merely a matter of writing fast PHP scripts. High-performance PHP requires a good understanding of the underlying hardware, the operating system and supporting software such as the web server and database.
  • Source: http://phplens.com/lens/php-book/optimizing-debugging-php.php
• Often involves trade-offs among CPU, storage, bandwidth and other resource requirements

Factors that affect the performance
• Running out of resources
  • Processors
  • Memory
  • Storage
  • Network bandwidth
  • Maximum # of database connections
• Poorly designed database schema and queries
• Poorly written PHP code
• Too much disk access

Locating the Bottlenecks – Profiling
• Measures the behavior of a server-side script as it executes, particularly the frequency and duration of function calls (Ref: Wikipedia)
• Helps you detect which parts of your code are worth your attention (e.g., functions that are called the most, functions that take a long time to run)
• Can be used to analyze the performance of a database indirectly by measuring the functions that interact with the database (e.g., the mysqli_* functions)
• Example of a profiling tool
  • Xdebug: http://www.xdebug.org/
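A profiler automates this measurement for every function. As a rough manual sketch of the same idea, the following times a single call; time_call() and sum_to() are hypothetical names introduced only for this illustration, and hrtime() assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: run one call and return [result, elapsed seconds].
function time_call(callable $fn, ...$args): array
{
    $start = hrtime(true);                      // monotonic clock, nanoseconds
    $result = $fn(...$args);
    $elapsed = (hrtime(true) - $start) / 1e9;   // convert to seconds
    return [$result, $elapsed];
}

// Example workload (an assumption for illustration).
function sum_to(int $n): int
{
    $total = 0;
    for ($i = 1; $i <= $n; $i++) {
        $total += $i;
    }
    return $total;
}

[$value, $seconds] = time_call('sum_to', 1000);
printf("sum_to(1000) = %d, took %.6f s\n", $value, $seconds);
```

Wrapping the functions that talk to the database in this way gives a first impression of how much time the database accounts for.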

Locating the Bottlenecks – Stress Tests
• Test a web application for its robustness, availability, and error handling capabilities under a heavy load, particularly to ensure the software doesn't crash in conditions of insufficient computational resources (such as memory or disk space), unusually high concurrency, or denial of service attacks. (Ref: Wikipedia)
• Use in conjunction with other tools to find out
  • The maximum # of requests/users a server can handle before failing or slowing down significantly
  • The bottleneck (i.e., which resource runs out first?)
  • The average time a page and its components take to fully load

Locating the Bottlenecks – Stress Tests
• Microsoft Web Application Stress Tool
  • Can emulate HTTP requests (with parameters) from clients
  • Can generate a large number of requests within a short period of time (stress test)
  • Can help you gather info such as the average and the highest latency experienced when accessing a URL

Locating the Bottlenecks – System Info
• Show the usage and availability of various system resources (e.g., CPU, memory, virtual memory, I/O). The usage can be the overall usage or the usage by individual processes.
• Use in conjunction with stress test tools to locate the bottleneck of a server
• Examples of tools
  • "Task Manager" on Windows
  • The "top" command on most Unix/Linux systems

Improvement Methods
Small changes that could make a big difference.
• PHP accelerators (Opcode compiler + Opcode cache)
• Content caching
  • Utilizing client's and proxy's cache
  • Cache output
• Server-side web proxy
• Compression
• Connection pooling

Improvement Methods
Fine-tuning your web applications, servers, and OS
• Reducing number of HTTP requests
• Query optimization
• Optimizing PHP code
• Additional methods that make a page load faster
• Tuning the server (Apache)

1. PHP Accelerators
• Typically made up of an opcode compiler and an opcode cache
  • Opcode compiler – compiles PHP code into opcode
  • Opcode cache – keeps frequently used compiled PHP scripts in memory
• A PHP accelerator can help reduce the response time of PHP scripts significantly because
  • Executing cached opcode is faster than parsing and compiling the PHP source again on every request
  • Loading opcode from memory is faster than loading PHP scripts from disk
• List of PHP accelerators:
  • http://en.wikipedia.org/wiki/PHP_accelerator

1.1 How PHP Accelerators Work
[Diagram not reproduced]
Source: http://phplens.com/lens/php-book/optimizing-debugging-php.php

2.1 Content Caching – Utilizing Client and Proxy Caches
• Request clients/proxies to cache reusable components (e.g., images, scripts and stylesheets) in order to avoid retransmitting the same components
• To illustrate, suppose
  • Clients A and B share a proxy server.
  • The HTML page generated by index.php needs x.jpg and y.css.
  • Only x.jpg and y.css are cacheable.
[Diagram: Client A, Client B, Proxy Server; server files index.php, x.jpg, y.css]

• The 1st time client A requests index.php from the server, all three files need to be transferred from the server.
• After the request, x.jpg and y.css are cached in both client A's cache and the proxy's cache.

• In subsequent requests, client A only needs to download the HTML page generated by index.php.
• If client B accesses index.php after client A has accessed the same file, then client B could load the page faster because the proxy server only needs to retrieve the HTML content from the server.
• In practice, there could be more than one proxy server between the clients and the server.

2.1 Content Caching – Utilizing Client and Proxy Caches
• Use the Expires or Cache-Control header fields to tell the clients and proxies how a component should be cached, e.g.,
  • When should the component be considered expired?
  • Is the component cacheable?
• Recommendations
  • For static components: implement a "Never expire" policy by setting a far-future Expires header.
  • For dynamic components: use an appropriate Cache-Control header to help clients with conditional requests.
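A conditional request lets the client ask "has this changed?" and receive an empty 304 response if not. The sketch below shows one way the server-side decision could look; conditional_response() is a hypothetical helper, it handles only a single strong ETag (no ETag lists, no weak validators), and the MD5-based ETag is an assumption chosen for brevity.

```php
<?php
// Hypothetical helper: decide the response for a conditional request.
// $ifNoneMatch is the raw If-None-Match request header, or null if absent.
// Returns [status code, ETag, body to send].
function conditional_response(string $body, ?string $ifNoneMatch): array
{
    $etag = '"' . md5($body) . '"';      // validator derived from the content
    if ($ifNoneMatch !== null && trim($ifNoneMatch) === $etag) {
        return [304, $etag, ''];         // client copy is current: no body
    }
    return [200, $etag, $body];
}

// First visit: no validator yet, so the full page is sent.
[$status1, $etag, $body1] = conditional_response('<p>Hello</p>', null);
// Revisit: the client echoes the ETag back and gets an empty 304.
[$status2, $etag2, $body2] = conditional_response('<p>Hello</p>', $etag);
printf("%d then %d\n", $status1, $status2);
```

In a real script the header value would come from $_SERVER['HTTP_IF_NONE_MATCH'], and the result would be sent with http_response_code() and header('ETag: ...') before any output.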

2.1 Content Caching – Utilizing Client and Proxy Caches
• For example
  • To indicate that a component expires at a fixed date and time
      Expires: Sat, 11 Apr 2009 20:00:00 GMT
  • To indicate that a component expires in one hour (relative to the access time) and that the client must revalidate the content with the server once the component becomes stale
      Cache-Control: max-age=3600, must-revalidate
• To set the caching policy for static components, you can configure the web server.
  • To set up an expiration policy for different files with Apache, see the mod_expires module.
  • To set default header values for different files with Apache, see the mod_headers module.
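For dynamic components the same two headers can be produced by the script itself. A minimal sketch, assuming a hypothetical helper cache_headers() that only builds the header lines (sending them is left to the caller):

```php
<?php
// Hypothetical helper: build caching headers for a component that may be
// cached for $maxAge seconds, counted from $now (a Unix timestamp).
function cache_headers(int $maxAge, int $now): array
{
    return [
        // Absolute expiry time; HTTP dates are always expressed in GMT.
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
        // Relative lifetime; revalidate with the server once stale.
        'Cache-Control: max-age=' . $maxAge . ', must-revalidate',
    ];
}

// In a real script, send the headers before any output:
// foreach (cache_headers(3600, time()) as $line) { header($line); }

// Fixed timestamp (11 Apr 2009 19:00:00 GMT) so the result is reproducible.
$headers = cache_headers(3600, gmmktime(19, 0, 0, 4, 11, 2009));
print_r($headers);
```

With the fixed timestamp used here, the two lines match the Expires and Cache-Control examples on the slide.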

2.1 Content Caching – Utilizing Client and Proxy Caches
• References and Reading Materials
  • Caching Tutorial (contains specific info about how, and how not, to cache)
    • http://www.mnot.net/cache_docs/
  • Working with cached pages in PHP
    • http://www.badpenguin.org/docs/php-cache.html
  • HTTP conditional requests in PHP
    • http://alexandre.alapetite.net/doc-alex/php-http-304/index.en.html
  • Use Server Cache Control to Improve Performance
    • http://www.websiteoptimization.com/speed/tweak/cache/

2.2. Content Caching – Reusing Generated Output
• If a script only generates new content periodically, cache the generated output to avoid executing code and querying the database for every request.
• Examples of cacheable output
  • List of high scores for an online game
  • List of products on an e-commerce website in which the products are updated daily
• PHP examples about output caching
  • Caching output in PHP
    • http://www.addedbytes.com/php/caching-output-in-php/
  • Output Caching with PHP
    • http://www.devshed.com/c/a/PHP/Output-Caching-with-PHP/
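One simple way to reuse generated output is a file-based cache with a time-to-live. The sketch below is an illustration under stated assumptions, not a production cache: cached_output(), the temp-file location and the 60-second lifetime are all hypothetical choices, and concurrent regeneration is not handled.

```php
<?php
// Hypothetical helper: return cached output if it is younger than $ttl
// seconds; otherwise run $generate, capture what it echoes, and store it.
function cached_output(string $cacheFile, int $ttl, callable $generate): string
{
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        return file_get_contents($cacheFile);   // cache hit: generator not run
    }
    ob_start();                // capture everything the generator echoes
    $generate();
    $output = ob_get_clean();  // fetch the captured output and stop buffering
    file_put_contents($cacheFile, $output, LOCK_EX);
    return $output;
}

// Example: a "high scores" fragment regenerated at most once a minute.
$calls = 0;
$render = function () use (&$calls) {
    $calls++;                  // stands in for an expensive database query
    echo "<ol><li>Alice 9000</li><li>Bob 8500</li></ol>";
};
$file = sys_get_temp_dir() . '/high_scores_' . uniqid() . '.html';
$first = cached_output($file, 60, $render);
$second = cached_output($file, 60, $render);    // served from the cache file
unlink($file);                                  // clean up the demo file
```

The second call returns the stored copy, so the generator (and the query it stands for) runs only once.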

3. Server-Side Web Proxy
• Use a web proxy at the server side to relieve the web server of serving frequently requested static files
• An example of a web proxy: Squid
  • http://www.squid-cache.org/

4. Compression
• Reduce the data size before transmitting
• (Online) Use HTTP compression – compress textual data on the fly before sending it to a client
  • Can typically reduce the size of textual data by about 70%
• (Offline) Use compression tools to reduce the file size of JavaScript, CSS, images, video, etc.
  • The compression tools must not change the file format or the meaning of the content of these files. Otherwise the files can no longer be referenced from HTML files.
  • e.g., use optipng for PNGs, gifsicle for GIFs and jpegoptim for JPGs

4.1. Compression – HTTP Compression
• A publicly defined way to compress textual content transferred from web servers to browsers
• Compression is done at the server.
• Built into HTTP/1.1 and supported by most browsers
• Drawback: takes time and CPU cycles to compress
• Ref: http://www.websiteoptimization.com/speed/tweak/compress/
• Using HTTP compression in PHP
  • Configure php.ini to enable automatic HTTP compression
    • zlib runtime configuration (http://hk2.php.net/manual/en/zlib.configuration.php)
  • Perform HTTP compression in PHP scripts programmatically
    • Examples of using ob_gzhandler (http://hk2.php.net/ob_gzhandler)
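To get a feel for the effect, the sketch below gzips a generated HTML fragment and reports the size before and after. It assumes the zlib extension is loaded; the sample markup is invented and far more repetitive than a typical page, so the saving it shows is not representative.

```php
<?php
// Build a repetitive HTML fragment (sample data, invented for this demo).
$html = str_repeat("<tr><td class=\"item\">Product</td><td>9.99</td></tr>\n", 200);

// gzencode() produces the gzip format used by "Content-Encoding: gzip".
$compressed = gzencode($html, 6);      // 6 is a middle-of-the-road level
$saving = 1 - strlen($compressed) / strlen($html);
printf("%d bytes -> %d bytes (%.0f%% smaller)\n",
    strlen($html), strlen($compressed), $saving * 100);

// In a web script, the usual approach is to let PHP negotiate and compress:
// ob_start('ob_gzhandler');   // must be called before any output
```

ob_gzhandler checks the client's Accept-Encoding header itself; it should not be combined with the zlib.output_compression setting in php.ini.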

5. Connection Pooling (Why?)
• A database connection incurs overhead – it requires resources to create the connection, authenticate it, maintain it, and then release it when it is no longer required.
• The overhead is particularly high for web-based applications.
  • A server-side script typically opens a connection, performs a few queries, and then closes the connection.
  • Often, more effort is spent connecting and disconnecting than is spent during the interactions themselves.
• Ref: IBM WebSphere App. Server – What is Connection Pooling?

5. Connection Pooling
• A connection pool is a cache of opened database connections.
• When a script needs to establish a connection to the database, a connection is selected from the pool if one is available. Otherwise a new connection is created.
• When a script closes the connection, the connection is not actually closed but returned to the pool so that it can be reused by other scripts.
• Note: Implementing a connection pool is not easy. Usually we just use one if it is available.
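The acquire/release behaviour described above can be sketched as a toy class. ConnectionPool is a hypothetical name, and stdClass objects stand in for real connections. Because a PHP script's objects are destroyed at the end of each request, a pool like this would not survive between requests; real pooling lives in the PHP process (persistent connections) or in an external service. The code only illustrates the bookkeeping.

```php
<?php
// A toy pool, for illustration only: it is not thread-safe and it never
// checks whether an idle connection is still alive.
class ConnectionPool
{
    private $factory;          // callable that opens a new connection
    private $idle = [];        // opened connections waiting to be reused
    public $created = 0;       // how many connections were actually opened

    public function __construct(callable $factory)
    {
        $this->factory = $factory;
    }

    // Hand out an idle connection if there is one; otherwise open a new one.
    public function acquire()
    {
        if ($this->idle) {
            return array_pop($this->idle);
        }
        $this->created++;
        return ($this->factory)();
    }

    // "Closing" a connection just returns it to the pool.
    public function release($connection): void
    {
        $this->idle[] = $connection;
    }
}

// stdClass objects stand in for real database connections here.
$pool = new ConnectionPool(function () { return new stdClass(); });
$a = $pool->acquire();
$pool->release($a);
$b = $pool->acquire();         // reuses the connection released above
printf("connections opened: %d\n", $pool->created);
```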

5.1. Connection Pooling in PHP
• PHP's persistent connections
  • Use mysql_pconnect() to open a persistent database connection
    • Note: the mysql extension, including mysql_pconnect(), was removed in PHP 7.
  • The MySQL Improved (mysqli) extension originally had no equivalent; since PHP 5.3, mysqli opens a persistent connection when the host name is prefixed with "p:".
  • Must be used with care because changes made to the connection state, such as setting autocommit to "off", will affect the next script that uses the connection.
• Other connection pooling solutions:
  • SQL Relay: http://sqlrelay.sourceforge.net/index.html
  • Apache module mod_dbd: http://httpd.apache.org/docs/2.1/mod/mod_dbd.html

6. Reducing # of HTTP Requests (Why?)
• A large portion of the total response time to create a fully rendered page is spent on downloading the page components like images, stylesheets, JavaScript, etc.
• Some browsers allow at most two concurrent requests per server. That means the page components have to take turns to load.
• Ref: http://developer.yahoo.com/performance/rules.html

6. Reducing # of HTTP Requests
• Reducing # of components → reducing # of HTTP requests → page loads faster
• Methods to reduce the # of page components
  • Combine multiple stylesheets into one
  • Combine multiple scripts into one
  • CSS Sprites – tile multiple images into one image and then use CSS to clip the needed image from the combined image
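Combining stylesheets can be done once at deployment time. The sketch below concatenates files in the given order (order matters, since later CSS rules override earlier ones); combine_files() and the demo file names are hypothetical.

```php
<?php
// Hypothetical helper: concatenate several stylesheets into one file so the
// page needs a single <link> (one HTTP request) instead of several.
function combine_files(array $sources, string $target): void
{
    $combined = '';
    foreach ($sources as $path) {
        // A comment marks where each original file starts, to ease debugging.
        $combined .= "/* " . basename($path) . " */\n"
                   . file_get_contents($path) . "\n";
    }
    file_put_contents($target, $combined);
}

// Demo with two throwaway files in the system temp directory.
$dir = sys_get_temp_dir();
$a = "$dir/demo_a_" . uniqid() . ".css";
$b = "$dir/demo_b_" . uniqid() . ".css";
$out = "$dir/demo_all_" . uniqid() . ".css";
file_put_contents($a, "body { margin: 0; }");
file_put_contents($b, "h1 { color: navy; }");
combine_files([$a, $b], $out);
$result = file_get_contents($out);
array_map('unlink', [$a, $b, $out]);   // clean up the demo files
echo $result;
```

The same approach works for JavaScript files, provided each file ends its last statement properly.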

7. Query Optimization
• Tune the DB schema
  • First three normal forms → help ensure data integrity
  • Denormalization – a process that attempts to optimize the performance of a database by adding redundant data or by grouping data (but makes maintaining data integrity difficult)
• Query only what you really need
  • e.g., instead of using "SELECT *", select only the columns you need and use LIMIT to limit the number of rows retrieved from the DB
• Make use of indexes to improve the performance of data retrieval
• Take a database course …
• Ref: How to Optimize Queries (Theory and Practice)
  • http://www.serverwatch.com/tutorials/article.php/2175621

8. Optimizing PHP Code
• Make use of the output buffer
  • See PHP Output Buffering Control: http://hk2.php.net/ob_start

    <?php
    ob_start();  // Start output buffering.
                 // All output is kept in memory instead
                 // of being sent to the client.
    ?>
    <html>
    <head><title>Foo</title></head>
    <body>
    <?php
    echo "Blah Blah Blah";
    ?>
    </body></html>
    <?php
    ob_end_flush();  // Flush everything in the output buffer
                     // to the client at once.
    ?>

8. Optimizing PHP Code
    echo $str1, " ", $str2, " ", $str3;
can execute faster than
    echo $str1 . " " . $str2 . " " . $str3;
because no intermediate concatenated string has to be built.
• Note: This only works with echo, which is a language construct (not a function) that accepts several comma-separated arguments.
• If you have CPU-intensive tasks to perform, consider implementing them as C extensions.
• Use the predefined functions (as opposed to writing your own functions) whenever possible

8. Optimizing PHP Code
• Instead of writing
    for ($i = 0; $i < count($array); $i++) …
  use a variable to store the array size, so that count() is not called on every iteration, and rewrite the loop as
    $array_size = count($array);
    for ($i = 0; $i < $array_size; $i++) …
• More tips about optimizing PHP code can be found at http://reinholdweber.com/?p=3

9. Additional Methods that Make a Page Load Faster
• Post-load components
  • Load the less important components in the background
• Preload components
  • Anticipate which components will be needed in the future and preload them (i.e., utilize the browser's idle time)
• Split components across domains
  • Maximize parallel downloads (a browser may only issue a few HTTP requests in parallel to the same server)
  • Make sure you're using no more than 2-4 domains because of the DNS lookup penalty.
• Ref: Best Practices for Speeding Up Your Web Site (http://developer.yahoo.com/performance/rules.html)

10. Tuning the Server (Apache)
• SendBufferSize – size of the TCP send buffer
• MaxClients – maximum # of clients served simultaneously (named MaxRequestWorkers since Apache 2.4)
• StartServers – the number of child processes to create at start-up
• MinSpareServers, MaxSpareServers – the number of idle child processes to keep alive
• KeepAlive – tells the server to reuse the same socket connection for multiple HTTP requests, reducing the overhead of frequent connects
• Source: http://phplens.com/lens/php-book/optimizing-debugging-php.php

Scaling
• Improve performance by introducing more machines (to host more servers)
• Server clusters
• Database replication
  • Improves performance or availability of the whole database system
  • MySQL Replication (http://dev.mysql.com/doc/refman/5.0/en/replication.html)

References
• Best Practices for Speeding Up Your Web Site
  • http://developer.yahoo.com/performance/rules.html
• Performance Research, Part 1: What the 80/20 Rule Tells Us about Reducing HTTP Requests
  • http://yuiblog.com/blog/2006/11/28/performance-research-part-1/
• Performance Research, Part 2: Browser Cache Usage - Exposed!
  • http://yuiblog.com/blog/2007/01/04/performance-research-part-2/
• Practical PHP Performance
  • http://www.developertutorials.com/tutorials/php/practical-php-performance-8-02-07/page3.html
