
Finding Bugs in Dynamic Web Applications

Finding Bugs in Dynamic Web Applications. Shay Artzi, Adam Kiezun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, Michael D. Ernst. Proceedings: ISSTA '08 (International Symposium on Software Testing and Analysis). CSE 6329 Special Topics in Advanced Software Engineering.



Presentation Transcript


  1. Finding Bugs in Dynamic Web Applications • Shay Artzi, Adam Kiezun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, Michael D. Ernst • Proceedings: ISSTA '08 (International Symposium on Software Testing and Analysis)

  2. CSE 6329 Special Topics in Advanced Software Engineering • Presented By • Md. Monjurul Hasan

  3. Dynamic Web Application • Generates pages (HTML content) on the fly • Content varies by user and by user-specified criteria • Produced by server-side programming • Nearly all large, well-known web applications are dynamic web applications • Source: Dynamic Web Application Development using PHP and MySQL, by Simon Stobart and David Parsons

  4. Web Threats • Web script crashes and malformed dynamically generated Web pages impact the usability of Web applications • Current tools for Web-page validation cannot handle dynamically generated pages

  5. Web Script Crash • Missing included file • Call to undefined method • Wrong Database query • Uncaught exceptions

  6. Malformed HTML • HTML that does not conform to the WDG (Web Design Group) or W3C (World Wide Web Consortium) standards • Using tags that the W3C does not define (defined tags include <html>, <table>, <div>, etc.) • Not maintaining the document structure (e.g. <html><head></head><body> .. </body></html>) • Not pairing each opening tag with its matching closing tag • etc. • Web scripting languages can generate HTML, including malformed HTML
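The checks listed on this slide can be illustrated with a minimal sketch in Python. It is not the WDG or W3C validator: `KNOWN_TAGS`, `VOID_TAGS`, `WellFormednessChecker`, and `check_html` are hypothetical names, and the tag set is a tiny illustrative subset rather than the HTML 4.01 element list.

```python
from html.parser import HTMLParser

# Assumption: a very small subset of element names, for illustration only.
KNOWN_TAGS = {"html", "head", "title", "body", "h2", "p", "div", "table", "br"}
VOID_TAGS = {"br"}  # elements that never take a closing tag


class WellFormednessChecker(HTMLParser):
    """Records unknown tags and closing tags that do not match the open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, innermost last
        self.problems = []   # human-readable problem descriptions

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            self.problems.append(f"unknown tag <{tag}>")
        elif tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # e.g. the end half of a self-closing <br />
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected closing tag </{tag}>")


def check_html(page):
    """Return a list of problems; an empty list means the page passed."""
    checker = WellFormednessChecker()
    checker.feed(page)
    checker.close()
    unclosed = [f"unclosed tag <{tag}>" for tag in checker.stack]
    return checker.problems + unclosed


print(check_html("<html><body><j2> username must be supplied.</h2>\n"))
```

The sample page is modeled on the `<j2>` line from the SchoolMate example later in the deck; a real oracle would validate against the full DTD instead of a hand-written tag set.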

  7. The Problem • Bad scripts create syntactically malformed HTML • Partially displayable or non-displayable HTML • Browsers' attempts to correct malformed HTML can lead to crashes • Slower HTML rendering • Important information may be discarded • Search engines have trouble indexing the correct pages • Example

  8. More Problems • Dynamic web page testing challenges • HTML validation tools only test static pages • They cannot fully capture behavior, since not all of the code's functionality shows up in the resulting HTML • No automatic validator exists for scripting languages that dynamically generate HTML pages • HTML Kit validates every generated page but requires manual generation of the inputs that lead to displaying pages

  9. What this paper presents… • Presents an automated technique for finding faults manifested as Web script crashes or malformed HTML; extends dynamic test generation to scripting languages • Identifies the minimal part of the input responsible for triggering failures • Uses an oracle to determine whether HTML is well-formed • Creates a tool, Apollo, that implements all of this in the context of PHP

  10. Why PHP? • Widely used in Web development • Network interactions • Database • HTTP processing • Object oriented • Scripting • 21 million domains¹ (75%) are powered by PHP, including large websites like Wikipedia, WordPress, Facebook, Digg, etc. • ¹ Source: Netcraft, April 2007

  11. Example: program • SchoolMate.php • Allows school administrators to manage classes and users, teachers to manage assignments and grades and students to access their information • Typical URL: schoolmate.php?page=1&page2=100&login=1&username=user&password=password

  12. Example: program listing

      <?php

      make_header(); // print HTML header

      // Make the $page variable easy to use //
      if (!isset($_GET['page'])) $page = 0;
      else $page = $_GET['page'];

      // Bring up the report cards and stop processing //
      if ($_GET['page2'] == 1337) {
        require('printReportCards.php');
        die(); // terminate the PHP program
      }

      // Validate and log the user into the system //
      if ($_GET["login"] == 1) validateLogin();

      switch ($page)
      {
        case 0: require('login.php'); break;
        case 1: require('TeacherMain.php'); break;
        case 2: require('StudentMain.php'); break;
        default: die("Incorrect page number. Please verify.");
      }

      make_footer(); // print HTML footer
      ...
      function validateLogin() {
        if (!isset($_GET['username'])) {
          echo "<j2> username must be supplied.</h2>\n";
          return;
        }
        $username = $_GET['username'];
        $password = $_GET['password'];
        if ($username == "john" && $password == "theTeacher")
          $page = 1;
        else if ($username == "john" && $password == "theStudent")
          $page = 2;
        else echo "<h2>Login error. Please try again</h2>\n";
      }

      function make_header() { // print HTML header
        // single-quoted string so the DOCTYPE's double quotes need no escaping
        print('
      <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      "http://www.w3.org/TR/html4/strict.dtd">
      <HTML>
      <HEAD> <TITLE> Class Management </TITLE> </HEAD>
      <BODY>');
      }

      function make_footer() { // close HTML elements opened by make_header()
        print('
      </BODY>
      </HTML>');
      }
      ?>

  13. Faults in the example program (same listing as slide 12) • 'printReportCards.php' is missing, so the require() call fails • make_footer() is not executed in certain situations (for example, when die() ends the script in the switch's default case), which leaves HTML tags unclosed • validateLogin() generates an illegal <j2> tag

  14. Failures in PHP programs • Targets two types of failures • Execution failures • Web Script Crashes • HTML failures • Malformed HTML

  15. Failure-Finding in PHP Applications • Concolic testing, a dynamic test generation technique • Execute the application initially on an empty input • Then on additional inputs, obtained by solving constraints that are derived from control-flow paths • Extensions • Validate the correctness of program output by using an oracle • Handle isset, empty, require, etc., which require generation of constraints absent in other object-oriented languages • Use a pre-specified set of values for database authentication • Simulate each user input by transforming source code

  16. Transformation of Code • Interactive HTML pages with buttons and menus • For each page (h) that contains N buttons • Add an additional input parameter p to the PHP program • Values range from 1 to N • A switch statement is inserted that includes the appropriate PHP source file, depending on p

  17. An example

      <?php echo "<h2>Webchess " . $Version . " login</h2>"; ?>
      <form method="post" action="mainmenu.php">
        <p>
          Nick: <input name="txtNick" type="text" size="15" /><br />
          Password: <input name="pwdPassword" type="password" size="15" />
        </p>
        <p>
          <input name="login" value="login" type="submit" />
          <input name="newAccount" value="New Account" type="button"
                 onClick="window.open('newuser.php', '_self')" />
        </p>
      </form>
      <?php
      /* Simulated User Input */
      switch ($_GET["_btn"]) {
        case 1: require_once("mainmenu.php"); break;
        case 2: require_once("newuser.php"); break;
      }
      ?>
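The transformation from slides 16 and 17 can be sketched as a small generator that emits the inserted switch statement. This is a hedged illustration in Python, not Apollo's implementation: `simulate_buttons` is a hypothetical helper, and it assumes the list of target files for the page's buttons has already been extracted from the HTML.

```python
def simulate_buttons(targets, param="_btn"):
    """Return PHP source for a switch that includes one target per button.

    targets: PHP files reached by the page's buttons, in button order
             (assumed to be known already).
    param:   name of the added input parameter; "_btn" follows slide 17.
    """
    lines = ["<?php", "/* Simulated User Input */",
             f'switch ($_GET["{param}"]) {{']
    # Button numbers range from 1 to N, as on slide 16.
    for number, target in enumerate(targets, start=1):
        lines.append(f'  case {number}: require_once("{target}"); break;')
    lines += ["}", "?>"]
    return "\n".join(lines)


print(simulate_buttons(["mainmenu.php", "newuser.php"]))
```

Because the button choice becomes an ordinary input parameter, the test generator can explore it like any other branch condition.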

  18. The Failure Detection Algorithm

      parameters: Program P, oracle O
      result    : Bug reports B;
                  B : setOf(<failure, setOf(pathConstraint), setOf(input)>)

      P′ ≔ simulateUserInput(P);
      B ≔ empty;
      pcQueue ≔ emptyQueue();
      enqueue(pcQueue, emptyPathConstraint());
      while not empty(pcQueue) and not timeExpired() do
        pathConstraint ≔ dequeue(pcQueue);
        input ≔ solve(pathConstraint);
        if input ≠ ⊥ then
          output ≔ executeConcrete(P′, input);
          failures ≔ getFailures(O, output);
          foreach f in failures do
            merge <f, pathConstraint, input> into B;
          c1 ∧ . . . ∧ cn ≔ executeSymbolic(P′, input);
          foreach i = 1, . . . , n do
            newPC ≔ c1 ∧ . . . ∧ ci−1 ∧ ¬ci;
            enqueue(pcQueue, newPC);
      return B;
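The loop above can be exercised on a toy model. The following Python sketch rests on several assumptions: `run_program` is a hand-written stand-in for the SchoolMate excerpt that performs concrete and symbolic execution in one pass, `solve` is a naive solver rather than Choco, `get_failures` is a two-rule oracle, and the `seen` set and `max_runs` bound (standing in for timeExpired) are additions that keep the toy finite. All names are hypothetical.

```python
from collections import deque

# A condition is (kind, key, value); a constraint is (condition, expected_truth).


def holds(cond, inp):
    kind, key, value = cond
    return (key in inp) if kind == "set" else (inp.get(key) == value)


def run_program(inp):
    """Toy model of the SchoolMate excerpt: returns (output, path constraint)."""
    path = []

    def branch(kind, key, value=None):
        cond = (kind, key, value)
        taken = holds(cond, inp)
        path.append((cond, taken))   # record the branch outcome as a conjunct
        return taken

    out = ["<html><body>"]
    branch("set", "page")
    if branch("eq", "page2", 1337):
        return None, path            # models the missing printReportCards.php
    if branch("eq", "login", 1):
        if not branch("set", "username"):
            out.append("<j2> username must be supplied.</h2>")
        else:
            out.append("<h2>Login error. Please try again</h2>")
    out.append("</body></html>")
    return "".join(out), path


def get_failures(output):
    """Toy oracle: a crash, or the illegal <j2> tag."""
    if output is None:
        return {"execution failure"}
    return {"HTML failure"} if "<j2>" in output else set()


def solve(pc):
    """Naive solver: build an input from positive conjuncts, then verify it."""
    inp = {}
    for (kind, key, value), truth in pc:
        if truth:
            inp[key] = value if kind == "eq" else inp.get(key, 0)
    return inp if all(holds(c, inp) == t for c, t in pc) else None


def find_failures(max_runs=50):
    bugs = {}                        # failure -> list of (path constraint, input)
    queue, seen, runs = deque([()]), set(), 0
    while queue and runs < max_runs:
        pc = queue.popleft()
        inp = solve(pc)
        if inp is None:
            continue
        runs += 1
        output, path = run_program(inp)
        for failure in get_failures(output):
            bugs.setdefault(failure, []).append((pc, inp))
        for i, (cond, taken) in enumerate(path):
            new_pc = tuple(path[:i]) + ((cond, not taken),)  # negate conjunct i
            if new_pc not in seen:
                seen.add(new_pc)
                queue.append(new_pc)
    return bugs


for failure, reports in find_failures().items():
    print(failure, "first exposed by input", reports[0][1])
```

Starting from the empty path constraint, the first run records NotSet(page) ∧ page2 ≠ 1337 ∧ login ≠ 1, and the negated prefixes drive the later runs toward the crash and the `<j2>` branch.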

  19. Example: Execution 1 (Expose Third Fault) • Applies the algorithm of slide 18 to the listing of slide 12, starting from the empty input • !isset($_GET['page']) is true, which sets $page = 0 • The page2 == 1337 and login == 1 tests are false, so execution reaches case 0 of the switch • HTML validation tool determines the output is legal • Path constraint of this execution: NotSet(page) ∧ page2 ≠ 1337 ∧ login ≠ 1 • New path constraints obtained by negating conjuncts: NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1; NotSet(page) ∧ page2 = 1337; Set(page)

  20. Example: Execution 2 (The Opposite Path) • Same listing as slide 12 • Path constraint: NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1 • Constraint solver may produce page2 ← 0; login ← 1 • The login == 1 test is now true, so validateLogin() runs; username is not set, so the <j2> line is emitted • HTML validation tool discovers the failure and generates a bug report, which is added to the output set of bug reports

  21. Minimization on Path Constraints • Find shorter path constraint for a given bug report • Eliminates irrelevant constraints – better assist programmer to detect location of the fault • Solution for a shorter path constraint is often a smaller input • Does not guarantee returned path constraint is shortest that exposes failure

  22. Minimization Example • The HTML malformation from the previous example can be reached along different execution paths: • (a) NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1 • (b) Set(page) ∧ page = 0 ∧ page2 ≠ 1337 ∧ login = 1 • Intersection of (a) and (b): page2 ≠ 1337 ∧ login = 1 • Dropping page2 ≠ 1337 still exposes the failure, so the minimized path constraint is login = 1 (input: login ← 1)

  23. Path Constraint Minimization Algorithm

      parameters: Program P, oracle O, bug report b
      result    : Short path constraint that exposes b.failure

      c1 ∧ . . . ∧ cn ≔ intersect(b.pathConstraints);
      pc ≔ true;
      foreach i = 1, . . . , n do
        pci ≔ c1 ∧ . . . ci−1 ∧ ci+1 ∧ . . . cn;
        input ≔ solve(pci);
        if input ≠ ⊥ then
          output ≔ executeConcrete(P, input);
          failures ≔ getFailures(O, output);
          if b.failure ∉ failures then
            pc ≔ pc ∧ ci;
      inputpc ≔ solve(pc);
      if inputpc ≠ ⊥ then
        outputpc ≔ executeConcrete(P, inputpc);
        failurespc ≔ getFailures(O, outputpc);
        if b.failure ∈ failurespc then
          return pc;
      return shortest(b.pathConstraints);
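As a rough illustration, the minimization steps can be run on the two path constraints from slide 22. This Python sketch makes the same kind of assumptions as any toy model would: `run` is a hand-written stand-in for the SchoolMate excerpt, `solve` is a naive solver, `get_failures` is a two-rule oracle, and all names are hypothetical.

```python
# A condition is (kind, key, value); a constraint is (condition, expected_truth).


def holds(cond, inp):
    kind, key, value = cond
    return (key in inp) if kind == "set" else (inp.get(key) == value)


def solve(pc):
    """Naive solver: build an input from positive conjuncts, then verify it."""
    inp = {}
    for (kind, key, value), truth in pc:
        if truth:
            inp[key] = value if kind == "eq" else inp.get(key, 0)
    return inp if all(holds(c, inp) == t for c, t in pc) else None


def run(inp):
    """Toy concrete execution of the SchoolMate excerpt (None models a crash)."""
    if inp.get("page2") == 1337:
        return None
    html = "<html><body>"
    if inp.get("login") == 1 and "username" not in inp:
        html += "<j2> username must be supplied.</h2>"
    return html + "</body></html>"


def get_failures(output):
    if output is None:
        return {"execution failure"}
    return {"HTML failure"} if "<j2>" in output else set()


def minimize(path_constraints, failure):
    first, rest = path_constraints[0], path_constraints[1:]
    # intersect: conjuncts shared by every path constraint of the bug report
    common = [c for c in first if all(c in pc for pc in rest)]
    kept = []
    for i, conjunct in enumerate(common):
        without = common[:i] + common[i + 1:]
        inp = solve(without)
        if inp is not None and failure not in get_failures(run(inp)):
            kept.append(conjunct)    # failure vanished, so the conjunct matters
    inp = solve(kept)
    if inp is not None and failure in get_failures(run(inp)):
        return tuple(kept)
    return min(path_constraints, key=len)   # fall back to the shortest original


NOT_1337 = (("eq", "page2", 1337), False)
LOGIN_1 = (("eq", "login", 1), True)
pc_a = ((("set", "page", None), False), NOT_1337, LOGIN_1)
pc_b = ((("set", "page", None), True), (("eq", "page", 0), True),
        NOT_1337, LOGIN_1)

print(minimize([pc_a, pc_b], "HTML failure"))
```

On this toy model the result is the single conjunct login = 1, matching slide 22; as slide 21 notes, the procedure does not guarantee the shortest possible constraint in general.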

  24. Apollo • User Input Simulator • Executor • Bug Finder • Oracle • Bug Report Repository • Input Minimizer • Input Generator • Symbolic Driver • Constraint Solver • Value Generator

  25. Apollo

  26. Executor: Shadow Interpreter • Modified Zend PHP interpreter 5.2.2 that records path constraints and information associated with output • Performs symbolic execution along with concrete execution • Records conditions for PHP-specific comparison operations such as isset and empty

  27. Executor: Database Manager • (Re)initializes the database used by a PHP application, restoring it before each execution • Supplies additional information about username/password pairs

  28. Bug Finder • Bug Report = Failure + Path constraint + Input inducing failure • Failure = Type of Failure + Corresponding Message + PHP statement generating bad HTML • Oracle: HTML validation tool (WDG and W3C) • Input Minimizer: uses the path constraint minimization algorithm

  29. Input Generator • Symbolic Driver: generates new path constraints and selects the next path constraint • Constraint Solver: computes an assignment of values to input parameters that satisfies a given path constraint • Choco constraint solver • Value Generator: generates values for parameters • Combines random value generation and constant values mined from source code
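The Value Generator bullet can be illustrated with a hedged Python sketch. How Apollo actually mines constants is not described on the slide, so this version simply assumes that literals compared with `==` are the interesting ones; `mine_constants`, `generate_value`, and the `p_constant` mixing probability are hypothetical.

```python
import random
import re


def mine_constants(php_source):
    """Collect string and integer literals that appear right after '=='."""
    strings = re.findall(r"""==\s*["']([^"']*)["']""", php_source)
    numbers = [int(n) for n in re.findall(r"==\s*(\d+)", php_source)]
    return strings + numbers


def generate_value(constants, rng, p_constant=0.5):
    """Pick a mined constant with probability p_constant, else a random int."""
    if constants and rng.random() < p_constant:
        return rng.choice(constants)
    return rng.randint(0, 9999)


source = '''
if($_GET['page2']==1337) { require('printReportCards.php'); }
if($username=="john" && $password=="theTeacher") $page=1;
'''
constants = mine_constants(source)
rng = random.Random(0)   # seeded so that runs are repeatable
print(constants, [generate_value(constants, rng) for _ in range(5)])
```

Mined constants matter because a purely random generator is very unlikely to guess values such as 1337 or "theTeacher" that guard whole branches of the program.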

  30. Experimentation • faqforge: tool for creating and managing documents • webchess: online chess game • schoolmate: PHP/MySQL solution for administering schools • phpsysinfo: displays system info

  31. Generation Strategies • Compared to two other approaches • Halfond and Orso (Randomized) • Random values to the parameters • Proposed for JavaScript • Minamide’s static analysis • Approximates the string output of program with a context-free grammar • Discovers malformed HTML faults • Apollo’s test input generation previously discussed

  32. Methodology • 10-minute runs on each program • Generation of hundreds of inputs • Ran on both Apollo and Random test input generation strategies • WDG offline HTML validation tool

  33. Results Classification • Execution crash: PHP interpreter terminates with exception • Execution error: PHP interpreter emits warning visible in generated HTML • Execution warning: PHP interpreter emits warning invisible to HTML output • HTML error: program generates HTML for which validation tool produces error report • HTML warning: program generates HTML for which validation produces a warning report

  34. Results Analysis • Annotations on the results figure (figure not preserved): resulted in malformed HTML; tries to load two missing files; database related; unset time zone • Apollo: average line coverage 58.0%, faults found on subject apps 214 • Randomized: average line coverage 15.0%, faults found on subject apps 59 • Line coverage = number of executed lines / total lines with executable PHP code in the application

  35. Results Analysis • Apollo vs. Randomized • 58% line coverage vs. 15.2% line coverage • 214 faults vs. 59 faults • Apollo vs. Minamide's tool • 2.7 times more HTML validation faults (120 vs. 45) • 83 additional execution faults • 104 faults (10 minutes) vs. 14 faults (126 minutes) • Apollo is more effective and efficient than both

  36. Results Analysis: Path Constraint Minimization • Reduces size of inputs by up to a factor of 0.18 for more than 50% of faults • Success rate: percentage of faults whose exposing input was minimized • Orig. size: average size of original path constraints (# of conjuncts) and inputs (# of key-value pairs) • Reduction columns: ratio of minimized to un-minimized size; the lower the ratio, the more successful the minimization

  37. Limitations • Simulating user inputs statically • JavaScript code in the generated HTML is not tracked • Limited line coverage for native C methods • Limited sources of input parameters • Only inputs from global arrays (_POST, _GET and _REQUEST)

  38. Thank you
