Sound and Precise Analysis of Web Applications for Injection Vulnerabilities
Gary Wassermann, Zhendong Su
What is a SQL injection attack?
• An attacker exploits faulty application code to execute maliciously crafted database queries.
• In 2006, 14% of the reported vulnerabilities were SQL command injection vulnerabilities (SQLCIVs), making SQL injection the second most frequently reported security threat.
An example
• Input: $userid = "1'; DROP TABLE unp_user; --"
• Executed query: SELECT * FROM `unp_user` WHERE userid='1'; DROP TABLE unp_user; --'
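The example above can be reproduced with a few lines of string concatenation. This is only a sketch: build_query is a hypothetical helper standing in for whatever the vulnerable application does, and no database is contacted.

```python
def build_query(userid: str) -> str:
    # Vulnerable pattern (assumed): untrusted input is pasted between the
    # quotes without any sanitization.
    return "SELECT * FROM `unp_user` WHERE userid='" + userid + "'"


benign = build_query("1")
malicious = build_query("1'; DROP TABLE unp_user; --")

print(benign)
print(malicious)
```

The quote inside the malicious input closes the string literal early, so the rest of the input is parsed as SQL rather than as data.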
Existing Approaches
• Tainted information flow tracking
  • does not model the precise semantics of input sanitization routines
  • requires manually written specifications
  • is not fully automated and may require user intervention (e.g. for dynamic includes in PHP)
• String-analysis-based techniques
  • do not track the source of string values and therefore require specifications
Context-Free Grammar (CFG)
• ⇒ denotes “derives in one step”; for example, if A → γ is a production, then αAβ ⇒ αγβ
• ⇒* denotes “derives in a finite number of steps”
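To make the two derivation relations concrete, here is a small sketch over a toy grammar. The grammar, the nonterminal names, and both helper functions are illustrative assumptions, not part of the presented tool.

```python
# Toy grammar (hypothetical): keys are nonterminals, every other symbol
# is a terminal string.
GRAMMAR = {
    "query": [["SELECT * FROM t WHERE id='", "uid", "'"]],
    "uid": [["1"], ["2"]],
}


def derive_one_step(form):
    """Return every sentential form reachable from `form` in one step."""
    results = []
    for i, symbol in enumerate(form):
        for rhs in GRAMMAR.get(symbol, []):
            results.append(form[:i] + rhs + form[i + 1:])
    return results


def derive_all(start, limit=100):
    """Collect terminal strings derivable from `start` in finitely many
    steps, exploring at most `limit` sentential forms."""
    seen, work, sentences = set(), [[start]], set()
    while work and len(seen) < limit:
        form = work.pop()
        key = tuple(form)
        if key in seen:
            continue
        seen.add(key)
        successors = derive_one_step(form)
        if not successors:  # no nonterminals left: a sentence
            sentences.add("".join(form))
        work.extend(successors)
    return sentences
```

derive_one_step corresponds to a single derivation step, and derive_all to the "finite number of steps" relation, bounded so that recursive grammars do not loop forever.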
The article’s approach
• Model string values as a CFG
• Label nonterminals as “direct” or “indirect” where needed
• Check that no string in the language of the CFG is an SQLCIV according to the definition
Illustration of the algorithm
• In every sentential form derivable from query, GETuid appears between quotes, in the syntactic position of a string literal
Building the CFG (2)
• Not all string operations are concatenations and assignments; what about x = escape_quotes(x)?
• We need to model x → escape_quotes(y)
• To model such cases we use Finite State Transducers (FSTs)
FST
• Finite-state machine whose output values are determined both by its current state and by the values of its inputs
• Has one or more final states
• May be non-deterministic
Example: modeling str_replace with an FST
• str_replace("''", "'", $B)
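A transducer for this call can be simulated in a few lines. The sketch below is a deterministic two-state variant with an end-of-input flush, which is an assumption made for readability; the transducer drawn on the original slide may be structured differently (for instance, non-deterministically). The function and state names are hypothetical.

```python
def fst_replace_double_quote(s: str) -> str:
    """Simulate a two-state transducer for str_replace("''", "'", s).

    q0: no pending quote.
    q1: one quote has been read but not yet emitted.
    """
    state = "q0"
    out = []
    for ch in s:
        if state == "q0":
            if ch == "'":
                state = "q1"          # hold the quote, emit nothing yet
            else:
                out.append(ch)        # copy ordinary characters through
        else:  # state == "q1"
            if ch == "'":
                out.append("'")       # two quotes in, one quote out
            else:
                out.append("'" + ch)  # lone quote: emit it, then ch
            state = "q0"
    if state == "q1":
        out.append("'")               # flush a trailing lone quote
    return "".join(out)
```

For a single-pass, left-to-right, non-overlapping replacement this should agree with Python's own `s.replace("''", "'")`, which makes a convenient cross-check.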
The Problem with FSTs
• Cannot model all string functions in PHP
• Example: preg_replace(pattern, replacement, subject)
• Mohri and Sproat describe how to approximate such functions using two FSTs
Policy Conformance Analysis (1)
• If an untrusted substring has an odd number of quotes, it cannot be syntactically confined.
• For each labeled X: if X can derive a string with an odd number of quotes, then X is not safe
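One way to decide this without enumerating the language is a fixpoint over quote-count parities. The following is a simplified sketch, not the authors' implementation: it counts every single quote (it does not distinguish escaped quotes), and the grammar encoding and names are assumptions.

```python
def quote_parities(grammar):
    """Map each nonterminal to the set of quote-count parities
    (0 = even, 1 = odd) among the strings it can derive."""
    parities = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                current = {0}  # parity of the empty prefix
                for symbol in rhs:
                    if symbol in grammar:
                        step = parities[symbol]
                    else:
                        step = {symbol.count("'") % 2}
                    current = {(a + b) % 2 for a in current for b in step}
                if not current <= parities[nt]:
                    parities[nt] |= current
                    changed = True
    return parities


# Hypothetical grammar: GETuid is the labeled (untrusted) nonterminal.
UNSAFE_GRAMMAR = {
    "query": [["SELECT * FROM t WHERE id='", "GETuid", "'"]],
    "GETuid": [["1"], ["x' OR 1=1 --"]],
}
SAFE_GRAMMAR = {
    "query": [["SELECT * FROM t WHERE id='", "GETuid", "'"]],
    "GETuid": [["1"], ["a''b"]],
}

unsafe_result = quote_parities(UNSAFE_GRAMMAR)
safe_result = quote_parities(SAFE_GRAMMAR)
```

A labeled nonterminal whose parity set contains 1 would be flagged as not safe by this check. The loop terminates because each set can grow at most to {0, 1}.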
Policy Conformance Analysis (2)
• If labeled X occurs only in the syntactic position of string literals:
  • If any string derived from X contains an unescaped quote, then X derives unconfined strings and X is not safe
  • Else X is safe
Policy Conformance Analysis (3)
• If X derives only numeric literals, then X is safe
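A conservative version of this check can be sketched as a reachability walk over the grammar. It is sufficient but not necessary, it handles only unsigned integer literals, and the grammar encoding and function name are assumptions rather than the method used in the tool.

```python
DIGITS = set("0123456789")


def derives_only_digits(grammar, start):
    """Return True only if every string derivable from `start` is a
    nonempty run of ASCII digits (conservative check)."""
    seen, work = set(), [start]
    while work:
        nt = work.pop()
        if nt in seen:
            continue
        seen.add(nt)
        for rhs in grammar[nt]:
            if not rhs:
                return False  # an empty production could yield ""
            for symbol in rhs:
                if symbol in grammar:
                    work.append(symbol)  # visit reachable nonterminals
                elif not symbol or not set(symbol) <= DIGITS:
                    return False  # a non-digit terminal is reachable
    return True


numeric_only = derives_only_digits({"X": [["1"], ["X", "0"]]}, "X")
mixed = derives_only_digits({"X": [["1"], ["1 OR 1=1"]]}, "X")
```

Because every reachable terminal is a nonempty digit string and no production is empty, any derived string is itself a nonempty digit string, so a True result is sound under these assumptions.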
Policy Conformance Analysis (4)
• If X can produce a non-numeric string outside of quotes, it likely represents an SQLCIV. To confirm this, we check whether X derives any string that cannot be confined (e.g. “DROP WHERE”, “--”). If it does, then X is unsafe
Policy Conformance Analysis (5)
• If every string derived from a remaining labeled nonterminal is also derivable from some nonterminal in the SQL grammar, then the remaining labeled nonterminals are safe.
Implementation
• Built on a modified version of Minamide’s string analyzer
• Added specifications for 243 PHP functions
• Improved support for PHP dynamic includes
• Derivability is checked using an extension of Earley’s parsing algorithm
Results
• False positive rate = 20.8%
• False negative rate = 0%
Explanations for the false positive rate
• Insufficient precision through type conversions
• ASCII functions
Future improvements
• Improve the analysis of helper functions defined in other files
• Analyze only strings that affect the database
Conclusions
• Catches all SQLCIVs (false negative rate of 0%)
• Can be very slow, but future improvements are expected to make it faster
• The false positive rate (20.8%) is somewhat high, but is expected to improve in the next version