Static Analysis of String Values
Strings play a crucial role in programming, especially in SQL queries and data manipulation. This document discusses the potential risks of improper string handling, emphasizing the importance of static analysis to prove properties at compile time. It explores methods of abstract interpretation, introducing concepts such as character inclusion, prefixes, suffixes, and string graphs. The framework aims for fast and precise abstractions while ensuring soundness in approximating string semantics, vital for preventing catastrophic runtime errors.
Presentation Transcript
Strings • Strings are everywhere: • SQL queries • Reflection • Wrong use could have catastrophic effects
Sound static analysis • Prove properties • at compile time (static) • respected by all executions (sound) • Abstract interpretation • Cousot&Cousot 77/79 • Mathematical framework to • Define the semantics • Soundly approximate it • Ideal goal: fast and precise abstraction
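Bases of abstract interpretation • Concrete domain: sets of integers, e.g. {…, -1, 0, 1, …}, {1, 2, …}, {1, 5, 8}, ∅ • Abstract domain: signs ⊤, +, 0, -, ⊥ • Abstraction maps concrete elements to abstract ones • Concretization maps abstract elements back to concrete ones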
Semantics • Concrete: x++ maps {1, 5, 8} to {2, 6, 9} • Abstract: x++ is evaluated on the sign lattice (⊤, +, 0, -, ⊥)
Upper bound • if(…) x=0; else x=1; • Concrete: {0} and {1} are joined into {0, 1} • Abstract: the corresponding upper bound is taken in the sign lattice (⊤, +, 0, -, ⊥)
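The three slides above (abstraction, abstract semantics of x++, upper bound) can be sketched on the sign domain. This is a minimal illustration, assuming a flat five-element lattice ⊥ < {-, 0, +} < ⊤; the names `Sign`, `alpha`, `increment`, and `join` are hypothetical and do not come from the presentation.

```python
from enum import Enum


class Sign(Enum):
    BOT = "⊥"
    NEG = "-"
    ZERO = "0"
    POS = "+"
    TOP = "⊤"


def alpha(values):
    """Abstraction: the most precise sign that describes a set of integers."""
    signs = {Sign.NEG if v < 0 else Sign.ZERO if v == 0 else Sign.POS
             for v in values}
    if not signs:
        return Sign.BOT
    return signs.pop() if len(signs) == 1 else Sign.TOP


def increment(a):
    """Abstract semantics of x++ (assumed definition, chosen to be sound)."""
    if a in (Sign.ZERO, Sign.POS):
        return Sign.POS
    if a is Sign.NEG:
        return Sign.TOP  # a negative value plus one may be negative or zero
    return a             # BOT stays BOT, TOP stays TOP


def join(a, b):
    """Least upper bound in the flat sign lattice."""
    if a is Sign.BOT:
        return b
    if b is Sign.BOT:
        return a
    return a if a is b else Sign.TOP


# x++ on {1, 5, 8}: the abstract result covers the concrete result {2, 6, 9}.
after_increment = increment(alpha({1, 5, 8}))
# if(…) x=0; else x=1;  -> the two branches are joined.
after_branch = join(alpha({0}), alpha({1}))
```

In this flat lattice the join of 0 and + is ⊤, so the abstract upper bound is less precise than the concrete set {0, 1}; that loss of precision is the price of a small, fast domain.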
Numerical analyses • Common interface for several analyses • Semantics of +, -, *, /, constants, … • Example: x++ is evaluated in each domain, e.g. Even to Odd (parity) and [0..3] to [1..4] (intervals)
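The idea of a common interface can be sketched with two small domains that expose the same operations. This is only an illustration under assumptions: the classes `Interval` and `Parity` and the method names `increment` and `join` are hypothetical, and a real framework would also cover -, *, /, and constants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Interval domain element [lo..hi] (bounded intervals only in this sketch)."""
    lo: int
    hi: int

    def increment(self):
        # x++ shifts both bounds by one.
        return Interval(self.lo + 1, self.hi + 1)

    def join(self, other):
        # Smallest interval containing both operands.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))


@dataclass(frozen=True)
class Parity:
    """Parity domain element: 'bot', 'even', 'odd', or 'top'."""
    kind: str

    def increment(self):
        # x++ flips even and odd; bot and top are unchanged.
        flipped = {"even": "odd", "odd": "even"}
        return Parity(flipped.get(self.kind, self.kind))

    def join(self, other):
        if self.kind == "bot":
            return other
        if other.kind == "bot":
            return self
        return self if self == other else Parity("top")


def analyze_increment(value):
    """Generic analysis step: works for any domain offering increment()."""
    return value.increment()
```

Because `analyze_increment` only relies on the shared method, the same analysis code can be reused when a domain is swapped for another.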
Outline • Introduction • Generic framework for string analysis • String domains • Character inclusion • Prefix and suffix • Bricks • String graphs • Conclusion
String operators • Set of standard operators on strings: • new String("str") • or "str" • concat(s1, s2) • or s1+s2 • readLine() • substring(b, e, s) • contains(c, s) • Each domain has a lattice structure
Running example • string x = "a"; while(…) x = "0" + x + "1"; return x; • The loop condition is unknown because of approximation/user input/… • Returned values: 0ⁿa1ⁿ with n ≥ 0
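To make the concrete semantics of the running example tangible, the sketch below replays the loop for a fixed number of iterations. The helper names `run` and `concrete_results` are hypothetical; the unknown loop condition is modelled by trying every iteration count up to a bound.

```python
def run(iterations):
    """Execute the running example with the loop body taken `iterations` times."""
    x = "a"
    for _ in range(iterations):
        x = "0" + x + "1"
    return x


def concrete_results(max_iterations):
    """Strings the program may return when the loop runs 0..max_iterations times."""
    return {run(n) for n in range(max_iterations + 1)}
```

For example, `concrete_results(2)` yields "a", "0a1", and "00a11". A sound string analysis has to describe all of these values, for every possible bound.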
Character inclusion • Strings approximated through • C: characters surely contained • MC: characters possibly contained
Character inclusion – Running example • string x = "a"; while(…) x = "0" + x + "1"; return x; • C and MC are computed at each program point • Concretization
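A possible reading of the character inclusion domain on the running example is sketched below. The operator definitions are assumptions made for illustration, not taken from the slides: a constant yields C = MC = its characters, concatenation takes the union of both components, and the join intersects C and unites MC.

```python
def const(s):
    """Abstract a string constant: every character is both certain and possible."""
    chars = frozenset(s)
    return (chars, chars)


def concat(a, b):
    """Concatenation: certain and possible characters are both accumulated."""
    return (a[0] | b[0], a[1] | b[1])


def join(a, b):
    """Upper bound: keep characters certain in both, allow characters possible in either."""
    return (a[0] & b[0], a[1] | b[1])


def analyze():
    """Fixpoint iteration for: x = "a"; while(…) x = "0" + x + "1"; return x;"""
    x = const("a")
    while True:
        body = concat(concat(const("0"), x), const("1"))
        merged = join(x, body)   # the loop may or may not run once more
        if merged == x:
            return x
        x = merged


def may_represent(abstract, s):
    """Membership in the concretization: s has all of C and nothing outside MC."""
    certain, possible = abstract
    return certain <= set(s) and set(s) <= possible
```

Under these assumptions the iteration stabilizes at C = {a} and MC = {0, 1, a}. The result covers every 0ⁿa1ⁿ, but also strings such as "1a0", which shows how much order information the domain drops.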
Prefix & Suffix • Strings approximated through • PR: prefix of the string • SU: suffix of the string
Prefix & Suffix – Running example • string x = "a"; while(…) x = "0" + x + "1"; return x; • PR and SU are computed at each program point • Concretization
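The prefix and suffix domain can be sketched in the same way. The definitions below are assumptions for illustration: a constant is its own prefix and suffix, concatenation keeps the prefix of the left operand and the suffix of the right one, and the join takes the longest common prefix and longest common suffix.

```python
import os.path


def const(s):
    """Abstract a string constant as (prefix, suffix)."""
    return (s, s)


def concat(a, b):
    """Concatenation: prefix of the left operand, suffix of the right operand."""
    return (a[0], b[1])


def common_suffix(s, t):
    """Longest common suffix, computed as a common prefix of the reversed strings."""
    return os.path.commonprefix([s[::-1], t[::-1]])[::-1]


def join(a, b):
    """Upper bound: longest common prefix and longest common suffix."""
    return (os.path.commonprefix([a[0], b[0]]), common_suffix(a[1], b[1]))


def analyze():
    """Fixpoint iteration for: x = "a"; while(…) x = "0" + x + "1"; return x;"""
    x = const("a")
    while True:
        body = concat(concat(const("0"), x), const("1"))
        merged = join(x, body)
        if merged == x:
            return x
        x = merged
```

With these definitions the running example ends with an empty prefix and an empty suffix: "a" and "0a1" share neither a first nor a last character, so this domain alone keeps no information about the returned string.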
Bricks • Strings approximated through a sequence of bricks
Bricks – Running example • string x = "a"; while(…) x = "0" + x + "1"; return x; • Widening! • Concretization
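The brick definition did not survive in the transcript, so the sketch below rests on an assumption: a brick is a triple (S, min, max) standing for every string obtained by concatenating between min and max strings taken from the set S, and an abstract value is a sequence of such bricks. The function names are hypothetical, and the abstract value used in the usage note is chosen for illustration only; it is not claimed to be the result of the widening shown on the slide.

```python
from itertools import product


def brick_strings(strings, lo, hi):
    """All strings built by concatenating between lo and hi elements of `strings`."""
    result = set()
    for count in range(lo, hi + 1):
        for combo in product(sorted(strings), repeat=count):
            result.add("".join(combo))
    return result


def concretize(bricks):
    """Concretization of a brick sequence: concatenate one choice per brick, in order."""
    result = {""}
    for strings, lo, hi in bricks:
        choices = brick_strings(strings, lo, hi)
        result = {prefix + piece for prefix in result for piece in choices}
    return result


# Hypothetical abstract value with bounded repetitions, for illustration only.
example = [({"0"}, 0, 2), ({"a"}, 1, 1), ({"1"}, 0, 2)]
example_strings = concretize(example)
```

The example value contains "a", "0a1", and "00a11", but also unbalanced strings such as "0a": bricks keep the order of the pieces while losing the link between the number of zeros and the number of ones.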
String graphs • Adaptation of type graphs (tree automata) • Rely on their normalization and widening
String graphs – Running example • string x = "a"; while(…) x = "0" + x + "1"; return x; • Normalization • Widening! • Concretization