570 likes | 710 Vues
Precise Program Analysis with Data Structures. Bor-Yuh Evan Chang University of California, Berkeley February-April 2008. Collaborators: George Necula , Xavier Rival (INRIA). Precise Program Analysis with Data Structures by Designing with the User in Mind. Bor-Yuh Evan Chang
E N D
Precise Program Analysis with Data Structures Bor-Yuh Evan Chang University of California, Berkeley February-April 2008 Collaborators: George Necula, Xavier Rival (INRIA)
Precise Program Analysis with Data Structuresby Designing with the User in Mind Bor-Yuh Evan Chang University of California, Berkeley February-April2008 Collaborators: George Necula, Xavier Rival (INRIA)
Software errors cost a lot ~$60 billion annually (~0.5% of US GDP) • 2002 National Institute of Standards and Technology report > total annual revenue of > 10x annual budget of Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
But there’s hope in program analysis Microsoft uses and distributes the Static Driver Verifier Airbus applies the Astrée Static Analyzer Companies, such as Coverity and Fortify, market static source code analysis tools Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Because program analysis caneliminate entire classes of bugs For example, • Reading from a closed file: • Reacquiring a locked lock: How? • Systematically examine the program • Simulate running program on “all inputs” • “Automated code review” acquire( ); read( ); Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Program analysis by example:Checking for double acquires Simulate running program on “all inputs” …code … // x now points to an unlocked lock acquire(x); • … code … • acquire(x); • … code … analysis state x Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Program analysis by example:Checking for double acquires Simulate running program on “all inputs” …code … • // x now points to an unlocked lockin a linked list acquire(x); • … code … ideal analysis state x x x … or or or undecidability Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Must abstract Abstraction too coarse or not precise enough (e.g., lost x is always unlocked) …code … • // x now points to an unlocked lockin a linked list acquire(x); • … code … ? ideal analysis state analysis state x x x … or or or x For decidability, must abstract—“model all inputs” (e.g., merge objects) mislabels good code as buggy Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
To address the precision challenge Traditional program analysis mentality: “ Why can’t developers write more specifications for our analysis? Then, we could verify so much more.” “ Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that work hopefully most of the time.” My approach: “ Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use those as specifications?” Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Summary of overview Challenge in analysis: Finding a good abstraction precise enough but not more than necessary Powerful, generic abstractions expensive, hard to use and understand Built-in, default abstractions often not precise enough (e.g., data structures) My approach: Must involve the user in abstraction without expecting the user to be a program analysis expert Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Overview of contributions Extensible Inductive Shape Analysis [POPL’08,SAS’07] Precise inference of data structure properties Able to check, for instance, the locking example Targeted to software developers Uses data structure checking code for guidance • Turns testing code into a specification for static analysis Efficient ~10-100x speed-up over generic approaches • Builds abstraction out of developer-supplied checking code Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Part 1 Extensible InductiveShape Analysis Developer-oriented approach Precise inference of data structure properties … [POPL’08, SAS’07]
Shape analysis is a fundamental analysis Data structures are at the core of • Traditional languages (C, C++, Java) • Emerging web scripting languages Improves verifiers that try to • Eliminate resource usage bugs (locks, file handles) • Eliminate memory errors (leaks, dangling pointers) • Eliminate concurrency errors (data races) • Validate developer assertions Enables program transformations • Compile-time garbage collection • Data structure refactorings … Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
// l is a sorted doubly-linked list for each nodecurinlistl { removecurif duplicate; } assert l is sorted, doubly-linked with no duplicates; Shape analysis by example:Removing duplicates Example/Testing Code Review/Static Analysis program-specific l intermediate state more complicated “segment with no duplicates” “sorted dl list” “sorted dl list” “no duplicates” l l cur cur 2 2 2 4 4 4 2 4 4 l l l Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Shape analysis is not yet practical Choosing the heap abstraction difficult for precision Traditional approaches: Parametric in low-level, analyzer-oriented predicates + Very general and expressive - Hard for non-expert TVLA [Sagiv et al.] 89 • Built-in high-level predicates • - Hard to extend • + No additional user effort (if precise enough) Space Invader [Distefano et al.] My approach: Parametric in high-level, developer-oriented predicates + Extensible +Targetedtodevelopers Xisa Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Key insightfor being developer-friendly and efficient Utilize “run-time checking code” as specification for static analysis. Contribution: Build the abstraction for analysis out of developer-specified checking code • dll(h, p) = • if(h =null) then • true • else • h!prev= p anddll(h!next, h) • assert(sorted_dll(l,…)); for each nodecurinlistl { removecurif duplicate; } • assert(sorted_dll_nodup(l,…)); l l Contribution: Automatically generalize checkers for complicated intermediate states checker l • p specifies where prev should point cur Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Our framework is … An automated shape analysis with a precise memory abstraction based around invariant checkers. • Extensible and targeted for developers • Parametric in developer-supplied checkers • Precise yet compact abstraction for efficiency • Data structure-specific based on properties of interest to the developer • dll(h, p) = • if (h =null) then • true • else • h!prev=prevand • dll(h!next, h) checkers shape analyzer Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Shape analysis is an abstract interpretation on abstract memory descriptions with … • Splitting of summaries(materialization) • To reflect updates precisely (strong updates) • Andsummarizingfor termination (widening) l l l l l cur cur cur cur cur cur l Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Outline Learn information about the checker to use it as an abstraction shape analyzer abstract interpretation 1 splitting and interpreting update 2 type “pre-analysis” on checker definitions • dll(h, p) = • if (h =null) then • true • else • h!prev=prevand • dll(h!next, h) 3 Compare and contrast manual code review and our automated shape analysis summarizing checkers Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Overview: Split summariesto interpret updates precisely Want abstract update to be “exact”, that is, to update one “concrete memory cell”. The example at a high-level: iterate using cur changing the doubly-linked list from purple to red. Challenge: How does the analysis “split” summaries and know where to “split”? split at cur l update cur purple to red cur cur cur cur l l l Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
“Split forward”by unfolding inductive definition dll(cur, p) l p l null get: cur!next Analysis doesn’t forget the empty case Ç • dll(h, p) = • if(h =null) then • true • else • h!prev= p anddll(h!next, h) cur cur cur dll(n, cur) l p n Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
“Split backward” also possible and necessary cur!prev!next = cur!next; “dll segment” dll(n, cur) l p n for each nodecur in list l{ removecurif duplicate; } assert lis sorted, doubly-linked with no duplicates; • Technical Details: • How does the analysis do this unfolding? • Whyis this unfolding allowed? • (Key: Segments are also inductively defined) [POPL’08] How does the analysis know to do this unfolding? get: cur!prev!next Ç dll(n, cur) l n • dll(h, p) = • if (h =null) then • true • else • h!prev= p anddll(h!next, h) null “dll segment” dll(n, cur) l p0 n cur cur cur Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Outline Derives additional information to guide unfolding How do we decide where to unfold? shape analyzer abstract interpretation 1 splitting and interpreting update 2 type “pre-analysis” on checker definitions • dll(h, p) = • if (h =null) then • true • else • h!prev=prevand • dll(h!next, h) 3 summarizing checkers Contribution: Turns testing code into specification for static analysis Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Abstract memory as graphs Make endpoints and segments explicit, yet high-level ° “dll segment” dll(±, °) l ® ¯ ± l ® memory address (value) memory cell (points-to: °!next =±) checker summary (inductive pred) Some number of memory cells (thin edges) cur segment summary ° ± • dll(h, p) = • if (h =null) then • true • else • h!prev= p anddll(h!next, h) next next dll(null) dll(¯) dll(°) prev Which summary (thick edge), in what direction, and how far do we unfold to get the edge ¯!next (cur!prev!next)? ¯ cur Contribution:Generalization of checker (Intuitively, dll(®,null) up to dll(°,¯).) Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Types for deciding where to unfold Summary If it exists, where is: °!next ? ¯!next ? dll(¯) ° ® 0 dll(null) dll(¯) -1 Instance next next next prev next prev prev prev Checker “Run” (call tree/derivation) Checker Definition • dll(h, p) = • if(h =null) then • true • else • h!prev= p anddll(h!next, h) dll(®,null) h : {nexth0i, prevh0i } p : {nexth-1i, prevh-1i } -2 dll(¯,®) -1 Says: For h!next/h!prev, unfold from h For p!next/p!prev, unfold before h dll(°,¯) 0 dll(±,°) ® ¯ ° ± null null 1 dll(null,±) Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Types make the analysis robust with respect to how checkers are written h : {nexth0i, prevh0i } p : {nexth-1i, prevh-1i } Doubly-linked list checker (as before) • dll(h, p) = • if(h =null) then • true • else • h!prev= p anddll(h!next, h) dll(¯) ° ¯ dll(®) dll(¯) Summary Checker “Run” next next Instance prev ® ¯ ° null prev Alternative doubly-linked list checker • dll0(h) = • if(h!next=null)then • true • else • h!next!prev= h anddll0(h!next) h : {nexth0i, prevh-1i } ¯.dll(®) ° ¯ -1 ¯.dll0() dll0 dll0 dll0 °!prev ? -1 -1 Summary Checker “Run” Different types for different unfolding °.dll(¯) 0 °.dll0() 0 next next null.dll(°) Instance ¯ ° null prev Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Summary of checker parameter types Tell where to unfold for which fields Make analysis robust with respect to how checkers are written Learn where in summaries unfolding won’t help • Can be inferred automatically with a fixed-point computation on the checker definitions Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Summary of interpreting updates Splitting of summaries needed for precision Unfolding checkers is a natural way to do splitting When checker traversal matches code traversal Checker parameter types Enable, for example, “back pointer” traversal without blindly guessing where to unfold Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Outline shape analyzer abstract interpretation 1 splitting and interpreting update 2 type “pre-analysis” on checker definitions • dll(h, p) = • if (h =null) then • true • else • h!prev=prevand • dll(h!next, h) 3 summarizing checkers Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
next list l, last cur next next list l last cur next next next list l last cur Summarizeby folding into inductive predicates last = l; cur = l!next; while (cur != null) { // … cur, last … if (…) last = cur; cur = cur! next; } Previous approaches guess where to fold for each graph. Challenge: Precision (e.g., last, cur separated by at least one step) summarize Contribution: Determine where by comparing graphs across history list next next list list list l last last cur cur Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Summary:Given checkers, everything is automatic shape analyzer abstract interpretation splitting and interpreting update type “pre-analysis” on checker definitions • dll(h, p) = • if (h =null) then • true • else • h!prev=prevand • dll(h!next, h) summarizing checkers Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Results: Performance Times negligible for data structure operations (often in sec or 1/10 sec) Expressiveness: Different data structures TVLA: 290 ms Space Invader only analyzes lists (built-in) TVLA: 850 ms Verified shape invariant as given by the checker is preserved across the operation. Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Demo: Doubly-linked list reversal Body of loop over the elements: Swaps the next and prev fields of curr. Already reversed segment Node whose next and prev fields were swapped Not yet reversed list http://xisa.cs.berkeley.edu Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Experience with the tool Checkers are easy to write and try out • Enlightening (e.g., red-black tree checker in 6 lines) • Harder to “reverse engineer” for someone else’s code • Default checkers based on types useful Future expressiveness and usability improvements • Pointer arithmetic and arrays • More generic checkers: polymorphic “element kind unspecified” higher-order parameterized by other predicates Future evaluation: user study Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Short-term future work:Exploiting common specification framework Scenario: Code instrumented with lots of checker calls (perhaps automatically with object invariants) assert( mychecker(x) ); // … operation on x … assert( mychecker(x) ); • Very slow to execute • Hard to prove statically (in general) • Can we prove parts statically? Static Analysis View: Hybrid checking Testing View:Incrementalize invariant checking Example: Insert in a sorted list v u w Preservation of sortedness shown statically l Emit run-time check for new element: u·v·w Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Summary ofExtensible Inductive Shape Analysis Key Insight: Checkers as specifications Developer View: Global, Expressed in a familiar style Analysis View: Capture developer intent, Not arbitrary inductive definitions Constructing the program analysis Intermediate states: Generalized segment predicates Splitting: Checker parameter types with levels Summarizing: History-guided approach ® ¯ c(°) c0(°0) h : {nexth0i, prevh0i} p : {nexth-1i, prevh-1i} r list next list list list list Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Are there other kinds of program analysis users? Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Two kinds of users of program analysis software software developer end-user Wants: Precise program analysis for development tools Wants: Program analysis to certify software is ok 1 2 Extensible Inductive Shape Analysis [POPL’08, SAS’07] Analysis of Low-Level Code Using Decompilers [SAS’06, TLDI’05] Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Part 2 Analysis ofLow-Level Code Using Cooperating Decompilers [SAS’06, TLDI’05]
End-users want low-level code analysis • Want analyses to check code to be executed is ok • E.g., won’t crash, good wrt Static Driver Verifier • Do not know any details about the program • Analysis must be fully automatic • But can demand additional information from the developer • To make analysis automatic source code analyzer • But most program analyses operate at the source-level! for low-level code executable code end-user Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Analyzers for low-level code are more difficult and tedious to build Porting source-level analyses is error prone • one statement becomes many instructions • dependencies between instructions must be carefully tracked Key Insight: Low-level complexity • deals with compilation idioms • mostly independent of the analysis • can be captured with intermediate languages source code for low-level code analyzer for low-level code analyzer executable code end-user Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Decompile code rather than port analysis Framework of small, cooperating decompilersthat gradually lift the level of the program Decompilationfor program analysis • Need not get to original, nor be human understandable • Only concern is safety, not performance • Unlike, e.g., Java VM platform (JIT compiler) • Can use additional meta-data (e.g., source-level types) source code for low-level code analyzer executable code end-user Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Summary of results Flexibility and usability • 3 compilers (gcc/C, gcj/Java, coolc/Cool) • 2 architectures (x86, MIPS) • With 6 decompiler modules • Basic Java type-checking for gcj output implemented in 3-4 hours, 500 lines of code Benefits of modularity • decompiler-based re-implementation of a low-level analysis uncovered 8 bugs in the original implementation (heavily used, deployed in the classroom) Applicability of existing source-level tools • applied C code tools, BLAST and Cqual, on decompiled benchmarks (size: ~10,000 lines of C) Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Long-term and outreach Theme: Overcome decidability issues in program analysis by tailoring it to the user • “Programs” are no longer only written to be executed on computers • E.g., computational models of biological pathways in systems biology • Need new “program” analysis tools • Validate models (e.g., pathway model produces only expected products) • Reason about models • How do these users work? Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Conclusion Extensible Inductive Shape Analysis precision demanding program analysis improved by novel user interaction Developer: Gets results corresponding to intuition Analysis: Focused on what’s important to the developer Cooperating Decompilers adapt program analyses to code end-users run Practical precise tools for better software! Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
What can inductive shape analysis do for you? http://xisa.cs.berkeley.edu
Intuition: Checkers and types • structDll { • intval; • Dll* prev; • Dll* next; • }; global specification (i.e., per data structure) more precise (typically) holds only in “steady-state” need generalization global specification (i.e., per data structure) less precise (typically) holds always doesn’t need generalization • l.sorteddll(prev, min) = • if (l =null) then • true • else • l!prev=prevand min ·l!valandl!next.sorteddll(l,l!val) x . sorteddll(…) x : Dll Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
… … Segments as partial checkers Summary i 0 i 0 dll(¯) ° ® ¯ ® c(°) c0(°0) dll(null) dll(¯) Instance null Checker “Run” ®.dll(null) c(®,°) i next next ¯.dll(®) null i prev … … i= 0 ® = ° ¯ = null prev °.dll(¯) c = c0 ® = ¯ ° = °0 i= 0 ±.dll(°) … c0(¯,°0) next ® ¯ ° ± null next null.dll(±) prev prev Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures