Evaluating Static Analysis Tools PowerPoint Presentation
Presentation Transcript

  1. Evaluating Static Analysis Tools Dr. Paul E. Black paul.black@nist.gov http://samate.nist.gov/

  2. Static and Dynamic Analysis Complement Each Other
  • Static analysis: examines code; handles unfinished code; can find backdoors, e.g., full access for user name “JoshuaCaleb”; potentially complete.
  • Dynamic analysis: runs code; source code not needed, e.g., for embedded systems; has few(er) assumptions; covers end-to-end or system tests.

  3. Different Static Analyzers Are Used for Different Purposes
  • To check for intellectual property violations
  • By developers, to decide if anything needs to be fixed (and to learn better practices)
  • By auditors or reviewers, to decide if the code is good enough for use

  4. Dimensions of Static Analysis
  • Analysis can look for general (implicit) or application-specific (explicit) properties.
  • Analysis can be on source code, byte code, or binary.
  • The level of rigor can vary from syntactic through heuristic and analytic to fully formal.

  5. SATE 2008 Overview
  • Static Analysis Tool Exposition (SATE) goals:
    • Enable empirical research based on large test sets
    • Encourage improvement of tools
    • Speed adoption of tools by objectively demonstrating their use on real software
    • NOT to choose the “best” tool
  • Co-funded by NIST and DHS, Nat’l Cyber Security Division
  • Participants: Aspect Security ASC, Checkmarx CxSuite, Flawfinder, Fortify SCA, Grammatech CodeSonar, HP DevInspect, SofCheck Inspector for Java, UMD FindBugs, Veracode SecurityReview

  6. SATE 2008 Events
  • Telecons, etc., to come up with procedures and goals
  • We chose 6 C and Java programs with security implications and gave them to tool makers (15 Feb)
  • Tool makers ran tools and returned reports (29 Feb)
  • We analyzed reports and (tried to) find “ground truth” (15 Apr)
  • We expected a few thousand warnings; we got over 48,000
  • Critique and update rounds with some tool makers (13 May)
  • Everyone shared observations at a workshop (12 June)
  • We released our final report and all data 30 June 2009: http://samate.nist.gov/index.php/SATE.html

  7. SATE 2008: There’s No Such Thing as “One Weakness”
  • Only 1/8 to 1/3 of weaknesses are simple.
  • The notion breaks down when weakness classes are related and data or control flows are intermingled.
  • Even “location” is nebulous.

  8. How Weakness Classes Relate
  [Diagram: related weakness classes, including Improper Input Validation (CWE-20), Command Injection (CWE-77), Cross-Site Scripting (CWE-79), Validate-Before-Canonicalize (CWE-180), Relative Path Traversal (CWE-23), Predictability (CWE-340), Container Errors (CWE-216), Symlink Following (CWE-61), Race Conditions (CWE-362), and Permissions (CWE-275)]
  • Hierarchy
  • Chains, e.g., lang = %2e./%2e./%2e/etc/passwd%00
  • Composites
  • From “Chains and Composites”, Steve Christey, MITRE: http://cwe.mitre.org/data/reports/chains_and_composites.html

  9. Intermingled Flow: 2 Sources, 2 Sinks, 4 Paths
  • How many weakness sites? free at line 1503, free at line 2644; use at line 808, use at line 819.

  10. Other Observations
  • Tools can’t catch everything: cleartext transmission, unimplemented features, improper access control, …
  • Tools catch real problems: XSS, buffer overflow, cross-site request forgery; 13 of the SANS Top 25 (21 with related CWEs)
  • Tools reported some 200 different kinds of weaknesses
  • Buffer errors are still very frequent in C
  • Many XSS errors in Java
  • “Raw” report rates vary by 3x depending on the code
  • Tools are even more helpful when “tuned”
  • Coding without security in mind leaves MANY weaknesses

  11. Current Source Code Security Analyzers Have Little Overlap
  • Non-overlap: hits reported by one tool and no others (84%)
  • Overlap: hits reported by more than one tool (16%), split among 2, 3, 4, or all 5 tools
  • (data from MITRE)

  12. Precision & Recall Scoring
  [Plot: one axis runs from “Misses Everything” (0) to “Reports Everything” (100), the other from “No True Positives” to “All True Positives”; from DoD]
  • The perfect tool finds all flaws and finds only flaws.
  • Moving toward “finds more flaws” and “finds mostly flaws” is “better.”

  13. Tool A
  [Plot: per-flaw-type scores for Tool A on the same precision/recall axes: use after free, TOCTOU, tainted data/unvalidated user input, memory leak, uninitialized variable use, null pointer dereference, buffer overflow, improper return value use, and all flaw types combined; from DoD]

  14. Tool B
  [Plot: per-flaw-type scores for Tool B on the same precision/recall axes: command injection, tainted data/unvalidated user input, format string vulnerability, improper return value use, use after free, buffer overflow, TOCTOU, uninitialized variable use, memory leak, null pointer dereference, and all flaw types combined; from DoD]

  15. Best Tool
  [Plot: per-flaw-type scores for the best tool on the same precision/recall axes: format string vulnerability, tainted data/unvalidated user input, command injection, improper return value use, buffer overflow, null pointer dereference, use after free, TOCTOU, memory leak, uninitialized variable use; from DoD]

  16. Tools Useful in Quality “Plains”
  • Tools alone are not enough to achieve the highest “peaks” of quality.
  • In the “plains” of typical quality, tools can help.
  • If code is adrift in a “sea” of chaos, train developers.
  (Photo: Tararua mountains and the Horowhenua region, New Zealand; Swazi Apparel Limited, www.swazi.co.nz; used with permission)

  17. Tips on Tool Evaluation
  • Start with many examples covering code complexities and weaknesses: the SAMATE Reference Dataset (SRD), http://samate.nist.gov/SRD; many cases from MIT (Lippmann, Zitser, Leek, Kratkiewicz).
  • Add some of your typical code.
  • Look for:
    • Weakness types (CWEs) reported
    • Code complexities handled
    • Traces, explanations, and other analyst support
    • Integration and machine-readable reports
    • Ability to write rules and to ignore “known good” code
  • False alarm ratio (fp/tp) is a poor measure; report density (r/kLoc) is probably better.