Explore the importance of source code analysis for security, discussing its benefits, limitations, and processes in ensuring code quality and integrity. Learn about different software bug finding methods and the significance of analyzing source code in today's software development landscape.
Source Analysis for Security Trent Jaeger March 29, 2004
Example 2
get_free_buffer(struct stripe_head *sh, …) {
    struct buffer_head *bh;
    unsigned long flags;
    save_flags(flags);
    cli();
    if ((bh = sh->buffer_pool) == NULL)
        return NULL;
    sh->buffer_pool = bh->b_next;
    bh->b_size = b_size;
    restore_flags(flags);
    return bh;
}
Example 4
int notify_change(struct dentry * dentry, struct iattr * attr) {
    struct inode *inode = dentry->d_inode;
    …
    if (inode->i_op && inode->i_op->setattr) {
        error = security_inode_setattr(dentry, attr);
        if (!error)
            error = inode->i_op->setattr(dentry, attr);
        …
}
Find Software Bugs
• Education
  • Difficult to know how code will be used
• Testing
  • Misses many code paths, time consuming
• Manual Inspection
  • Tedious and error prone
• Compiler checking
  • Context independent
• 4GL
  • Incomplete and don’t know how source code will be used
• Assurance
  • Extremely costly and complex; what do we do about existing code?
Limited Source Code Analysis
• Source code is the level at which security is defined
  • Problems manifest as errors in code (although design can be a problem too)
• Compilers can check for various properties
  • Rules on program source
• Programmers can express some properties
  • Semantic properties
• Must specify correctly (no/few false negatives)
• Must not be too conservative (few false positives)
• Should be robust to code changes
Source Code Analysis
• Convert source code into a model
• Convert property into a computation on the model
• Report positive cases (violate/meet property)
• Determine if cases are true or false
• Resolve true cases
• Refine model or property and repeat
Some Properties
• Never/always do X
  • Never use floating point in kernel
• Do X rather than Y
• Always do X before/after Y
  • LSM mediation (Example 1)
• Never do X before/after Y
• In situation X, do (not) Y
  • Re-enable disabled interrupts (Example 2)
• In situation X, do Y rather than Z
Program Models
• Abstract Syntax Tree
• Control flow
• Data flow
• Def-use chain
• Aliases
• Type constraints
• …
Abstract Syntax Tree
[Figure: abstract syntax tree for sys_fcntl, do_fcntl, and fcntl_setlk. Node labels include func_decl, var_decl (struct file *filp, err), expr_stmt (=), call_decl (fget(fd), do_fcntl), call_stmt (fcntl_setlk(fd)), cmpd_stmt, security_op, and a use of filp.]
Control Flow (Interprocedural)
[Figure: the same tree, annotated with interprocedural control flow.]
Control Flow (Intraprocedural)
[Figure: the same tree, annotated with intraprocedural control flow.]
Data Flow
[Figure: the same tree, annotated with data flow.]
Def-Use
[Figure: the same tree, annotated with def-use chains.]
Property Models
• Finite State Automata
  • Start Operation
  • Disable Interrupts
  • Enable Interrupts
  • End Operation
• Type Constraints
  • Unchecked type
  • Checked type
  • Expect checked type
[Figure: interrupt FSA with transitions labeled enable and disable, error states double_enable and double_disable, and an "Exit w/ disabled" error at End Op.]
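The interrupt automaton on this slide can be sketched as a small transition function. This is only an illustration, not code from the talk: the type and function names (intr_state, intr_event, intr_check_path) are hypothetical, and it assumes every path starts with interrupts enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; states and errors follow the slide's diagram labels. */
typedef enum { INTR_ENABLED, INTR_DISABLED } intr_state;
typedef enum { EV_ENABLE, EV_DISABLE } intr_event;
typedef enum {
    INTR_OK,
    INTR_DOUBLE_ENABLE,
    INTR_DOUBLE_DISABLE,
    INTR_EXIT_DISABLED
} intr_result;

/*
 * Run the automaton over the events seen along one path and
 * return the first violation, or INTR_OK if the path is clean.
 * Assumption: every path starts with interrupts enabled.
 */
intr_result intr_check_path(const intr_event *events, size_t n)
{
    intr_state s = INTR_ENABLED;

    assert(events != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (events[i] == EV_DISABLE) {
            if (s == INTR_DISABLED)
                return INTR_DOUBLE_DISABLE;
            s = INTR_DISABLED;
        } else {
            if (s == INTR_ENABLED)
                return INTR_DOUBLE_ENABLE;
            s = INTR_ENABLED;
        }
    }
    /* End of operation: interrupts must be enabled again. */
    return s == INTR_DISABLED ? INTR_EXIT_DISABLED : INTR_OK;
}
```

For the two paths through get_free_buffer in Example 2, the early-return path contributes only a disable event and yields INTR_EXIT_DISABLED, while the normal path (disable, then enable) yields INTR_OK.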
CQUAL Static Analysis
• CQUAL is a type-based static analysis tool from UC Berkeley
  • Enables qualification of types, analogous to const
  • Enables verification that the type passed to a function is the type expected
• Used previously to detect format string vulnerabilities
  • Wagner’s group at UC Berkeley, USENIX Security 2001
CQUAL Principles
• Interprocedural control flow
  • do_fcntl calls fcntl_getlk
• Def-Use data flow
  • Assignments tracked back to def where type is declared
• Type inference
  • Variables have type restrictions
  • Cannot assign a variable to another of an incompatible type
  • Cannot pass a variable as a parameter to a function unless its type is compatible
Sensitivity: Flow and Context
• Flow-sensitivity
  • The order of statements in a function matters
  • CQUAL is not flow-sensitive
    • Must create new ‘checked’ variable
    • Must use GCC to verify intraprocedural paths
    • Must use GCC to find reassignments after ‘checked’
• Context-sensitivity
  • A function is treated differently depending on calling site
  • CQUAL is not context-sensitive
    • If two functions call the same descendant, they must have the same requirements in CQUAL
CQUAL Postscript
• Flow-sensitive CQUAL
  • Initial performance was not good
• Field level data flow
  • Extensions at UC Berkeley
• We switched to a new tool (JaBA)
  • Interprocedural control flow
  • Intraprocedural control flow (flow-sensitive)
  • Context-sensitive
  • Variable and field-level data flow
  • Replicated analyses of Examples 1 and 3 while preventing false positives of Example 4
Meta-compilation
• Compilers
  • Have program source
  • Can implement straightforward rules for source checking
  • Lack domain semantics of programs
• Programmers
  • Have domain semantics of programs
  • Need a means to express these semantics such that they can be checked
Meta-compilation
• Model
  • GCC abstract syntax tree
  • Compute interprocedural control flow graph
  • Compute intraprocedural control flow graph
• Properties
  • Finite state automata
  • Generate extensions from specification
• Computation
  • FSA state transitions are represented by patterns
  • Find syntactic patterns in code
  • Build intraprocedural paths with relevant state changes
  • For each path, compute resultant state transitions
Properties: Meta Language (metal)
{ #include "linux-includes.h" }
sm check_interrupts {
    // Variables used in patterns
    decl { unsigned } flags;
    // Patterns to specify enable/disable fns
    pat enable = { sti(); }
               | { restore_flags(flags); } ;
    pat disable = { cli(); } ;
    // States: the first state is the implicit initial state
    is_enabled: disable ==> is_disabled
              | enable ==> { err("double enable"); } ;
    is_disabled: enable ==> is_enabled
               | disable ==> { err("double disable"); }
               | $end_of_path$ ==> { err("exiting w/ intr disabled"); } ;
}
[Figure: the corresponding FSA with transitions labeled enable and disable, error states double_enable and double_disable, and an "Exit w/ disabled" error at End Op.]
Example 2 Processing
get_free_buffer(struct stripe_head *sh, …) {
    struct buffer_head *bh;
    unsigned long flags;
    save_flags(flags);
    cli();                                // disable
    if ((bh = sh->buffer_pool) == NULL)
        return NULL;                      // end of path: err
    sh->buffer_pool = bh->b_next;
    bh->b_size = b_size;
    restore_flags(flags);                 // enable
    return bh;                            // end of path
}
Meta-Compilation System
• Compile Metal State Machine (SM) with mcc
• Dynamically link SM into xg++
  • Compile-time, command line flag
• At a branch, the SM is “pushed down” both paths
• Paths are built and checked against SM
  • All paths vs one pass (flow-sensitive vs. insensitive)
  • Prune paths that reach a join in the same state
  • Fixed point: loop until all possible paths have been reached
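Prune Paths
• When two paths reach a join in the same state, the choice of path does not matter, so only one needs to be kept
[Figure: branching paths with disable and enable transitions merging at a join.]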
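Path building, pruning at joins, and the fixed-point loop can be sketched together as a worklist over (node, state) pairs. This is a minimal illustration and not xgcc's implementation: the cfg_node layout, check_cfg, and the hand-built control flow graph for get_free_buffer are all hypothetical, and the sketch keeps exploring after an error, which a real checker may not do.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_NODES = 16, MAX_SUCCS = 2, NUM_STATES = 2 };

typedef enum { ACT_NONE, ACT_ENABLE, ACT_DISABLE } action;
typedef enum { ST_ENABLED = 0, ST_DISABLED = 1 } sm_state;

typedef struct {
    action act;            /* pattern matched at this node, if any */
    int succ[MAX_SUCCS];   /* successor node indices, -1 if unused */
    bool is_exit;          /* true if the node ends a path */
} cfg_node;

typedef struct {
    int double_enable;
    int double_disable;
    int exit_disabled;
    int pairs_explored;    /* number of (node, state) pairs visited */
} sm_report;

/* Hand-built CFG for Example 2 (an assumption, not compiler output). */
static const cfg_node get_free_buffer_cfg[7] = {
    { ACT_NONE,    {  1, -1 }, false },  /* 0: save_flags(flags)    */
    { ACT_DISABLE, {  2, -1 }, false },  /* 1: cli()                */
    { ACT_NONE,    {  3,  4 }, false },  /* 2: if (bh == NULL)      */
    { ACT_NONE,    { -1, -1 }, true  },  /* 3: return NULL          */
    { ACT_NONE,    {  5, -1 }, false },  /* 4: update buffer fields */
    { ACT_ENABLE,  {  6, -1 }, false },  /* 5: restore_flags(flags) */
    { ACT_NONE,    { -1, -1 }, true  },  /* 6: return bh            */
};

/* Explore every reachable (node, state) pair exactly once. */
sm_report check_cfg(const cfg_node *cfg, int n_nodes, int entry)
{
    bool seen[MAX_NODES][NUM_STATES] = { { false } };
    int stack_node[MAX_NODES * NUM_STATES];
    sm_state stack_state[MAX_NODES * NUM_STATES];
    int top = 0;
    sm_report r = { 0, 0, 0, 0 };

    assert(n_nodes > 0 && n_nodes <= MAX_NODES);
    assert(entry >= 0 && entry < n_nodes);

    seen[entry][ST_ENABLED] = true;
    stack_node[top] = entry;
    stack_state[top] = ST_ENABLED;
    top++;

    while (top > 0) {              /* fixed point: stop when no new pairs */
        top--;
        int node = stack_node[top];
        sm_state s = stack_state[top];
        r.pairs_explored++;

        if (cfg[node].act == ACT_DISABLE) {
            if (s == ST_DISABLED) r.double_disable++;
            s = ST_DISABLED;
        } else if (cfg[node].act == ACT_ENABLE) {
            if (s == ST_ENABLED) r.double_enable++;
            s = ST_ENABLED;
        }
        if (cfg[node].is_exit && s == ST_DISABLED)
            r.exit_disabled++;

        for (int i = 0; i < MAX_SUCCS; i++) {
            int next = cfg[node].succ[i];
            if (next < 0 || seen[next][s])
                continue;          /* prune: join already reached in this state */
            seen[next][s] = true;
            stack_node[top] = next;
            stack_state[top] = s;
            top++;
        }
    }
    return r;
}
```

Calling check_cfg(get_free_buffer_cfg, 7, 0) reports one exit with interrupts disabled, from the return NULL path. Because a pair is never revisited, two branches that rejoin in the same state are followed only once past the join, which is the pruning described above.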
Assertion Checking: Side Effects
{ #include "linux-includes.h" }
sm Assert flow-insensitive {
    // Match expressions
    decl { any } expr, x, y, z;
    decl { any_call } any_fcall;
    decl { any_args } args;
    // States: find asserts and detect side effects
    start: { assert(expr); } ==> { mgk_expr_recurse(expr, in_assert); } ;
    in_assert: { any_fcall(args) } ==> { err("fn call"); }
             | { x = y } ==> { err("assignment"); }
             | { z++ } ==> { err("post-increment"); }
             | { z-- } ==> { err("post-decrement"); } ;
}
xgcc Extension (PLDI 2002)
• Match patterns to statements
  • Identify state transitions
• Compute intraprocedural paths
  • Prune those that cannot matter (no state changes)
• Combine intraprocedural paths into complete paths
• Analysis instance based on a transition from a start state
  • Paths are generated for each instance
  • Assignments result in creating a new instance that is a copy
Checking memory management
[Figure: state machine for a tracked pointer. States: unknown (entered on allocation), null, not-null, freed, stop. Edge labels: conditional check on ptr implying null, conditional check on ptr implying not null, free, dereference, overwrite, end path.]
Checking memory management
• Intraprocedural control flow
  • Distinguish between paths with null and non-null pointers
• Interprocedural control flow
  • “Global analysis” done in PLDI by combining intraprocedural paths
• Data flow
  • None, pure syntactic comparison
  • Assignment does result in replication of state machine for assigned variable
• Finds bugs, but does not guarantee absence
  • No tracking of assignment to a structure field
  • No aliases
• False positives
  • Syntactic path-sensitivity keeps them moderate
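A pointer-state checker of this kind can be sketched as a transition function over the states named in the diagram. The wiring below is one plausible reading and is an assumption, as are all names (ptr_state, ptr_event, ptr_step, ptr_check_path): using an unchecked pointer is an error, dereferencing a null pointer is an error, a not-null pointer that is overwritten or reaches the end of the path is reported as a possible leak, and a freed pointer must not be freed or dereferenced again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; states follow the diagram, wiring is assumed. */
typedef enum { P_UNKNOWN, P_NULL, P_NOT_NULL, P_FREED, P_STOP } ptr_state;
typedef enum {
    PE_CHECK_NULL, PE_CHECK_NOT_NULL, PE_DEREF,
    PE_FREE, PE_OVERWRITE, PE_END_PATH
} ptr_event;
typedef enum {
    PERR_NONE, PERR_UNCHECKED_USE, PERR_NULL_DEREF,
    PERR_LEAK, PERR_USE_AFTER_FREE, PERR_DOUBLE_FREE
} ptr_error;

/* One transition: returns the next state and sets *err on a violation. */
ptr_state ptr_step(ptr_state s, ptr_event e, ptr_error *err)
{
    assert(err != NULL);
    *err = PERR_NONE;

    switch (s) {
    case P_UNKNOWN:                      /* just allocated, not yet checked */
        if (e == PE_CHECK_NULL) return P_NULL;
        if (e == PE_CHECK_NOT_NULL) return P_NOT_NULL;
        if (e == PE_DEREF || e == PE_FREE) *err = PERR_UNCHECKED_USE;
        return P_STOP;
    case P_NULL:
        if (e == PE_DEREF) { *err = PERR_NULL_DEREF; return P_STOP; }
        if (e == PE_OVERWRITE || e == PE_END_PATH) return P_STOP;
        return P_NULL;
    case P_NOT_NULL:
        if (e == PE_FREE) return P_FREED;
        if (e == PE_OVERWRITE || e == PE_END_PATH) {
            *err = PERR_LEAK;            /* assumed: never freed on this path */
            return P_STOP;
        }
        return P_NOT_NULL;
    case P_FREED:
        if (e == PE_DEREF) { *err = PERR_USE_AFTER_FREE; return P_STOP; }
        if (e == PE_FREE) { *err = PERR_DOUBLE_FREE; return P_STOP; }
        if (e == PE_OVERWRITE || e == PE_END_PATH) return P_STOP;
        return P_FREED;
    case P_STOP:
        break;
    }
    return P_STOP;
}

/* Check one path, starting right after the allocation. */
ptr_error ptr_check_path(const ptr_event *events, size_t n)
{
    ptr_state s = P_UNKNOWN;
    ptr_error err = PERR_NONE;

    assert(events != NULL || n == 0);
    for (size_t i = 0; i < n && s != P_STOP; i++) {
        s = ptr_step(s, events[i], &err);
        if (err != PERR_NONE)
            return err;
    }
    return PERR_NONE;
}
```

The sketch is purely syntactic in the same sense as the slide: it tracks one named pointer along one path and knows nothing about aliases or structure fields, so it can find bugs but cannot show their absence.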
Other Example Analyses
• Example 3 (check fcntl and set_fowner)
  • If we know the required authorizations for each operation, we can define the states of these ops
  • Don’t know this (tedious to specify)
  • We use a consistency analysis (ACM TISSEC, May 2004)
• Example 4 (distinguish between dentry->inode and inode)
  • Specify that { inode = dentry->inode } links inode state with dentry state
  • Note that this is not computed from first principles, so manual effort is required to ensure it is correct
xgcc Postscript
• Lots of papers on finding bugs using these techniques
  • Lots of simple errors in code
• Other aspects
  • Automating annotation
  • Statistical analysis
• Coverity, Inc.
GCC Architecture
• Compilers for C, C++, Java
• Consists of a sequence of compilation steps, all of which can be hooked (3.0 and greater)
• Eventually, all languages share a single representation (GIMPLE)
• Then converts to Register Transfer Language (RTL), at which point all typing is lost
MOPS
• Aim to provide a ‘sound’ analysis architecture
  • That is, no false negatives for their model
• Program model
  • Pushdown automata of program
• Property model
  • Finite state automata of security property
  • Temporal properties
• Like xgcc, there is no real data flow analysis
• Unlike xgcc, language for properties is not defined
Formal Basis
• FSA M accepts a language of security property violations B
  • All operation sequences accepted by M violate the security property
• PDA P accepts all feasible program traces T
  • Traces are interprocedural combinations of intraprocedural control flow paths
  • Note that traces are a control flow representation
• Problem: decide if any trace violates the security property
  • Ask whether T ∩ B = ∅
  • Represented by L(M) ∩ L(P) = ∅
  • Intersection of PDA and FSA can be computed efficiently
  • Note that T ⊆ L(P), so some infeasible traces are in L(P)
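The soundness argument behind these bullets can be written out in a few lines. This is a restatement of the slide's definitions, with B taken to be exactly L(M):

```latex
\begin{align*}
B &= L(M), \qquad T \subseteq L(P) \\
T \cap B &\subseteq L(P) \cap L(M) \\
L(M) \cap L(P) = \emptyset &\implies T \cap B = \emptyset
  && \text{(no false negatives for the model)} \\
L(M) \cap L(P) \neq \emptyset &\;\nRightarrow\; T \cap B \neq \emptyset
  && \text{(a reported trace may be infeasible)}
\end{align*}
```

The last line is where false positives come from: a trace in L(P) but not in T follows a control flow path that no real execution takes.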
Example 2
get_free_buffer(struct stripe_head *sh, …) {
    struct buffer_head *bh;
    unsigned long flags;
    save_flags(flags);
    cli();
    if ((bh = sh->buffer_pool) == NULL)
        return NULL;
    sh->buffer_pool = bh->b_next;
    bh->b_size = b_size;
    restore_flags(flags);
    return bh;
}
[Figure: the interrupt FSA shown alongside the code, with transitions labeled enable and disable, error states double_enable and double_disable, and an "Exit w/ disabled" error at End Op.]
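[Figure: two property automata, titled "Unassigned Use" and "Example 1". Recoverable edge labels for the first: assign, zero, free, check, use, use unmediated. Recoverable edge labels for the second: assign, check, use, unmediated.]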
MOPS Distinguishing Features
• Modularity
  • Can create a hierarchy of FSAs
  • Haven’t seen this used…
• Pattern variables
  • “bound to any expression that satisfies context constraints”
  • Difference from xgcc patterns?
• Modeling
  • PDA and FSA are combined into a composite PDA that accepts L(M) ∩ L(P)
  • Can determine all the FSA states that an instruction can be executed in
Modeling OS for MOPS
• Find all kernel variables that affect security
  • Done manually
• Determine the states in the FSA for each
  • Done manually
• Determine transitions between states
  • Transition in FSA
  • Automated state space explorer
  • Execute all paths and create transitions automatically
Setuid
• Variable euid determines privilege
• euid can be modified by several functions:
  • setuid, seteuid, setreuid, setresuid
• Value of euid depends on value of other variables on input to these system calls
  • ruid, suid
  • cap_effective, cap_permitted
  • These are found manually
• Transitions indicate system calls that lead to changes in variables