Source Code Analysis for Security

Source Analysis for Security Trent Jaeger March 29, 2004

Example 1

Example 2
    get_free_buffer(struct stripe_head *sh, …)
    {
        struct buffer_head *bh;
        unsigned long flags;
        save_flags(flags);
        cli();
        if ((bh = sh->buffer_pool) == NULL)
            return NULL;
        sh->buffer_pool = bh->b_next;
        bh->b_size = b_size;
        restore_flags(flags);
        return bh;
    }

Example 3

Example 3 (con’t)

Example 4
    int notify_change(struct dentry *dentry, struct iattr *attr)
    {
        struct inode *inode = dentry->d_inode;
        …
        if (inode->i_op && inode->i_op->setattr) {
            error = security_inode_setattr(dentry, attr);
            if (!error)
                error = inode->i_op->setattr(dentry, attr);
        …
    }

Find Software Bugs • Education • Difficult to know how code will be used • Testing • Misses many code paths, time consuming • Manual Inspection • Tedious and error prone • Compiler checking • Context independent • 4GL • Incomplete and don’t know how source code will be used • Assurance • Extremely costly and complex – what do we do about existing code?

Limited Source Code Analysis • Source code is the level at which security is defined • Problems manifest as errors in code (although design can be a problem too) • Compilers can check for various properties • Rules on program source • Programmers can express some properties • Semantic properties • Must specify correctly (no/few false negatives) • Must not be too conservative (few false positives) • Should be robust to code changes

Source Code Analysis • Convert source code into a model • Convert property into a computation on model • Report positive cases (violate/meet property) • Determine if cases are true or false • Resolve true cases • Refine model or property and repeat

Some Properties • Never/always do X • Never use floating point in kernel • Do X rather than Y • Always do X before/after Y • LSM mediation (Example 1) • Never do X before/after Y • In situation X, do (not) Y • Re-enable disabled interrupts (Example 2) • In situation X, do Y rather than Z

Program Models • Abstract Syntax Tree • Control flow • Data flow • Def-use chain • Aliases • Type constraints • …

Abstract Syntax Tree [Figure: syntax trees for the functions sys_fcntl, do_fcntl, and fcntl_setlk. Nodes include the declaration of struct file *filp, the assignment filp = fget(fd), the declaration of err, the security_op compound statement, the call to do_fcntl, the call fcntl_setlk(fd), and the use of filp.]

Control Flow (Interprocedural) [Figure: the same syntax trees for sys_fcntl, do_fcntl, and fcntl_setlk, annotated with interprocedural control flow.]

Control Flow (Intraprocedural) [Figure: the same syntax trees, annotated with intraprocedural control flow.]

Data Flow [Figure: the same syntax trees, annotated with data flow.]

Def-Use [Figure: the same syntax trees, annotated with def-use chains.]

Property Models • Finite State Automata • Start Operation • Disable Interrupts • Enable Interrupts • End Operation • Type Constraints • Unchecked type • Checked type • Expect checked type [Figure: finite state automaton for interrupt handling. Transition labels: enable, disable. Outcome labels: End Op, Exit w/ disabled, double_disable, double_enable.]
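As a minimal sketch of how a finite state automaton like the interrupt one on this slide could be encoded, the C fragment below steps an automaton over a sequence of events taken from one path. The state and event names, and the choice to treat error and exit states as absorbing, are assumptions made for illustration; the slides do not define an encoding.

```c
#include <stddef.h>

/* States of the interrupt property automaton (names are illustrative). */
enum intr_state {
    IS_ENABLED, IS_DISABLED,
    ERR_DOUBLE_ENABLE, ERR_DOUBLE_DISABLE, ERR_EXIT_DISABLED,
    OK_EXIT
};

/* Events observed along one path through the code. */
enum intr_event { EV_ENABLE, EV_DISABLE, EV_END_OF_PATH };

/* One transition. Error and exit states absorb all further events. */
enum intr_state intr_step(enum intr_state s, enum intr_event e)
{
    switch (s) {
    case IS_ENABLED:
        if (e == EV_DISABLE) return IS_DISABLED;
        if (e == EV_ENABLE)  return ERR_DOUBLE_ENABLE;
        return OK_EXIT;
    case IS_DISABLED:
        if (e == EV_ENABLE)  return IS_ENABLED;
        if (e == EV_DISABLE) return ERR_DOUBLE_DISABLE;
        return ERR_EXIT_DISABLED;
    default:
        return s;
    }
}

/* Run the automaton over a whole path, starting with interrupts enabled. */
enum intr_state intr_check(const enum intr_event *ev, size_t n)
{
    enum intr_state s = IS_ENABLED;
    for (size_t i = 0; i < n; i++)
        s = intr_step(s, ev[i]);
    return s;
}
```

Feeding it the early-return path of Example 2 (disable, then end of path) ends in the "exit with interrupts disabled" error state, while the normal path (disable, enable, end of path) ends in the accepting exit state.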

CQUAL Static Analysis • CQUAL is a type-based static analysis tool from UC Berkeley • Enables qualification of types, analogous to const • Enables verification that the type passed to a function is the type expected • Used previously for verification of format string vulnerabilities • Wagner’s group at UC Berkeley in USENIX Security 2001

CQUAL Principles • Interprocedural control flow • do_fcntl calls fcntl_getlk • Def-Use data flow • Assignments tracked back to def where type is declared • Type inference • Variables have type restrictions • Cannot assign a variable to another of an incompatible type • Cannot send a variable as a parameter to a function unless its type is compatible

CQUAL Approach

Identify Declarations

Identify Controlled Params

Create “Checked” Variable

Verify Local Controlled Ops

Find Assignments to ‘Checked’

Verify Interprocedural Paths

Find Example 1 Error

Sensitivity: Flow and Context • Flow-sensitivity • The order of statements in a function matters • CQUAL is not flow-sensitive • Must create new ‘checked’ variable • Must use GCC to verify intraprocedural paths • Must use GCC to find reassignments after ‘checked’ • Context-sensitivity • A function is treated differently depending on calling site • CQUAL is not context-sensitive • If two functions call the same descendant, both callers must satisfy the same requirements in CQUAL
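The consequence of flow-insensitivity can be illustrated with a toy qualifier propagation. This is only a sketch under stated assumptions, not CQUAL's algorithm: variables are numbered, each carries a hypothetical `CHECKED` or `UNCHECKED` qualifier, and an assignment makes its target unchecked wherever it appears in the function. Because statement order is ignored, a variable that is ever unchecked stays unchecked, which is why a separate ‘checked’ variable is needed.

```c
#include <stdbool.h>
#include <stddef.h>

enum qual { CHECKED, UNCHECKED };

#define MAX_VARS 8

/* dst = src; the position of the statement in the function is ignored. */
struct assign { int dst, src; };

/* Count sinks (controlled operations) whose argument may be UNCHECKED.
   All variable indices are assumed to be smaller than nvars. */
int count_violations(const enum qual *declared, size_t nvars,
                     const struct assign *as, size_t nas,
                     const int *sinks, size_t nsinks)
{
    enum qual q[MAX_VARS];
    if (nvars > MAX_VARS)
        return -1;
    for (size_t i = 0; i < nvars; i++)
        q[i] = declared[i];

    /* Flow-insensitive fixed point over all assignments. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < nas; i++) {
            if (q[as[i].src] == UNCHECKED && q[as[i].dst] == CHECKED) {
                q[as[i].dst] = UNCHECKED;
                changed = true;
            }
        }
    }

    int violations = 0;
    for (size_t i = 0; i < nsinks; i++)
        if (q[sinks[i]] == UNCHECKED)
            violations++;
    return violations;
}
```

With variable 0 standing for `filp` (unchecked, as returned by `fget`) and variable 1 for a separate checked copy, a controlled operation on variable 0 is always flagged, one on variable 1 is not, and a direct assignment from 0 to 1 that bypasses the check is flagged again.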

CQUAL Postscript • Flow-sensitive CQUAL • Initial performance was not good • Field level data flow • Extensions at UC Berkeley • We switched to new tool (JaBA) • Interprocedural control flow • Intraprocedural control flow (flow-sensitive) • Context-sensitive • Variable and field-level data flow • Replicated analyses of Example 1 and 3 while preventing false positives of Example 4

Meta-compilation • Compilers • Have program source • Can implement straightforward rules for source checking • Lack domain semantics of programs • Programmers • Have domain semantics of programs • Need a means to express these semantics such that they can be checked

Meta-compilation • Model • GCC abstract syntax tree • Compute interprocedural control flow graph • Compute intraprocedural control flow graph • Properties • Finite state automata • Generate extensions from specification • Computation • FSA state transitions are represented by patterns • Find syntactic patterns in code • Build intraprocedural paths with relevant state changes • For each path, compute resultant state transitions

Properties: Meta Language (metal)
    { #include "linux-includes.h" }
    sm check_interrupts {
        // Variables used in patterns
        decl { unsigned } flags;
        // Patterns to specify enable/disable fns
        pat enable = { sti(); }
                   | { restore_flags(flags); } ;
        pat disable = { cli(); } ;
        // States; the first state listed is the implicit initial state
        is_enabled: disable ==> is_disabled
                  | enable ==> { err("double enable"); } ;
        is_disabled: enable ==> is_enabled
                   | disable ==> { err("double disable"); }
                   | $end_of_path$ ==> { err("exiting w/ intr disabled"); } ;
    }
[Figure: the corresponding automaton. Transition labels: enable, disable. Outcome labels: End Op, Exit w/ disabled, double_disable, double_enable.]

Example 2 Processing
    get_free_buffer(struct stripe_head *sh, …)
    {
        struct buffer_head *bh;
        unsigned long flags;
        save_flags(flags);
        cli();                          // disable
        if ((bh = sh->buffer_pool) == NULL)
            return NULL;                // end of path ==> err
        sh->buffer_pool = bh->b_next;
        bh->b_size = b_size;
        restore_flags(flags);           // enable
        return bh;                      // end of path
    }

Meta-Compilation System • Compile Metal State Machine (SM) with mcc • Dynamically link SM into xg++ • Selected at compile time with a command-line flag • At a branch, the SM state is "pushed down" both paths • Paths are built and checked against SM • All paths vs. one pass (flow-sensitive vs. flow-insensitive) • Prune paths that reach a join in the same state • Fixed point: loop until all possible paths are reached
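The path building and pruning described here can be sketched as a depth-first walk over a control flow graph that carries the automaton state along each path and stops when a block is reached again in a state already seen. This is an illustrative sketch, not xgcc's implementation; the `struct block` layout, the two-state property, and the `visits` counter are assumptions introduced for the example.

```c
#include <stdbool.h>
#include <stddef.h>

enum { ENABLED, DISABLED, NSTATES };      /* property states */
enum event { NONE, ENABLE, DISABLE };     /* event performed by a block */

#define MAX_BLOCKS 16
#define NO_SUCC (-1)

struct block {
    enum event ev;   /* interrupt event in this basic block */
    int succ[2];     /* successor block indices, NO_SUCC if absent */
};

struct result {
    int double_enable, double_disable, exit_disabled;
    int visits;      /* number of (block, state) pairs explored */
};

static void explore(const struct block *cfg, int b, int state,
                    bool seen[][NSTATES], struct result *r)
{
    if (seen[b][state])   /* same block, same state: prune this path */
        return;
    seen[b][state] = true;
    r->visits++;

    if (cfg[b].ev == ENABLE) {
        if (state == ENABLED) { r->double_enable++; return; }
        state = ENABLED;
    } else if (cfg[b].ev == DISABLE) {
        if (state == DISABLED) { r->double_disable++; return; }
        state = DISABLED;
    }

    bool is_exit = true;
    for (int i = 0; i < 2; i++) {
        if (cfg[b].succ[i] != NO_SUCC) {
            is_exit = false;
            explore(cfg, cfg[b].succ[i], state, seen, r);  /* push state down */
        }
    }
    if (is_exit && state == DISABLED)
        r->exit_disabled++;
}

/* Check a CFG whose entry is block 0, starting with interrupts enabled. */
struct result check_cfg(const struct block *cfg, size_t nblocks)
{
    bool seen[MAX_BLOCKS][NSTATES] = { { false } };
    struct result r = { 0, 0, 0, 0 };
    if (nblocks > 0 && nblocks <= MAX_BLOCKS)
        explore(cfg, 0, ENABLED, seen, &r);
    return r;
}
```

Because each (block, state) pair is explored at most once, the walk also terminates on loops, which corresponds to the fixed point mentioned in the last bullet. On a four-block graph modeling Example 2 it reports one exit with interrupts disabled.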

Prune Paths • Choice of paths does not matter, so only one needs to be kept [Figure: paths labeled disable and enable that reach a join in the same state.]

Assertion Checking – Side Effects
    { #include "linux-includes.h" }
    sm Assert flow-insensitive {
        // Match expressions
        decl { any } expr, x, y, z;
        decl { any_call } any_fcall;
        decl { any_args } args;
        // States: find asserts and detect side effects
        start: { assert(expr); } ==>
                   { mgk_expr_recurse(expr, in_assert); } ;
        in_assert: { any_fcall(args) } ==> { err("fn call"); }
                 | { x = y } ==> { err("assignment"); }
                 | { z++ } ==> { err("post-increment"); }
                 | { z-- } ==> { err("post-decrement"); } ;
    }
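Why side effects inside an assertion are worth flagging can be shown with a small C sketch. The two macros below are hypothetical stand-ins for `assert()` as it behaves in a build with assertions enabled and in a build where they are compiled out; they are not taken from the slides.

```c
#include <stdio.h>

/* Stand-in for assert() in a build with assertions enabled. */
#define CHECK_DEBUG(expr) \
    ((expr) ? (void)0 : (void)fprintf(stderr, "check failed: %s\n", #expr))

/* Stand-in for assert() in a build where assertions are compiled out. */
#define CHECK_RELEASE(expr) ((void)0)

/* The post-increment inside the check is a side effect. */
int count_debug(int n)
{
    int seen = 0;
    for (int i = 0; i < n; i++)
        CHECK_DEBUG(seen++ >= 0);
    return seen;
}

/* Same source text, but the expression is never evaluated. */
int count_release(int n)
{
    int seen = 0;
    for (int i = 0; i < n; i++)
        CHECK_RELEASE(seen++ >= 0);
    return seen;
}
```

`count_debug(3)` returns 3 while `count_release(3)` returns 0, so the program's behavior depends on whether assertions are enabled. The `z++` pattern in the state machine above would flag the increment.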

xgcc Extension (PLDI 2002) • Match patterns to statements • Identify state transitions • Compute intraprocedural paths • Prune those that cannot matter (no state changes) • Combine intraprocedural paths into complete paths • Analysis instance based on a transition from a start state • Paths are generated for each instance • Assignments result in creating a new instance that is a copy

Checking memory management [Figure: state machine for a tracked pointer. States: unknown, null, not-null, freed, stop. Transition labels: allocation; conditional check on ptr implying not null; conditional check on ptr implying null; free; dereference; overwrite; end path.]

Checking memory management • Intraprocedural control flow • Distinguishes between paths with null and non-null pointers • Interprocedural control flow • "Global analysis" done in the PLDI paper by combining intraprocedural paths • Data flow • None; pure syntactic comparison • Assignment does replicate the state machine for the assigned variable • Finds bugs, but does not guarantee their absence • Does not track assignments to structure fields • Does not track aliases • False positives • Syntactic path-sensitivity keeps them moderate
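The effect of copying state on assignment without tracking aliases can be sketched in C. The per-variable states follow the figure's state names, but the exact transitions, the operation encoding, and the decision to count errors rather than report them are assumptions for illustration only.

```c
#include <stddef.h>

enum pstate { P_UNTRACKED, P_UNKNOWN, P_NULL, P_NOT_NULL, P_FREED };
enum opkind { OP_ALLOC, OP_CHECK_NOT_NULL, OP_CHECK_NULL,
              OP_FREE, OP_DEREF, OP_ASSIGN };

#define NVARS 4

/* One statement on a path; src is only used by OP_ASSIGN (var = src). */
struct op { enum opkind kind; int var; int src; };

/* Walk one path and count the errors the state machines report. */
int count_errors(const struct op *ops, size_t n)
{
    enum pstate st[NVARS] = { P_UNTRACKED };
    int errors = 0;

    for (size_t i = 0; i < n; i++) {
        int v = ops[i].var;
        if (v < 0 || v >= NVARS)
            continue;
        switch (ops[i].kind) {
        case OP_ALLOC:
            st[v] = P_UNKNOWN;
            break;
        case OP_CHECK_NOT_NULL:
            if (st[v] == P_UNKNOWN) st[v] = P_NOT_NULL;
            break;
        case OP_CHECK_NULL:
            if (st[v] == P_UNKNOWN) st[v] = P_NULL;
            break;
        case OP_FREE:
            if (st[v] == P_UNKNOWN || st[v] == P_FREED)
                errors++;            /* unchecked free or double free */
            if (st[v] == P_UNKNOWN || st[v] == P_NOT_NULL)
                st[v] = P_FREED;
            break;
        case OP_DEREF:
            if (st[v] == P_UNKNOWN || st[v] == P_NULL || st[v] == P_FREED)
                errors++;            /* unchecked, null, or freed pointer */
            break;
        case OP_ASSIGN:
            /* The new name gets a copy of the state; the names are not linked. */
            if (ops[i].src >= 0 && ops[i].src < NVARS)
                st[v] = st[ops[i].src];
            break;
        }
    }
    return errors;
}
```

A use after free through the same name is reported, but after `q = p; free(p);` a dereference of `q` is not, because only `p`'s copy of the state moves to freed. This is one way the approach finds bugs without guaranteeing their absence.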

Other Example Analyses • Example 3 – (check fcntl and set_fowner) • If we know the required authorizations for each operation, we can define the states of these ops • Don’t know this (tedious to specify) • We use a consistency analysis (ACM TISSEC, May 2004) • Example 4 – (distinguish between dentry->inode and inode) • Specify that { inode = dentry->inode } links inode state with dentry state • Note that this is not computed from first principles, so manual effort is required to ensure it is correct

xgcc Postscript • Lots of papers on finding bugs using these techniques • Lots of simple errors in code • Other aspects • Automating annotation • Statistical analysis • Coverity, Inc.

GCC Architecture • Compilers for C, C++, Java • Consists of a sequence of compilation steps, all of which can be hooked (3.0 and greater) • Eventually produces a single representation for all languages (GIMPLE) • Then converts to Register Transfer Language (RTL), at which point all typing is lost

MOPS • Aim to provide a ‘sound’ analysis architecture • That is, no false negatives for their model • Program model • Pushdown automata of program • Property model • Finite state automata of security property • Temporal properties • Like xgcc, there is no real data flow analysis • Unlike xgcc, language for properties is not defined

Formal Basis • FSA M accepts a language of security property violations B • All operation sequences accepted by M violate the security property • PDA P accepts all feasible program traces T • Traces are interprocedural combinations of intraprocedural control flow paths • Note that traces are a control flow representation • Problem: Decide if any trace violates the security property • Ask whether T ∩ B = ∅ • Represented by L(M) ∩ L(P) = ∅ • Intersection of a PDA and an FSA can be computed efficiently • Note that T ⊆ L(P), so some infeasible traces are in L(P)
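The soundness argument behind these bullets can be written out in a few steps, using only the symbols defined on the slide and taking B = L(M) from the first bullet:

```latex
\begin{align*}
B &= L(M), \qquad T \subseteq L(P) \\
T \cap B &\subseteq L(P) \cap L(M) \\
L(M) \cap L(P) = \emptyset &\implies T \cap B = \emptyset
\end{align*}
```

The converse does not hold: a trace in L(P) that is not in T is infeasible, yet it may still be accepted by M. An empty intersection therefore shows that no feasible trace violates the property within this model, while a non-empty intersection may be a false positive.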

Example 2
    get_free_buffer(struct stripe_head *sh, …)
    {
        struct buffer_head *bh;
        unsigned long flags;
        save_flags(flags);
        cli();
        if ((bh = sh->buffer_pool) == NULL)
            return NULL;
        sh->buffer_pool = bh->b_next;
        bh->b_size = b_size;
        restore_flags(flags);
        return bh;
    }
[Figure: the interrupt automaton shown alongside the code. Transition labels: enable, disable. Outcome labels: End Op, Exit w/ disabled, double_disable, double_enable.]

[Figure: two property automata, titled "Unassigned Use" and "Example 1". Labels: assign; zero, free; check; use; unmediated.]

MOPS Distinguishing Features • Modularity • Can create a hierarchy of FSAs • Haven’t seen this used… • Pattern variables • “bound to any expression that satisfies context constraints” • Difference from xgcc patterns? • Modeling • PDA and FSA are combined into a composite PDA that accepts L(M) ∩ L(P) • Can determine all the FSA states that an instruction can be executed in

Modeling OS for MOPS • Find all kernel variables that affect security • Done manually • Determine the states in the FSA for each • Done manually • Determine transitions between states • Transition in FSA • Automated state space explorer • Execute all paths and create transitions automatically

Setuid • Variable euid determines privilege • euid can be modified by several functions: • setuid, seteuid, setreuid, setresuid • Value of euid depends on the values of other variables on input to these system calls • ruid, suid • cap_effective, cap_permitted • These variables are found manually • Transitions indicate system calls that lead to changes in variables
