320 likes | 426 Vues
Static Code Checking: Security and Concurrency. Ben Watson The George Washington University CS 297 Security and Programming Languages June 9, 2005. The Video. The Problem. How to discover errors in code without running it Code can run for weeks or months without displaying the error
E N D
Static Code Checking: Security and Concurrency Ben Watson The George Washington University CS 297 Security and Programming Languages June 9, 2005
The Problem • How to discover errors in code without running it • Code can run for weeks or months without displaying the error • Many errors are caused by pieces of code that are very difficult to test • Device drivers – manufacturers aren’t always good at this, and one OS company can’t possibly test all the tens of thousands of devices out there • The Windows 98 crash was caused by a bad scanner driver • Concurrent code—debugging complicated concurrency problems is a nightmare x n.
The Scope • Lines of Code (estimated)
The Real Problem • We’re only human • No person, no group of people can possibly manually debug anything as complicated as an OS and its related pieces • Good tools are not enough • Can’t rely on thorough annotations of entire code base • Can’t rely on manual directions: the more automated the better
The Solutions • MC Security checking system • RacerX: Race condition and Deadlock detection • General rule inference from source code
MECA: Statically Checking Security Properties • Checks low-level properties (pointer safety, etc.) • Relies on annotations that propagate through the analysis • Goals • Expressiveness • Low manual overhead—programmers only have to type in a relatively few number of annotations • Low false-positives
How MC Works • Uses a modified GCC compiler • Parses source along with abstract syntax tree generated by compiler • AST used to build a control-flow graph • Annotation propagator uses CFG to propagate annotations through entire graph • Checkers are run on the completed graph • Results are ranked and filtered
An example • Rule: OS kernel may not access a user-pointer (there are “paranoid” functions to access the data pointed to by a user-pointer) • Referred to as a “tainted” pointers • Annotate: • Tainted variables, parameters, and fields • Functions that produce tainted values
Source annotations struct myStruct { /*@ tainted */ int*p; }; /*@ tainted */ int *foo(/*@ tainted */int *p); void memcpy(/*@ !tainted */void *dst, /*@ !tainted */void *src, unsigned nbytes);
Source annotations //Binding: /*@ set_length($ret, sz) */ void* malloc(unsigned sz); //Global: all sys_* calls //are tainted /*@ global $param ${!strncmp(current_fn,”sys_”,4)} ==> tainted */
Propagation void bar(/*@ tainted */void *p); struct S{char* buf;} //Before analysis void foo(char** p, struct S* s) { char *r; struct S* ss; r=*p; bar(r); //taints r and *p ss =s; bar(ss->buf); //taints ss and s } //At the end of analysis: Foo(/*@ tainted (*p) */char **p, /*@tainted(s->buf) */struct S* s);s
MECA results • On average, one manual annotation led to 682 checks • Linux 2.5.63 Bugs:
RacerX • Static detection of race conditions and deadlocks • Designed to find errors in large, multi-threaded systems • Sorts errors by severity (the hard part) • They checked Linux, FreeBSD, and a mystery OS that has only 500,000 lines of code
Deadlock • Deadlock • Thread 1 has locked resource A • Thread 2 has locked resource B • Thread 1 needs resource B to complete • Thread 2 needs resource A to complete • Neither can proceed—these threads are deadlocked
Race condition • Multiple threads access the same memory • If memory is unprotected: • Two threads can simultaneously write to same memory (bad) • One thread can read, another can write simultaneously (bad) • Two threads can simultaneously read from same memory (probably ok) • It’s a race because final value is non-deterministically chosen by who gets there first.
Avoiding the Problem • If data is never accessed by more than one thread, you don’t have to worry about concurrency • If program logicensures that only one thread accesses data, you don’t need to worry about locking the data • If you’re writing a shared component, you almost always have to worry about concurrency
Algorithm • “Lockset” algorithm detects both types of problems • Lockset - A pair of • Lock()/Unlock() • InterruptDisable()/InterruptEnable() • Etc.
Algorithm • Top-down analysis of control-flow graph • Add/remove locks as needed • Check for race/deadlock on each statement • Cache results to ease exponential graph size
Deadlock Check • Basically, finds if there are cycles in the lockset dependencies • If lock a is obtained, then lock b, we have: • a b • Following this line of reasoning, we can discover cases that look like this: • a b c a
Deadlock Check • Deciding how important the cycle is, is non-trivial. • Basically, rank higher according to: • Global locks vs. local locks • Small depth difference vs. big depth difference • Fewer threads vs. more threads
Race Checking • This is even harder than deadlock detection • Must answer: • Is lockset valid (if not, you will have LOTS of false positives) • Can the unprotected memory be accessed more than one thread? • Does the access need to be protected? • Two reads do not a wrong make • Must annotate API functions that require locks
Race Checking • Deciding if code is multithreaded: • Inferred from “programmer belief” – if a piece of code contains concurrency-related statements, the code is probably multi-threaded • Annotations—designate API functions as requiring locks
Race Checking • Does memory need to be protected? • If it’s never written to, no. • If it’s only written on initialization, no. • On a certain code path, if there are a high-number of variables that are potentially written to concurrently, probably. • Anything that can’t be written atomically, yes. (although, this is pretty much anything, especially if you have more than 1 CPU) • If a variable is statistically likely to be protected by locking code (“Programmer Belief”)
Pop Quiz – Question 1 • If you have read the 3rd paper, you may not answer this question. • Find the bug: if (card==NULL) { printk(KERN_ERR “capidrv-%d: … %d!\n”, card->contrnr, id); }
Pop Quiz – Answer 1 if (card==NULL) { printk(KERN_ERR “capidrv-%d: … %d!\n”, card->contrnr, id); }
Pop Quiz – Question 2 • If you have read the 3rd paper, you may not answer this question. • Find the bug: struct mxser_struct *info = tty->driver_data; unsigned long flags; if (!tty || !info->xmit_buf) return 0;
Pop Quiz – Answer 2 struct mxser_struct *info = tty->driver_data; unsigned long flags; if (!tty || !info->xmit_buf) return 0;
General Methodology • Take advantage of programmer beliefs • Statistics are our friend • If something is usually done a certain way, then instances that violate that should be examined • Check internal consistency • Discover rules that are built-in to the code • Minimal to no annotation
Conclusion • The methods tonight provide some of the best ways to find errors: • Millions of lines of code can be checked with at most hundreds of lines of annotations • The bugs these methods find are fairly specific in nature (revolve around well-structured code constructs)
References • Junfeng Yang, Ted Kremenek, Yichen Xie, and Dawson Engler. MECA: an Extensible, Expressive System and Language for Statically Checking Security Properties. ACM CCS, 2003. • Dawson Engler and Ken Ashcraft. RacerX: Effective, Static Detection of Race Conditions and Deadlocks. SOSP 2003. • Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf. Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems Code. OSDI 2000. • Source Lines of Code, http://www.answers.com/topic/source-lines-of-code • Concurrency – Part 2: Avoiding the Problem, http://blogs.msdn.com/larryosterman/archive/2005/02/15/373460.aspx