Post-Attack Analysis of Unknown Vulnerabilities

Peng Ning
With Emre C. Sezer, Chongkyung Kil, and Jun Xu

Motivation
• Vulnerability analysis
  • Essential for
    • Patching
    • Vulnerability-based signature generation
  • Painstakingly slow
    • Depends on human effort
• Existing approaches
  • Static analysis (e.g., [Chen et al. 04], [Feng et al. 04], [Larochelle & Evans 01])
    • False positives
  • Dynamic analysis (e.g., Minos [Crandall et al. 04], TaintCheck [Newsome & Song 05], DIRA [Smirnov & Chiueh 05])
    • Used for detection; inadequate vulnerability information
  • Symbolic execution (e.g., EXE [Cadar et al. 06], DACODA [Crandall et al. 05])
    • Scalability issues
  • Recovery (e.g., STEM [Sidiroglou et al. 05], SEAD [Locasto et al. 07])
    • Change of application semantics
2007 GMU-CSA Workshop

MemSherlock
• MemSherlock is an automated debugger
  • Automated analysis of unknown memory corruption vulnerabilities
  • Appeared in ACM CCS ’07
• MemSherlock provides
  • Statement that causes the memory corruption
  • Dynamic program slice leading to the corruption
  • Program variables involved in the vulnerability
  • All presented at programming language level
• Implications
  • Generating vulnerability conditions
  • Improves signature or patch generation speed

General Framework: Web Application Example
[Figure: framework diagram. Components shown: Traffic, Light-weight IDS, Trigger, Logger, Replayer, Program, Instrumented Program, MemSherlock.]

MemSherlock Overview
• Goal is to provide vulnerability information
  • Intuitive, easy to understand for the programmer
  • Not only the corruption point
    • Slice of program involved in the vulnerability
    • Effects of user inputs
    • Program variables involved
    • Variable relationships (e.g., pointer aliasing)
    • Type of vulnerability (e.g., stack buffer overflow)
• MemSherlock performs two important tasks
  • Finding the corruption point
  • Tracking program state

MemSherlock: Finding Corruption Point
• Observation: A memory object is modified by a small set of statements (inspired by AccMon)
• For a memory object m, the write set of m, WS(m), is the set of statements that legitimately modify m
• Security condition: Memory object m should only be updated by statements in WS(m)

MemSherlock: Assembly Line
• Pre-Debugging Phase
  • Instruments the program for the debugging phase
  • Extracts program information via static analysis
  • Needs to be performed once
• Debugging Phase
  • Tracks program state
  • Monitors memory writes and checks for violations of the security condition
  • Tracks tainted data and its propagation

MemSherlock Architecture

Pre-debugging: Generating Write Sets
• MemSherlock analyzes source code to determine write sets
• For a program variable v, WS(v) includes
  • Assignment statements (i.e., v=expr)
  • Library function calls where v is passed as an argument that can be modified (e.g., memcpy(&v, src, n))
• MemSherlock treats DLLs as black boxes
  • Assumption: A DLL is internally secure, but externally insecure
    • e.g., no stack overflows in the library functions
    • Sound for common, well-tested libraries (e.g., the C library)
  • Requires library specifications
    • For each DLL, a list of functions and the arguments they might modify
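The library specification described above can be pictured as a table of functions and the argument positions they may write through. The following is a minimal sketch under stated assumptions: the `LibSpec` type, the bitmask encoding, and `may_modify` are hypothetical names for illustration, not MemSherlock's actual specification format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical specification entry: bit i of modifies_mask is set when the
 * function may write through its i-th argument (counting from 0). */
typedef struct {
    const char *name;
    unsigned modifies_mask;
} LibSpec;

static const LibSpec specs[] = {
    { "memcpy", 1u << 0 },   /* memcpy(dest, src, n) writes through dest */
    { "strcpy", 1u << 0 },   /* strcpy(dest, src) writes through dest */
    { "strlen", 0u },        /* strlen(s) modifies none of its arguments */
};

/* Returns 1 if the call may modify the given argument, else 0.
 * Assumption of this sketch: functions missing from the table are treated
 * as modifying nothing. */
int may_modify(const char *fn, unsigned arg_index) {
    assert(fn != NULL && arg_index < 32);
    for (size_t i = 0; i < sizeof specs / sizeof specs[0]; i++)
        if (strcmp(specs[i].name, fn) == 0)
            return (int)((specs[i].modifies_mask >> arg_index) & 1u);
    return 0;
}
```

With such a table, a call `memcpy(&v, src, n)` at some source line would be added to WS(v) because `may_modify("memcpy", 0)` holds, while the same call would not be added to the write set of `src`.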

Dealing with Pointers
• For a pointer variable p, two write sets are kept
  • WS(p): Statements that modify p
  • WS(ref(p)): Statements that modify the referent (e.g., *p=5)
• ref(p) is resolved at run time (during debugging)
• Perform the same analysis for pointer-type function arguments at function calls
  • Removes the requirement for inter-procedural static analysis

Chained Dereferences
• The technique above can only handle simple dereferences
• Source code rewriting is used to convert all chained dereferences to simple dereferences
• Any other dereference that is not simple is converted in the same manner
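The rewriting step can be illustrated with a small before/after pair. This is only a sketch of the idea: the struct types and function names are hypothetical, and the code is hand-written, not actual output of the rewriting tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, used only to illustrate the rewriting. */
struct inner { int value; };
struct outer { struct inner *in; };

/* Original form: one statement with a chained dereference, o->in->value. */
void set_chained(struct outer *o, int v) {
    assert(o != NULL && o->in != NULL);
    o->in->value = v;
}

/* Rewritten form: a temporary holds the intermediate pointer, so each
 * statement dereferences a single pointer variable. */
void set_simplified(struct outer *o, int v) {
    assert(o != NULL);
    struct inner *tmp = o->in;   /* tmp is now an ordinary pointer variable */
    assert(tmp != NULL);
    tmp->value = v;              /* this statement would belong to WS(ref(tmp)) */
}
```

Both functions have the same effect; the second form gives the analysis a named pointer, `tmp`, for which WS(tmp) and WS(ref(tmp)) can be kept separately.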

Output of Pre-debugging Phase
• Simplified program
  • Simplified pointer dereferences
  • Compiled with debugging options
• Input file for the debugger
  • Program variables and their write sets
  • Addresses of global symbols
  • Frame pointer offsets of local variables
  • Other flags that help the debugger

MemSherlock Architecture: Debugging

Debugging: Dynamic Monitoring
• Runtime monitoring
  • State Maintenance
    • Incorporates taint analysis from TaintCheck
    • Produces a dynamic slice of the program leading to the vulnerability
  • Write Checking
    • Monitors and validates memory writes
• Write sets are file name and line number pairs <f,l>
  • Instruction pointer IP is translated into <f,l>
• Write sets are associated with program variables
  • A destination address is translated into a program variable

Keeping Program State
[Figure: the virtual address space in two program states. Program State 1 stack, from the stack base: main, fnc A, fnc B. Program State 2 stack, from the stack base: main, fnc A, fnc C. In both states a memory write targets address 0xABABABAB.]
• A given memory region may correspond to different program variables depending on program state
• Dynamic monitor keeps track of memory mapping

Debugging: Key Data Structures
• Keeps two lists of memory regions
  • ActiveMemoryRegions
    • Memory corresponding to program variables or their referent memory regions
  • NonWritableRegions
    • Saved registers, return addresses, metadata encapsulating dynamically allocated memory regions

Debugging: State Maintenance
• Function calls/returns (memory)
  • Local variable addresses are calculated and added to ActiveMemoryRegions
  • Locations of the return address and saved registers are added to the NonWritableRegions list
• Heap memory (memory)
  • malloc/free calls are intercepted
  • Allocated memory is added to ActiveMemoryRegions
  • The metadata encapsulating the buffer is added to NonWritableRegions
• Pointer value updates (write sets)
  • Searches ActiveMemoryRegions to find the referent and updates its WS
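The heap part of state maintenance can be sketched with two small region lists. Everything here is an assumption made for illustration: the `Span`/`SpanList` types, the fixed capacity, the `on_malloc`/`on_free` hooks, and in particular the idea that allocator metadata sits in a fixed-size header immediately before the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SPANS 64   /* arbitrary capacity for this sketch */

typedef struct { uintptr_t start; size_t size; } Span;
typedef struct { Span items[MAX_SPANS]; size_t count; } SpanList;

/* Append a region; returns 0 on success, -1 if the list is full. */
int span_add(SpanList *l, uintptr_t start, size_t size) {
    assert(l != NULL);
    if (l->count == MAX_SPANS) return -1;
    l->items[l->count++] = (Span){ start, size };
    return 0;
}

/* Remove the region that begins at `start`; returns 0 if one was found. */
int span_remove(SpanList *l, uintptr_t start) {
    assert(l != NULL);
    for (size_t i = 0; i < l->count; i++) {
        if (l->items[i].start == start) {
            l->items[i] = l->items[--l->count];   /* order is not preserved */
            return 0;
        }
    }
    return -1;
}

/* Nonzero if `addr` falls inside any region of the list. */
int span_contains(const SpanList *l, uintptr_t addr) {
    assert(l != NULL);
    for (size_t i = 0; i < l->count; i++)
        if (addr >= l->items[i].start &&
            addr - l->items[i].start < l->items[i].size)
            return 1;
    return 0;
}

/* Hypothetical hook for an intercepted malloc: the payload becomes an
 * active region, the assumed header before it becomes non-writable. */
void on_malloc(SpanList *active, SpanList *nonwritable,
               uintptr_t payload, size_t size, size_t header) {
    span_add(active, payload, size);
    span_add(nonwritable, payload - header, header);
}

/* Hypothetical hook for an intercepted free: both regions are dropped. */
void on_free(SpanList *active, SpanList *nonwritable,
             uintptr_t payload, size_t header) {
    span_remove(active, payload);
    span_remove(nonwritable, payload - header);
}
```

Function calls and returns would use the same two lists, adding local variables to the active list and the return address and saved registers to the non-writable list on entry, and removing them on return.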

Debugging: Write Checking
• When instruction IP modifies memory m
  • If m is in ActiveMemoryRegions
    • determines the variable v it belongs to
    • converts IP into <f,l>
    • checks if <f,l> is in WS(v)
• If the memory write check fails or m is in NonWritableRegions
  • Marks the operation as a memory corruption
  • Displays the vulnerability information
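The write check above can be sketched as a single function over the two region lists. This is a simplified illustration, not MemSherlock's code: the types, names, and the `WRITE_UNTRACKED` result for addresses found in neither list are assumptions (the slide does not say how such writes are handled), and the translation from IP to <f,l> is taken as already done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical. */
typedef struct { const char *file; int line; } SrcLoc;   /* the <f,l> pair */

typedef struct {
    uintptr_t start;      /* first address of the region */
    size_t size;          /* length in bytes */
    const char *var;      /* owning variable; NULL for non-writable regions */
    const SrcLoc *ws;     /* write set of the owning variable */
    size_t ws_len;
} Region;

typedef enum { WRITE_OK, WRITE_CORRUPTION, WRITE_UNTRACKED } Verdict;

static const Region *find_region(const Region *rs, size_t n, uintptr_t addr) {
    for (size_t i = 0; i < n; i++)
        if (addr >= rs[i].start && addr - rs[i].start < rs[i].size)
            return &rs[i];
    return NULL;
}

static bool in_write_set(const Region *r, SrcLoc loc) {
    for (size_t i = 0; i < r->ws_len; i++)
        if (r->ws[i].line == loc.line && strcmp(r->ws[i].file, loc.file) == 0)
            return true;
    return false;
}

/* Classify a write to `dest` performed by the statement at `loc`. */
Verdict check_write(const Region *active, size_t n_active,
                    const Region *nonwritable, size_t n_nonwritable,
                    uintptr_t dest, SrcLoc loc) {
    assert(loc.file != NULL);
    /* Any write into a non-writable region is a corruption. */
    if (find_region(nonwritable, n_nonwritable, dest) != NULL)
        return WRITE_CORRUPTION;
    /* Otherwise resolve the destination to a variable and consult WS(v). */
    const Region *r = find_region(active, n_active, dest);
    if (r == NULL)
        return WRITE_UNTRACKED;   /* assumption: behavior not given on the slide */
    return in_write_set(r, loc) ? WRITE_OK : WRITE_CORRUPTION;
}
```

A linear scan keeps the sketch short; a real monitor that checks every memory write would likely need a faster lookup structure.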

Generating Vulnerability Information
• The slice of program contributing to the vulnerability
  • Statements that have propagated tainted values
  • Statements that have modified related memory regions
• Dependency between memory objects involved in the vulnerability
  • Points-to analysis shows memory regions and how they were accessed
• Program state
  • Call stack information
  • Write set information

Example Test Case: Null HTTP
http.c:
   91: void ReadPOSTData(int sid) {
       …
  100:   conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
  101:   if (conn[sid].PostData==NULL) { ...
  107:   do {
  108:     rc=recv(conn[sid].socket, pPostData, 1024, 0);
  109:     …
Error Report:
  --20361-- Error type: Heap Buffer Overflow
  --20361-- Dest Addr: 3AB3E360
  --20361-- IP: 0x804E5C7: ReadPOSTData (http.c:108)
  --20361-- Dest address resolved to:
  --20361-- Global variable "heap var"
              @ 3AB3E280 (size: 224)
  --20361--
  --20361-- Memory allocated by 0x804E531:
              ReadPOSTData (http.c:100)
  --20361-- TAINTED destination 3AB3E360
  --20361-- Fully tainted from:
  --20361-- 0x804E5C7: ReadPOSTData (http.c:108)
  --20361--
  --20361-- TAINTED size used during allocation
  --20361-- Tainted from:
  --20361-- 0x804E456: ReadPOSTData (http.c:100)
  --20361-- 0x804FBB5: read_header (http.c:153)
  --20361-- 0x805121B: sgets (server.c:211)
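The numbers in the report are consistent with a negative Content-Length. As a worked illustration, assume the exploit sent `Content-Length: -800`; this value is an inference from `size: 224`, not something stated on the slide. Line 100 would then allocate -800 + 1024 = 224 bytes, and a 1024-byte recv on line 108 into the start of that buffer would first leave it at offset 224, which matches Dest Addr 3AB3E360 = 3AB3E280 + 0xE0. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: the element count passed to calloc on http.c:100
 * for a given Content-Length header value. */
int alloc_size_for(const char *content_length) {
    assert(content_length != NULL);
    return atoi(content_length) + 1024;
}

/* Hypothetical helper: bytes written past the end of a `size`-byte buffer
 * when one recv delivers `chunk` bytes starting at the buffer's base. */
int bytes_past_end(int size, int chunk) {
    return chunk > size ? chunk - size : 0;
}
```

Under that assumption, `alloc_size_for("-800")` is 224 and `bytes_past_end(224, 1024)` is 800, so a single full-size recv would overrun the allocation by 800 bytes.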

Vulnerability Analysis Example
http.c:
   91: void ReadPOSTData(int sid) {
   92:   char *pPostData;
       ...
  100:   conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
       ...
  107:   do {
  108:     rc=recv(conn[sid].socket, pPostData, 1024, 0);
       ...
[Diagram labels: Create, Heap Object]

Vulnerability Analysis Example
http.c:
  119: int read_header(int sid) {
  121:   char line[2048];
       ...
  127:   do {
  128:     memset(line, 0, sizeof(line));
  129:     sgets(line, sizeof(line)-1, conn[sid].socket);
       ...
  153:     conn[sid].dat->in_ContentLength=atoi((char *)&line+16);
       ...
  169:   if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE) {
  170:     ReadPOSTData(sid);
[Diagram labels: Object, Taint]
http.c:
   91: void ReadPOSTData(int sid) {
   92:   char *pPostData;
       ...
  100:   conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
       ...
  107:   do {
  108:     rc=recv(conn[sid].socket, pPostData, 1024, 0);
       ...
[Diagram labels: Object, Use]

Vulnerability Analysis Example
http.c:
  119: int read_header(int sid) {
  121:   char line[2048];
       ...
  127:   do {
  128:     memset(line, 0, sizeof(line));
  129:     sgets(line, sizeof(line)-1, conn[sid].socket);
       ...
  153:     conn[sid].dat->in_ContentLength=atoi((char *)&line+16);
       ...
  169:   if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE) {
  170:     ReadPOSTData(sid);
[Diagram labels: Create]
server.c:
  202: int sgets(char *buffer, int max, int fd)
  203: {
       ...
  209:   conn[sid].atime=time((time_t*)0);
  210:   while (n<max) {
  211:     if ((rc=recv(conn[sid].socket, buffer, 1, 0))<0) {
       ...
[Diagram labels: Taint Object, Taint Object]

Implementation
• Source code is rewritten using CIL (C Intermediate Language)
• CodeSurfer was used to extract program variables and their write sets
  • A commercial static analysis tool
• objdump and dwarfdump were used to extract global symbol information
• Dynamic monitoring is implemented in Valgrind
  • An open-source emulator

Evaluation
• Tested 11 real-world applications with known memory corruption vulnerabilities
• Test cases included
  • Stack/heap buffer overflow, format string
  • Both control flow and non-control data attacks
• Testing methodology
  • Programs were run under MemSherlock
  • Exploit programs were used to attack the applications
  • Log and replay was not used

Evaluation Results
[Table: evaluation results; contents not recovered.]
Type abbreviations: (S)tack overflow, (H)eap overflow, and (F)ormat string

False Negatives
• Prozilla:
  • memcpy uses a kernel function to manipulate page tables when copying entire pages
  • Valgrind cannot trace into the kernel
  • Can be prevented by function wrappers
• Other false negatives are theoretically possible
  • structs within unions or arrays
    • Current implementation does not support unions
    • Current implementation does not differentiate between elements of an array
  • Memory corruption errors inside DLLs

False Positives
• Embedded assembly
• Incomplete library specification
  • Library functions keeping internal state (e.g., strtok(NULL, delim))
  • Library functions that modify global variables as side effects (e.g., optarg, errno)
  • Pointers that point to hidden global structures (e.g., getdatetime() in time.h)
• struct pointers
  • void pointers that are type-cast to modify struct variables
  • Since the pointer is not of type struct, MemSherlock fails to update accordingly

Conclusion
• Fully automated vulnerability analysis
  • The analysis output is intuitive and human readable
• Future challenges
  • Automated, long-term fix of vulnerabilities
    • Semantic consistency is a great challenge
  • Automated, temporary fix of vulnerabilities
  • Generating vulnerability condition
  • Improving signature generation

Thank You
