Polymorphic Malware Detection

Polymorphic Malware Detection Connor Schnaith, Taiyo Sogawa 9 April 2012

Motivation • “5000 new malware samples per day” • --David Perry of Trend Micro • Large variance between attacks • Polymorphic attacks • Perform the same function • Altered immediate values or addressing • Added extraneous instructions • Current detection methods insufficient • Signature-based matching not accurate • Behavioral-based detection requires human analysis and engineering

Malware Families • Classified into related clusters (families) • Tracking of development • Correlating information • Identifying new variants • Based on similarity of code • Koobface • Bredolab • PoisonIvy • Conficker (7 million infected) Source: Carrera, Ero, and Peter Silberman. "State of Malware: Family Ties." Media.blackhat.com. 2010. Web. 7 Apr. 2012. <https://media.blackhat.com/bh-eu-10/presentations/Carrera_Silberman/BlackHat-EU-2010-Carrera-Silberman-State-of-Malware-slides.pdf>.

~300 samples of malware with 60% similarity threshold

Current Research • Techniques for identifying malicious behavior • Mining and clustering • Building behavior trees • Industry • ThreatFire and Sana Security developing behavioral-based malware detection

Design challenges • Discerning malicious portions of code • Dynamic program slicing • Accounting for control flow dependencies • Reliable automation • Must work reliably without human intervention • Minimal false positives

Holmes: Main Ideas • Two major tasks • Mining significant behaviors from a set of samples • Synthesizing an optimally discriminative specification from multiple sets of samples • Key distinction in approach • "positive" set - malicious • "negative" set - benign • Malware: fully described in the positive set, while not fully described in the negative set

Main Ideas: behavior mining • Extracts portions of the dependence graphs of programs from the positive set that correspond to behaviors that are significant to the programs’ intent. • The algorithm determines what behaviors are significant (next slide) • Can be thought of as contrasting the graphs of positive programs against the graphs of negative programs, and extracting the subgraphs that provide the best contrast.

Main ideas: behavior mining • A "behavior" is a data dependence graph • G = (V, E, a, B) • V is the set of vertices that correspond to operations (system calls) • E is the edges of the graph and correspond to dependencies between operations • a is the labeling function that associates nodes with the operations they represent • B is the labeling function that associates the edges with the logic that represents the dependencies
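The tuple G = (V, E, a, B) can be pictured as a small data structure. The sketch below is only an illustration of that definition, not the representation Holmes uses; the class name, field names, node ids, and the constraint string are all assumptions made for the example, and the two system call names are picked purely as sample labels.

```python
from dataclasses import dataclass, field

@dataclass
class BehaviorGraph:
    """Toy container for G = (V, E, a, B); every field name is illustrative."""
    vertices: set = field(default_factory=set)      # V: operation nodes
    edges: set = field(default_factory=set)         # E: (src, dst) dependencies
    op_label: dict = field(default_factory=dict)    # a: node -> operation name
    edge_label: dict = field(default_factory=dict)  # B: edge -> dependency constraint

    def add_op(self, node, operation):
        """Add a vertex and record which operation it represents."""
        self.vertices.add(node)
        self.op_label[node] = operation

    def add_dependency(self, src, dst, constraint):
        """Add an edge and the logic that relates the two operations."""
        if src not in self.vertices or dst not in self.vertices:
            raise ValueError("both endpoints must be added with add_op first")
        self.edges.add((src, dst))
        self.edge_label[(src, dst)] = constraint

# Hypothetical behavior: a handle produced by an open call flows into a read call.
g = BehaviorGraph()
g.add_op(1, "NtOpenFile")
g.add_op(2, "NtReadFile")
g.add_dependency(1, 2, "handle_out(1) == handle_in(2)")
```

Storing the edge constraint as plain text keeps the sketch short; a real matcher would need a form of the constraint it can evaluate against trace arguments.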

Main ideas: behavior mining • A program P exhibits a behavior G if it can produce an execution trace T with the following properties • Every operation in the behavior corresponds to an operation invocation and its arguments satisfy certain logical constraints • the logic formula on edges connecting behavior operations is satisfied by a corresponding pair of operation invocations in the trace • Must capture information flow in dependence graphs • two key characteristics • the path taken by the data in the program • security labels assigned to the data source and the data sink

Main ideas: behavior mining • Information gain is used to determine if a behavior is significant. A behavior that is not significant is ignored when constructing the dependency graph • Information gain is defined in terms of Shannon entropy: it measures how much additional information a behavior provides for deciding whether a graph G is in G+ or G- • Shannon entropy • H(G+ U G-) corresponds to the uncertainty that a graph G belongs to G+ or G- • Partition G+ and G- into smaller subsets to decrease that uncertainty • The partition is made by testing subgraph isomorphism (whether a graph contains the candidate behavior)
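To make the entropy argument concrete, here is a minimal sketch of information gain for one candidate behavior g, assuming the standard two-class Shannon entropy and a split of G+ U G- into graphs that contain g and graphs that do not. The function names and the counts in the example are assumptions for illustration; the subgraph isomorphism test itself is not implemented and is replaced by precomputed counts.

```python
import math

def entropy(pos, neg):
    """Shannon entropy in bits of a set with `pos` malicious and `neg` benign graphs."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(pos_with, neg_with, pos_total, neg_total):
    """Entropy reduction from splitting G+ U G- by whether a graph contains g."""
    total = pos_total + neg_total
    n_with = pos_with + neg_with
    n_without = total - n_with
    before = entropy(pos_total, neg_total)
    after = (
        (n_with / total) * entropy(pos_with, neg_with)
        + (n_without / total) * entropy(pos_total - pos_with, neg_total - neg_with)
    )
    return before - after

# Hypothetical counts: 4 malicious and 4 benign graphs.
perfect = information_gain(pos_with=4, neg_with=0, pos_total=4, neg_total=4)
useless = information_gain(pos_with=2, neg_with=2, pos_total=4, neg_total=4)
```

A behavior found in every malicious graph and no benign graph removes all uncertainty (`perfect` is 1 bit), while a behavior spread evenly across both sets removes none (`useless` is 0).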

Main ideas: behavior mining • A significant behavior g is a subgraph of a dependence graph in G+ such that Gain(G+ U G-, g) is maximized • Information gain is used as the quality measure to guide the behavior mining process • Some non-significant actions can get passed as significant • These actions may or may not throw off the algorithm that determines if the program is malicious

Main ideas: behavior mining • Significant behaviors mined from malware Ldpinch • Leaking bugfix information over the network • Adding a new entry to the system autostart list • Bypassing firewall to allow for malicious traffic • Could say any program that exhibits all three of these behaviors should be flagged malicious • This is too specific of a statement • Doesn't account for variations within a family • It is known that smaller subsets of behaviors that only include one of these actions could still be malicious • Need discriminative specifications

Main ideas: discriminative specifications • Creates clusters of behaviors that can be classified into characteristic subsets • Program matches specification if it matches all of the behaviors in a subset • "Discriminative" in that it matches the malicious but not the benign programs
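The matching rule on this slide (a program matches if it exhibits all behaviors in some subset) can be sketched as a check over sets. This is a simplified illustration in which behaviors are reduced to string labels; the labels and the grouping into subsets are hypothetical, and real matching would compare dependence graphs rather than names.

```python
def matches_specification(program_behaviors, specification):
    """True if the program exhibits every behavior in at least one subset."""
    return any(subset <= program_behaviors for subset in specification)

# Hypothetical specification: two characteristic subsets of mined behaviors.
spec = [
    frozenset({"leak_info", "autostart_entry"}),
    frozenset({"firewall_bypass"}),
]

variant = {"firewall_bypass", "open_window"}   # covers the second subset
benign = {"autostart_entry", "open_window"}    # covers no subset completely

variant_flagged = matches_specification(variant, spec)   # True
benign_flagged = matches_specification(benign, spec)     # False
```

Treating the specification as a list of alternatives is what lets it cover variants within a family without requiring every mined behavior at once.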

Main ideas: discriminative specifications • Each subset of behaviors induces a cluster of samples • Mined malicious and benign samples are organized into these clusters • Goal: find an optimal clustering technique to organize the malicious into the positive subset and the benign into the negative subset

Main ideas: discriminative specifications • Three-part algorithm • Formal concept analysis • Simulated annealing • Constructing optimal specifications • Formal concept analysis • O is a cluster of samples • A is the set of mined behaviors in O • A concept is the pair (A, O) • Set of concepts: {c1, c2, c3, ... , cN} • Behavior specification: S(c1, c2, c3, ... , cN)

Main ideas: discriminative specifications Formal Concept Analysis (continued) • Begins by constructing all concepts and computes pairwise intersection of the intent sets of these concepts • Repeated until a fixpoint is reached and no new concepts can be constructed • When algorithm terminates, left with an explicit listing of all of the sample clusters that can be specified in terms of one or more mined behaviors • Goal is to find {c1, c2, c3, ... , cN} such that S(c1, c2, c3, ... , cN) is optimal (based on threshold)
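The fixpoint construction described above can be sketched as follows. This is a toy version under stated assumptions: each sample is reduced to a set of behavior labels, the starting intents are the samples' own behavior sets, and empty intersections are skipped because they describe no behavior. The function name and sample data are hypothetical.

```python
from itertools import combinations

def build_concepts(sample_behaviors):
    """Close the samples' behavior sets under pairwise intersection.

    sample_behaviors: dict mapping sample name -> set of mined behaviors.
    Returns a dict mapping intent (frozenset of behaviors) -> cluster of samples.
    """
    intents = {frozenset(b) for b in sample_behaviors.values()}
    changed = True
    while changed:  # repeat until no new intent can be constructed
        changed = False
        for a, b in combinations(list(intents), 2):
            common = a & b
            if common and common not in intents:
                intents.add(common)
                changed = True
    # The cluster for an intent is every sample exhibiting all of its behaviors.
    return {
        intent: {s for s, b in sample_behaviors.items() if intent <= b}
        for intent in intents
    }

# Hypothetical samples and behavior labels.
samples = {"m1": {"a", "b"}, "m2": {"b", "c"}, "m3": {"a", "b", "c"}}
concepts = build_concepts(samples)
```

Here the intersection of m1's and m2's behaviors yields the new intent {b}, whose cluster contains all three samples; no further intents appear, so the loop stops with four concepts.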

Main ideas: discriminative specifications Simulated annealing • Probabilistic technique for finding approximate solution to global optimization problem • At each step, a candidate solution i is examined and one of its neighbors j is selected for comparison • The algorithm moves to j with some probability • A cooling parameter T is reduced throughout process and when it gets to a minimum the process stops
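A generic simulated annealing loop, as a sketch of the steps listed on this slide. The slide only says the algorithm moves to j "with some probability"; the acceptance rule exp(-delta / T) and the geometric cooling schedule used here are common textbook choices and are assumptions, as are the function name, parameters, and the toy cost function.

```python
import math
import random

def simulated_annealing(initial, neighbor, cost,
                        t_start=1.0, t_min=1e-3, alpha=0.95, seed=0):
    """Minimize `cost`, returning the best candidate seen before T reaches t_min."""
    rng = random.Random(seed)
    current = best = initial
    t = t_start
    while t > t_min:
        candidate = neighbor(current, rng)
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
        if cost(current) < cost(best):
            best = current
        t *= alpha  # cooling: worse moves become less likely over time
    return best

# Toy problem: minimize (x - 7)^2 over the integers by stepping +1 or -1.
result = simulated_annealing(
    initial=0,
    neighbor=lambda x, rng: x + rng.choice((-1, 1)),
    cost=lambda x: (x - 7) ** 2,
)
```

In the specification setting, a candidate would be a set of concepts and a neighbor would add or remove one concept; the integer problem above stands in for that only to keep the sketch self-contained.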

Main ideas: discriminative specifications Constructing Optimal Specifications • Threshold t, a set containing positive and negative samples, and a set of behaviors mined with the previous process • Called SpecSynth • Constructs full set of concepts • Removes redundant concepts • Run simulated annealing until convergence, then return the best solution

Holmes: Mining and Clustering

Evaluation and Results: Holmes • Used six malware families to develop specifications • Tested final product against 19 malware families • Collected 912 malware samples and 49 benign

Holmes Continued • Experiments carried out over varying threshold values (t) • System accuracy is highly sensitive to the threshold • Perhaps only efficient for a specific subset of malware

Holmes Scalability • Worst-case complexity is exponential • Behaviors of repeated executions (Stration and Delf) took 12-48 hours to analyze • Scalability for Holmes is a nightmare! “scary and scaled”

USENIX • The Advanced Computing Systems Association • (Unix Users Group) • 2009 article: automatic behavior matching • Behavior graphs (slices) • Tracking data and control dependencies • Matching functions • Performance evaluations Source: Kolbitsch, Clemens. "Effective and Efficient Malware Detection at the End Host." Usenix Security Symposium (2009). Web. 8 Apr. 2012. <http://www.iseclab.org/papers/usenix_sec09_slicing.pdf>.

USENIX: Producing Behavior Graphs • Instruction log • Trace instruction dependencies • Slicing doesn't reflect stack manipulation • Memory log • Access memory locations Partial behavior graph of Netsky (Kolbitsch et al)
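Tracing instruction dependencies backward through a log can be sketched as a simple backward slice. This is a toy model under stated assumptions: each log entry is a (destination, sources) pair, only data dependencies are followed (no control dependencies or stack effects), and the register and memory names are hypothetical. It is not the slicing procedure from Kolbitsch et al.

```python
def backward_slice(log, target):
    """Return indices of log entries that the final value of `target` depends on.

    log: list of (destination, [sources]) tuples in execution order.
    """
    needed = {target}
    selected = []
    for index in range(len(log) - 1, -1, -1):
        dest, sources = log[index]
        if dest in needed:
            selected.append(index)
            needed.discard(dest)    # this write defines the value we were looking for
            needed.update(sources)  # now track whatever it was computed from
    return sorted(selected)

# Hypothetical instruction log ending in a system call argument.
log = [
    ("eax", ["mem_100"]),  # 0: load a value from memory
    ("ebx", []),           # 1: unrelated constant
    ("ecx", ["eax"]),      # 2: copy derived from eax
    ("ebx", ["ebx"]),      # 3: unrelated update
    ("arg0", ["ecx"]),     # 4: value passed to a system call
]
slice_indices = backward_slice(log, "arg0")  # [0, 2, 4]
```

The slice keeps only the entries that feed the system call argument and drops the unrelated updates to `ebx`.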

USENIX: Behavior Slices to Functions • Use instruction and memory log to determine input arguments • Identify repeated instructions as loops • Include memory read functions • We can now compare to known malware

Evaluation • Six families used for development (mostly mass-mailing worms) • Expanded test set

Performance Evaluation • Installed Internet Explorer, Firefox, Thunderbird, Putty, and Notepad on Windows XP test machine • Single-core, 1.8 GHz, 1GB RAM, Pentium 4 processor

USENIX Limitations • Evading system emulator • USENIX detector uses QEMU emulator • Delays • Time-triggered behavior • Command and control mechanisms • Modifying the algorithm's behavior • A more fundamental change, but one that cannot be detected using the same signatures • End-host based system • Cannot track network activity

Questions/Discussion
