Selected Topics in Automated Diversity
Stephanie Forrest (University of New Mexico)
Mike Reiter and Dawn Song (Carnegie Mellon University)
Automated Diversity for Security
• Computer systems are highly uniform
  • Easy targets for standardized attacks
• Use idea of biological diversity:
  • Introduce changes that make each system unique
  • Attack will need to be rewritten for each computer
  • Provide population resilience to unknown environmental threats
• Two approaches:
  • Interface diversity: Adapt vulnerable interfaces such as machine language, system call numbers, and standard library locations.
  • Implementation diversity: Utilize diverse implementations of common services
• Two projects:
  • Randomized instruction set emulation [Barrantes, Ackley and Forrest]
  • Behavioral distance for anomaly detection [Gao, Reiter and Song]
Randomized Instruction Set Emulation (RISE): An example of interface diversity
• Many current attacks insert binary code into a running program, which is then executed.
• RISE protects the code itself, rather than points of entry:
  • Perimeter defense (e.g., stack protection) is not enough.
• Randomize the binary code instruction set for every program:
  • Foreign malicious code will try to execute in the standard format and will fail.
  • Knowledge of a particular translation will gain access only to that particular program.
• Modify the compiler/virtual machine to accept this “new” language:
  • Prototype in the open-source binary-to-binary translator Valgrind.
• Related to encrypting compilers.
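The randomization idea on this slide can be illustrated with a toy model. The sketch below assumes the "translation" is a byte-wise XOR with a per-process random mask; the slide does not give the scheme, and the names `load_program` and `fetch` are hypothetical. It only models the scrambling and unscrambling of bytes, not a real emulator.

```python
import os

def xor_bytes(data: bytes, mask: bytes) -> bytes:
    """XOR data with a mask, repeating the mask as needed."""
    return bytes(b ^ mask[i % len(mask)] for i, b in enumerate(data))

def load_program(code: bytes, mask: bytes) -> bytes:
    """Hypothetical loader step: scramble trusted code with the process mask."""
    return xor_bytes(code, mask)

def fetch(memory: bytes, mask: bytes) -> bytes:
    """Hypothetical emulator step: unscramble bytes just before execution."""
    return xor_bytes(memory, mask)

# Assumption: one random mask per process, kept secret from attackers.
mask = os.urandom(16)

legit = b"\x55\x89\xe5\x5d\xc3"     # x86: push ebp; mov ebp, esp; pop ebp; ret
injected = b"\x31\xc0\x50\x68\x2f"  # stand-in bytes for injected code

# Trusted code passes through the loader, so fetch() restores it exactly.
memory = load_program(legit, mask)
restored = fetch(memory, mask)

# Injected code never passed through the loader, so fetch() scrambles it
# and it no longer decodes as the instructions the attacker intended.
scrambled = fetch(injected, mask)
```

Because XOR is its own inverse, the same routine serves both steps; an attacker would need the mask of the specific process to prepare bytes that survive `fetch`.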
Results
• Prototype implementation available under GPL from http://www.cs.unm.edu/~immsec:
  • Normal code runs properly.
  • Binary code injection attacks stopped (100% of tested examples).
• Performance (preliminary):
  • Emulation overhead of Valgrind is high.
  • Incremental cost of RISE is small.
  • (Very) roughly a factor of 2 slowdown in current configuration.
• Significant space penalty:
  • Libraries
  • Mask
Host-Based Anomaly Detector
[Figure: system call requests (e.g., 3, 5, 11) pass from user space to kernel space; a model answers "Is this system call request anomalous?" (Y/N)]
• Research focus:
  • What is the best model for anomaly detection?
  • Can we use another computer as the model?
Fault-Tolerant System
• Commercial off-the-shelf applications may not produce the same responses
• Some intrusions do not result in an observable deviation in the responses
• Need to observe the behavior
The Problem
• Diverse platform (Linux and Windows):
  • System call numbers observed do not have semantic meanings
  • System calls may not have one-to-one correspondence
  • System call sequences may have different lengths
• Diverse implementation (Apache and Abyss):
  • Correspondence may not exist between individual system calls
[Figure: "Match?" between two system call number sequences; numbers shown: 3 43 5 3 4 9 6 302 10 46 6 222]
Evolutionary Distance
• Are two DNA sequences derived from a common ancestral sequence?
• Evolutionary distance between two DNA sequences:
  • Substitutions
  • Deletions
  • Insertions
• Example sequences: ATGCGTCGTT and ATCCGCGAT
• Aligned with insertion/deletion (I/D) symbols, written "-":
  ATGC-GTCGTT
  AT-CCG-CGAT
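A distance built from substitutions, deletions, and insertions can be computed with the standard dynamic program for edit distance. The sketch below assumes unit costs for every operation (the slide does not state costs); `edit_distance` and its parameters are illustrative names.

```python
def edit_distance(a, b, sub_cost=1, gap_cost=1):
    """Cheapest total cost of turning sequence a into sequence b.

    Allowed operations: substitute one symbol (sub_cost), or insert or
    delete one symbol (gap_cost). Unit costs are an assumption.
    """
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = [j * gap_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap_cost]
        for j in range(1, len(b) + 1):
            match = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            delete = prev[j] + gap_cost      # a[i-1] aligned with an I/D symbol
            insert = curr[j - 1] + gap_cost  # b[j-1] aligned with an I/D symbol
            curr.append(min(match, delete, insert))
        prev = curr
    return prev[-1]

# The two DNA sequences from the slide.
d = edit_distance("ATGCGTCGTT", "ATCCGCGAT")
```

The function works on any indexable sequences, so lists of system call numbers can be passed in place of strings.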
Behavioral Distance and Evolutionary Distance
• Similarities:
  • Evaluate the difference between two sequences
  • Substitutions, deletions and insertions
• Differences:
  • The same system call number in two sequences does not denote the “same” call
  • We do not have the cost table in the behavioral distance measure
  • We have training data
Behavioral Distance
• Behavioral distance calculation
• Learning the cost table
  • Initializing the cost table
  • Iteratively updating the cost table
• System call phrase extraction
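Behavioral Distance Calculation
• Example sequences: ATGCGTCGTT and ATCCGCGAT
• The same sequences extended with I/D symbols to a common length:
  ATGC-GTCGTT
  AT-CCG-CGAT
• Extensions of a sequence s to length n: the set of sequences obtained by inserting n - len(s) I/D symbols into s, at any location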
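The set described on this slide can be enumerated directly for small inputs. The sketch below is an illustration of the definition only; the function name `extensions` and the use of "-" as the I/D symbol are assumptions, and a practical distance computation would use dynamic programming rather than enumeration.

```python
from itertools import combinations

GAP = "-"  # assumed I/D symbol; it must not occur in the input sequences

def extensions(s, n):
    """Yield every length-n string made by inserting n - len(s) gaps into s."""
    if n < len(s):
        return  # no extension exists when the target is shorter than s
    # Choose which of the n positions hold gaps; the rest hold s in order.
    for gap_positions in combinations(range(n), n - len(s)):
        gaps = set(gap_positions)
        symbols = iter(s)
        yield "".join(GAP if i in gaps else next(symbols) for i in range(n))

# Both aligned rows from the slide are members of the respective sets.
first_row_found = "ATGC-GTCGTT" in set(extensions("ATGCGTCGTT", 11))
second_row_found = "AT-CCG-CGAT" in set(extensions("ATCCGCGAT", 11))
```

The set has C(n, n - len(s)) members, which grows quickly with n; this is one reason enumeration is only useful for illustration.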
Learning the Cost Table
• Training data: subjecting the replicas to a battery of well-formed (benign) requests and observing the system calls induced
• Initializing the cost table:
  • The first approach: comparing semantics of individual system calls
  • The second approach: using frequency information
• Iteratively updating the cost table:
  • Use the initialized cost table to calculate behavioral distance between system call sequences in the training data
  • Results of the behavioral distance reveal the “proper alignments” between system calls
  • Use these “proper alignments” to update the cost table
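The align-then-update loop can be sketched as follows. Everything specific here is an assumption made for illustration, not the authors' method: the table starts from a uniform default cost (neither of the two initialization approaches on the slide), gaps have a fixed cost, and the update rule sets the cost of a pair to one minus the fraction of times the first call was aligned with the second. The names `align` and `learn_cost_table` are hypothetical.

```python
from collections import Counter

GAP = None  # assumed I/D symbol

def align(s1, s2, cost):
    """Return (distance, aligned pairs) of a cheapest alignment under cost."""
    n, m = len(s1), len(s2)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + cost(s1[i - 1], GAP)
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + cost(GAP, s2[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost(s1[i - 1], s2[j - 1]),
                dist[i - 1][j] + cost(s1[i - 1], GAP),
                dist[i][j - 1] + cost(GAP, s2[j - 1]),
            )
    # Trace back from the corner to recover which symbols were aligned.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + cost(s1[i - 1], s2[j - 1]):
            pairs.append((s1[i - 1], s2[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + cost(s1[i - 1], GAP):
            pairs.append((s1[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, s2[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs

def learn_cost_table(training_pairs, rounds=3, gap_cost=1.0, default=1.0):
    """Iteratively re-estimate substitution costs from alignments (assumed rule)."""
    table = {}  # (call on replica 1, call on replica 2) -> cost

    def cost(a, b):
        if a is GAP or b is GAP:
            return gap_cost
        return table.get((a, b), default)

    for _ in range(rounds):
        counts = Counter()
        for s1, s2 in training_pairs:
            _, pairs = align(s1, s2, cost)
            counts.update(p for p in pairs if GAP not in p)
        seen = Counter()
        for (a, _b), c in counts.items():
            seen[a] += c
        # Assumed update: pairs aligned more often become cheaper.
        table = {(a, b): default * (1 - c / seen[a]) for (a, b), c in counts.items()}
    return table

# Hypothetical benign traces from a Linux replica and a Windows replica.
training = [
    (["open", "read", "close"], ["NtOpenFile", "NtReadFile", "NtClose"]),
    (["open", "read", "read", "close"], ["NtOpenFile", "NtReadFile", "NtClose"]),
]
table = learn_cost_table(training)
```

On this toy data the first round misaligns one `read` with `NtOpenFile`; later rounds drop that pairing once the cheaper correspondences dominate, which is the kind of refinement the slide describes.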
System Call Phrases
• Correspondence may not exist between individual system calls
• Behavioral distance calculation is very slow when sequences are long
• Solution: group system calls into system call phrases
  • System call phrases are also called system call subsequences
  • A system call phrase is a sequence of system calls that frequently appear together in program execution
• TEIRESIAS algorithm (also taken from biology)
  • The TEIRESIAS algorithm has been used in other intrusion/anomaly detection systems
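To make the notion of a phrase concrete, the sketch below counts contiguous runs of system calls that recur across traces. This is not TEIRESIAS; it is a much simpler stand-in based on n-gram counting, and the function name, length bounds, and support threshold are all assumptions.

```python
from collections import Counter

def frequent_phrases(traces, min_len=2, max_len=4, min_support=2):
    """Return contiguous subsequences that occur at least min_support times.

    traces is a list of system call sequences; the result maps each
    frequent phrase (as a tuple) to its number of occurrences.
    """
    counts = Counter()
    for trace in traces:
        for n in range(min_len, max_len + 1):
            # Slide a window of length n over the trace.
            for i in range(len(trace) - n + 1):
                counts[tuple(trace[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

# Hypothetical traces observed while serving two benign requests.
traces = [
    ["open", "read", "close"],
    ["open", "read", "write", "close"],
]
phrases = frequent_phrases(traces)
```

Once phrases are extracted, each sequence can be rewritten as a shorter sequence of phrases, which shortens the inputs to the distance calculation.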
Behavioral Distance – Same Application
[Figure: results for Apache Webserver and Myserver Webserver]
Behavioral Distance – Different Application
[Figure: results for two replica pairings: (Linux: Myserver Webserver, Windows: Apache Webserver) and (Linux: Apache Webserver, Windows: Myserver Webserver)]
Behavioral Distance – Mimicry Attacks
[Results shown: behavioral distance of the best mimicry attack, and true acceptance rate when the threshold is set to detect the best mimicry attack]
• Attacker knows the individual IDS on one replica
• Attacker knows the behavioral distance and the cost table
Conclusion
• Behavioral distance detects an attack on one process that causes its behavior to deviate from that of another
• Behavioral distance makes evasion attacks more difficult with moderate overhead