Addressing Threats Against DoD Networked Systems Using Virtualization

Anup Ghosh & Sushil Jajodia, Center for Secure Information Systems, George Mason University
Peng Liu, Penn State University
Angelos Keromytis, Sal Stolfo & Jason Nieh, Columbia University

Autonomic Recovery of Enterprise-wide Systems After Attack or Failure with Forward Correction
Anup Ghosh, Sushil Jajodia: {aghosh1,jajodia}@gmu.edu; Angelos Keromytis, Sal Stolfo, Jason Nieh: {angelos,sal,nieh}@cs.columbia.edu; Peng Liu: pliu@ist.psu.edu
• Objective
  • Develop self-regenerative enterprise networks that recover and reconstitute themselves after attacks and failures.
  • Develop a transaction-based model for commodity operating systems to determine where an attack occurred and what data or programs were altered, and to back out all of these changes without affecting unrelated data or activities.
  • Automatically generate patches to make systems more robust after an attack.
• Technical Approach
  • Develop a layered approach to self-regenerative systems:
    • application-level resilience using error virtualization and rescue points
    • system-level resilience using virtualization and transaction semantics for programs, to roll back system state to the last known good continuation point
    • dynamic patching of applications to improve resiliency after attack
    • roll forward with correction to quarantine tainted processes and files and back out their changes
• DoD Benefit
  • Uninterruptible service for critical network-centric warfare services
  • Error localization and tolerance in applications
  • Automatic system recovery after attack, including quarantine of tainted processes and data
  • Increased resiliency after attack through automatic patch generation
Budget: Planned/Actual $K
Dates and location of Major Reviews/Meetings: July 10, 2008, UVA Northern VA Center, Falls Church, VA

Technical Breakthroughs & Accomplishments (1 of 3)
Uninterruptible Server
• Developed an architecture, algorithms, and a system for providing uninterruptible critical network services in the face of attack
Breakthroughs:
• Supports the use of buggy COTS software while still providing 100% availability
• Experimental results show resilience against several classes of malicious attack, including denial of service, worms, and stealthy Trojans
• Experimentally verified low overhead
• Eliminates false negatives from sensors, and automatically handles false positives without manual review
[Figure: architecture for uninterruptible servers (Sensors, Actuators, TC State Estimator, Response Selector); health status monitor for the virtual machines and the uninterruptible server]

Technical Breakthroughs & Accomplishments (2 of 3)
Self-Healing Systems
• Developed an approach for a self-recoverable Linux file system
• Developed Self-Healing PostgreSQL, a DBMS with damage tracking, quarantine, and repair
• The first COTS DBMS that satisfies two essential enterprise health requirements:
  • Near-zero run-time overhead: less than 8%
  • Zero system downtime: during online repair, throughput degradation drops from 40% to 10-20% within a few seconds

Technical Breakthroughs & Accomplishments (3 of 3)
Application Recovery Through Error Virtualization
• Developed a novel “error virtualization with rescue points” recovery technique that
  • retrofits exception-handling capabilities into vulnerable code
  • allows safe and efficient application recovery from failures and attacks
• Evaluated the recovery mechanism with 6 open-source applications: 90%+ success

Solution for Non-Stop Computer Servers
• Diversify and replicate servers in virtual machines
• Create a trustworthy controller (TC) that uses automatic feedback control to manage the state of the servers
• Hide the details of server replication from clients
• Revert servers to a pristine condition on attack or corruption while continuing to provide service
[Figure: components VS, VSH, TC, and LoadBalancer; labeled flows: sensor reports, action recommendation, action decisions]
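As a rough illustration of "hide replication from clients while reverting servers", the following is a minimal sketch assuming a simple round-robin front end; `ReplicatedService`, `Replica`, `begin_revert`, `finish_revert`, and `PRISTINE` are hypothetical names, not the TC's actual interface, and the image swap stands in for restoring a real VM snapshot.

```python
import itertools
from dataclasses import dataclass, field

PRISTINE = "pristine-snapshot"  # hypothetical ID of the clean VM image


@dataclass
class Replica:
    name: str
    image: str = PRISTINE
    in_service: bool = True


@dataclass
class ReplicatedService:
    """Hypothetical front end: clients see one service, not the replicas."""
    replicas: list
    _rr: itertools.count = field(default_factory=itertools.count)

    def _find(self, name: str) -> Replica:
        return next(r for r in self.replicas if r.name == name)

    def route(self) -> str:
        """Round-robin over the replicas currently in service."""
        live = [r for r in self.replicas if r.in_service]
        if not live:
            raise RuntimeError("no replica in service")
        return live[next(self._rr) % len(live)].name

    def begin_revert(self, name: str) -> None:
        """Drain one replica so traffic keeps flowing to the others."""
        target = self._find(name)
        if not any(r.in_service for r in self.replicas if r is not target):
            raise RuntimeError("refusing to drain the last serving replica")
        target.in_service = False

    def finish_revert(self, name: str) -> None:
        """Restore the pristine image (stand-in for a VM snapshot restore)."""
        target = self._find(name)
        target.image = PRISTINE
        target.in_service = True
```

While `Apache01` is being reverted, `route()` only returns the other replicas, so clients keep being served; once `finish_revert` runs, the replica rejoins the rotation in its clean state.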

TC Testbed Setup
[Figure: testbed with a Client, a LoadBalancer, three Apache servers (Apache00, Apache01, Apache02), a TC Control Station, and a TC GUI Station; networks 192.168.0/24 and 10.0.0.0/16]

TC GUI: System View

Withstanding Persistent DoS Attacks
• 1 attack per second: 92% of normal throughput
• 8 attacks per second: 60% of normal throughput
8/14/2014

Revert Overhead
• 8 measurements taken over one minute
• Worst-case revert overhead: 12% (when reversion starts)
• Throughput returns to 99% of normal within 30 s (measurement 5)

Non-stop Server Summary
• TC is a closed-loop control architecture for intrusion detection and server defense.
• Servers are virtualized so that they can be reverted to a pristine state at low cost.
• The control loop triggers actuators in response to sensor inputs.
• Handles “false negatives,” including zero-day exploits and ingenious stealthy attacks that evade detection.
• Handles false alarms automatically, without a human in the control loop.
• Addresses the problem of overwhelming “false positives.”
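One plausible way a closed loop can tolerate both missed detections and false alarms is sketched below. This is an assumed policy, not the TC's actual algorithm: the state estimator tracks how long each server has run since its last clean revert, and the response selector reverts a server either when a sensor alarms or when that exposure reaches a hypothetical `max_exposure` threshold. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    name: str
    ticks_since_revert: int = 0  # state estimate: exposure since the last clean revert


def select_responses(states, alarms, max_exposure):
    """Response selector (hypothetical policy): revert alarmed servers, and also
    any server exposed for max_exposure ticks, since sensors can miss attacks."""
    due = [s for s in states
           if s.name in alarms or s.ticks_since_revert >= max_exposure]
    # Alarmed servers first, then the longest-exposed ones.
    due.sort(key=lambda s: (s.name not in alarms, -s.ticks_since_revert))
    return [s.name for s in due]


def control_step(states, alarms, max_exposure=3, max_concurrent=1):
    """One pass of the sense -> estimate -> select -> actuate loop."""
    for s in states:
        s.ticks_since_revert += 1            # state estimator update
    chosen = select_responses(states, alarms, max_exposure)[:max_concurrent]
    for s in states:
        if s.name in chosen:
            s.ticks_since_revert = 0         # actuator: revert to the pristine VM image
    return chosen
```

Under this policy a false alarm only costs one cheap, unnecessary revert, and an undetected compromise is still flushed out when the periodic revert comes around; `max_concurrent` keeps the remaining replicas serving while one is reverted.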

Next Steps: Journal Computing System
• We are developing a journal computing system (JCS) to “transactionalize” operations between application processes and the system, including:
  • file system
  • network
  • other process/memory transactions
• The journal is a highly condensed record of the events that happen on the system, allowing traceability and restoration.
• A transaction in the journal characterizes the collective effects of related system activities through summarization.
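To make "summarization" concrete, here is a minimal sketch assuming that related file, network, and process events have already been grouped under a transaction ID (how the JCS does that grouping is not described here). `SysEvent`, `TxnSummary`, and `summarize` are hypothetical names; the point is that many raw events collapse into a small per-transaction record of effects, such as the set of files written, which is what restoration would need to undo.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SysEvent:
    txn: int       # hypothetical ID grouping related activities
    kind: str      # "file", "net", or "proc"
    action: str    # e.g. "write", "connect", "spawn"
    target: str    # path, remote peer, or child PID


@dataclass
class TxnSummary:
    files_written: set = field(default_factory=set)
    peers: set = field(default_factory=set)
    children: set = field(default_factory=set)
    raw_events: int = 0


def summarize(events):
    """Condense raw events into one summary per transaction."""
    journal = defaultdict(TxnSummary)
    for e in events:
        s = journal[e.txn]
        s.raw_events += 1
        if e.kind == "file" and e.action == "write":
            s.files_written.add(e.target)   # repeated writes collapse to one entry
        elif e.kind == "net" and e.action == "connect":
            s.peers.add(e.target)
        elif e.kind == "proc" and e.action == "spawn":
            s.children.add(e.target)
    return dict(journal)
```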

PSU: Server Room Machine
• Problem 1: data cleaning in corrupted databases
• Problem 2: repairing corrupted files
• Problem 3: service process state disinfection
• Problem 4: service running environment disinfection
[Figure: healthcare server room machine connected to the Internet and subject to attacks; components: Internet services (e.g., httpd), application service processes, DBMS service, files, database]

P1 Solution: Self-healing PostgreSQL
Architecture: [figure]
Experiment results:
• 0.58 ms (8%) runtime overhead per transaction
• Saves the work of 35K legitimate transactions
• 80K records are cleaned in 20 s with ~20% throughput degradation
System: fully implemented
• Tag-based dynamic damage tracking
• Multi-version based online recovery
• Fine-grained quarantine
Testbed:
• TPC-C benchmark
• Clinical OLTP
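The idea behind tag-based damage tracking can be sketched in a few lines, under simplifying assumptions: an in-memory log of committed transactions with known read and write sets, and a known set of malicious transaction IDs. Each record's tag names its last writer; a transaction that read a record tagged by a damaged transaction is itself damaged. `Txn` and `find_damaged` are hypothetical names, and this is not the Self-Healing PostgreSQL implementation, which tracks tags inside the DBMS and repairs using multiple versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    tid: int
    reads: tuple = ()
    writes: tuple = ()


def find_damaged(history, malicious):
    """Scan committed transactions in commit order. Each record carries a tag
    naming its last writer; reading a record tagged by a damaged transaction
    spreads the damage to the reader."""
    tag = {}                   # record key -> tid of last writer
    damaged = set(malicious)
    for t in history:
        if t.tid not in damaged and any(tag.get(r) in damaged for r in t.reads):
            damaged.add(t.tid)
        for r in t.writes:
            tag[r] = t.tid
    dirty = {r for r, tid in tag.items() if tid in damaged}
    return damaged, dirty
```

With a malicious T1 that writes A, a T2 that reads A and writes B, an unrelated T3 that reads C and writes D, and a T4 that reads B and writes E, the damaged set is {1, 2, 4} and the records to clean are {A, B, E}; T3's work is preserved, which is the sense in which legitimate transactions are "saved".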

P2 Solution: Coordinated Process State and File Disinfection (1)
• CPFD approach:
  • Process-file-process dependencies propagate infection
  • P1 is intruded at time B; P2 is infected at time E
  • The system call on file X at time C is clean, but X is corrupted at time F
  • Roll back P1 to checkpoint B; roll back P2 to checkpoint E
  • Roll back the file system to checkpoint A
  • System call C will be replayed, but system call F cannot be
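The rollback decisions in this scenario can be reproduced with a small dependency-propagation sketch. It assumes a chronological log of read/write system calls and integer times A=0, B=1, C=2, E=4, F=5; the link between P1 and P2 (P1 writes a file Y at time 3 that P2 reads at E) is a hypothetical detail added to make the example complete. `SysCall`, `infection_times`, and `plan_recovery` are illustrative names, not the CPFD system's API.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class SysCall:
    time: int
    process: str
    op: str      # "read" or "write"
    file: str


def infection_times(calls, intrusions):
    """Earliest infection time of each process and file, following
    process -> file (write) and file -> process (read) dependencies."""
    procs = dict(intrusions)   # process -> time it was intruded or infected
    files = {}                 # file -> time it was corrupted
    for c in sorted(calls, key=lambda c: c.time):
        if c.op == "write" and procs.get(c.process, INF) <= c.time:
            files[c.file] = min(files.get(c.file, INF), c.time)
        elif c.op == "read" and files.get(c.file, INF) <= c.time:
            procs[c.process] = min(procs.get(c.process, INF), c.time)
    return procs, files


def plan_recovery(calls, intrusions, fs_checkpoint):
    """Processes roll back to their infection time; the file system rolls back
    to fs_checkpoint, and only clean writes after it are replayed."""
    procs, files = infection_times(calls, intrusions)
    replay = [c for c in sorted(calls, key=lambda c: c.time)
              if c.op == "write" and c.time > fs_checkpoint
              and procs.get(c.process, INF) > c.time]
    return procs, files, replay
```

For the slide's scenario (P1 intruded at B=1, P2 writes X at C=2 and again at F=5), P1 rolls back to 1 and P2 to 4, the file system rolls back to A=0, and only the write at C=2 is replayed; F is skipped because P2 was already infected.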

P2 Solution: Coordinated Process State and File Disinfection (2)
Architecture: [figure]
Evaluation:
• A demo scenario
• No results yet
System: implemented
• VM protection via User Mode Linux (UML)
• System call interception
• System-call-level taint analysis
• Dynamic process/file quarantine
• Asynchronous checkpointing
• Online recovery
• Process state checkpoint and rollback through UML

PSU: Next Steps
1: Evaluate the CPFD system (coordinated process state and file disinfection)
2: Fully implement the P3 solution (a preliminary P3 system is built atop QEMU)
3: Make the P3 solution (process state disinfection) a real-time healthcare service
4: Solve P4: service running environment disinfection

Error Virtualization Using Rescue Points
• Recover using the program's own code
• Map the set of faults that could occur onto those explicitly handled by the program code
• Profile programs during “bad” test runs
• Build a behavioral model
• Discover candidate recovery (rescue) points
• Induce faults at locations that are known (or suspected) to propagate faults correctly
• Works on binaries (COTS) and multi-process applications
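The core idea, turning an unanticipated fault into an error the program already knows how to handle, can be sketched in Python under clear assumptions. Here a Python exception stands in for a native fault such as a segmentation fault, and a deep copy of a state dictionary stands in for the checkpoint taken at the rescue point. `rescue_point`, `handle_request`, and `server_loop` are hypothetical names; the actual technique operates on binaries, not Python code.

```python
import copy
import functools


def rescue_point(state, error_value):
    """Hypothetical decorator marking a function as a rescue point: state is
    checkpointed on entry, and any fault below it is rolled back and turned
    into error_value, an error code the caller already handles."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            snapshot = copy.deepcopy(state)      # checkpoint at the rescue point
            try:
                return fn(*args, **kwargs)
            except Exception:
                state.clear()
                state.update(snapshot)           # undo partial side effects
                return error_value               # "virtualize" the fault as an error
        return inner
    return wrap


server_state = {"served": 0}


@rescue_point(server_state, error_value=-1)
def handle_request(payload: bytes) -> int:
    server_state["served"] += 1
    if len(payload) > 8:
        # Stand-in for a memory-safety fault such as a buffer overflow.
        raise MemoryError("simulated overflow")
    return len(payload)


def server_loop(requests):
    """The program's existing error path treats negative returns as errors."""
    return ["error" if handle_request(r) < 0 else "ok" for r in requests]
```

Feeding the loop a normal request, an oversized one, and another normal one yields `["ok", "error", "ok"]`: the faulting request is rejected through the existing error path, its partial state update is discarded, and the server keeps running.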

High Level Approach

Experimental Testbed

Experimental Results
• Simulate load by running a standard benchmark
• Inject faults using existing or specially crafted exploit code
• Test survivability, correctness, and performance

Performance

Next Steps
• Recovery in multi-process environments needs improvement
  • Client-side perception
  • The deterministic replay component needs performance improvements for multi-process applications
• Integration with database recovery techniques
• Recovery shepherding
  • Cannot guarantee the program path on recovery
  • Could bypass security checks (e.g., sshd)
  • Automatically verify that error virtualization leads to a steady state (the beginning of the server loop)

Discussion
aghosh1@gmu.edu
