
Automatic Trust Management for Adaptive Survivable Systems (ATM for ASS’s) — Howard Shrobe, MIT AI Lab; Jon Doyle, MIT Lab for Computer Science


Presentation Transcript


  1. Automatic Trust Management for Adaptive Survivable Systems (ATM for ASS’s) — Howard Shrobe, MIT AI Lab; Jon Doyle, MIT Lab for Computer Science

  2. The Core Thesis
Survivable systems make careful judgments about the trustworthiness of their computational environment, and they make rational resource-allocation decisions based on that assessment.

  3. The Thesis in Detail: Trust Model
• It is crucial to estimate to what degree, and for what purposes, a computational resource may be trusted.
• This influences decisions about:
  • What tasks should be assigned to which resources.
  • What contingencies should be provided for.
  • How much effort to spend watching over the resources.
• The trust estimate depends on having a model of the possible ways in which a computational resource may be compromised.

  4. The Thesis in Detail: Adaptive Survivable Systems
• The application itself must be capable of self-monitoring and diagnosis.
  • It must know the purposes of its components.
  • It must check that these purposes are achieved.
  • If they are not achieved, it must localize and characterize the failure.
• The application itself must be capable of adaptation, so that it can best achieve its purposes within the available infrastructure.
  • It must have more than one way to effect each critical computation.
  • It should choose an alternative approach if the first one fails.
  • It should make its initial choices in light of the trust model.

  5. The Thesis in Detail: Rational Resource Allocation
• Survivability depends on the ability of the application, monitoring, and control systems to engage in rational decision making about what resources to use, seeking the best balance of expected benefit to risk.
• The amount of resources dedicated to monitoring should vary with the threat level.
• The methods used to achieve computational goals, and the locations of the computations, should vary with the threat.
• Somewhat compromised systems will sometimes have to be used to achieve a goal.
• Sometimes doing nothing will be the best choice.

  6. The Active Trust Management Architecture
[Architecture diagram: Perpetual Analytical Monitoring — driven by trend templates, system models & domain architecture, and other information sources such as intrusion detectors — feeds a Trust Model (trustworthiness, compromises, attacks), which in turn drives Rational Decision Making and Rational Resource Allocation for Self-Adaptive Survivable Systems.]

  7. The Nature of a Trust Model
• Trust is a continuous, probabilistic notion.
  • All computational resources must be considered suspect to some degree.
• Trust is a dynamic notion.
  • The degree of trustworthiness may change with further compromises.
  • The degree of trustworthiness may change with efforts at amelioration.
  • The degree of trustworthiness may depend on the political situation and the motivation of a potential attacker.
• Trust is a multidimensional notion.
  • A system may be trusted to deliver a message while not being trusted to preserve its privacy.
  • A system may be unsafe for one user but relatively safe for another.
• The trust model must occupy at least three tiers:
  • The trustworthiness of each resource for each specific purpose.
  • The nature of the compromises to the resources.
  • The nature of the attacks on the resources.
• Most work has looked only at attacks (intrusion detection).

  8. Tiers of a Trust Model
• Attack level: history of “bad” behaviors
  • Penetration, denial of service, unusual access, flooding
• Compromise level: state of the mechanisms that provide:
  • Privacy: stolen passwords, stolen data, packet snooping
  • Integrity: parasitized hosts, changed data, changed code
  • Authentication: changed keys, stolen keys
  • Non-repudiation: compromised keys, compromised algorithms
  • QoS: slow execution
  • Command and control properties: compromises to the monitoring infrastructure
• Trust level: degree of confidence in key properties
  • Compromise states
  • Intent of attackers
  • Political situation

  9. Adaptive Survivable Systems: Self-Monitoring, Diagnosis & Recovery
[Architecture diagram: a development environment holds a component asset base (components A, B, C, each with methods 1–3) and synthesizes plan structures and sentinels; the runtime environment performs self-monitoring, raises alerts to a diagnostic service, and drives a repair-plan selector, rollback designer, and resource allocator. Rational selection reasons over pre- and post-conditions (e.g., “Post-condition 1 of Foo because post-condition 2 of B and post-condition 1 of C”) to conclusions such as “To execute Foo, Method 3 is most attractive.”]

  10. How to Build Adaptive Survivable Systems
• Make systems fundamentally dynamic.
  • Systems should have more than one way to achieve any goal.
  • Always allow choices to be made (or revised) late.
• Inform the runtime environment with design information.
  • Systems should know the purposes of their component computations.
  • Systems should know the quality of different methods for the same goal.
• Make the system responsible for achieving its goals.
  • Systems should diagnose the failure to achieve intended goals.
  • Systems should select alternative techniques in the event of failure.
• Build a trust model by pervasive monitoring and temporal analysis.
• Optimize dynamically in light of the trust model.
  • Balance the quality of the goal achieved against the risk encountered.

  11. Dynamic Rational Component Selection
• Systems have more than one method for each task.
• Each method specifies:
  • Quality of service provided
  • Resources consumed
  • Likelihood of success
• The likelihood of success is updated to reflect the current state of the trust model.
• Select the method with the greatest expected net benefit.
• This generalizes “method dispatch”: replace the notion of “most specific method” with that of “most beneficial method.”
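A minimal sketch of selection by expected net benefit; the method names, benefits, costs, and probabilities below are illustrative inventions, not values from the slides.

```python
# Sketch (not from the original slides): dynamic rational component
# selection by expected net benefit.
from dataclasses import dataclass

@dataclass
class Method:
    name: str
    quality_benefit: float      # value of the QoS this method delivers
    resource_cost: float        # cost of the resources it consumes
    success_likelihood: float   # prior, later updated from the trust model

def expected_net_benefit(m: Method) -> float:
    # Benefit accrues only if the method succeeds; cost is paid regardless.
    return m.success_likelihood * m.quality_benefit - m.resource_cost

def most_beneficial_method(methods: list[Method]) -> Method:
    # Generalized "method dispatch": pick the most *beneficial* method,
    # not the most *specific* one.
    return max(methods, key=expected_net_benefit)

methods = [
    Method("fast-but-fragile", quality_benefit=10.0, resource_cost=2.0,
           success_likelihood=0.5),
    Method("slow-but-robust", quality_benefit=8.0, resource_cost=3.0,
           success_likelihood=0.9),
]
print(most_beneficial_method(methods).name)  # -> slow-but-robust
```

Lowering a method's `success_likelihood` as the trust model degrades is what shifts the dispatch away from compromised resources.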

  12. Context for the Project: The Intelligent Room (E21)
• The Intelligent Room is an integrated environment for multi-modal HCI. It has eyes and ears.
• The room provides speech input.
• The room has deep understanding of natural language utterances.
• The room has a variety of machine vision systems that enable it to:
  • Track motion and maintain the positions of people
  • Recognize gestures
  • Recognize body postures
  • Identify faces (eventually)
  • Track pointing devices (e.g., a laser pointer)
  • Select the optimal camera for remote viewers
  • Steer cameras to track the focus of attention
• MetaGlue is a lightweight, distributed agent infrastructure for integrating and dynamically (un)connecting new HCI components. MetaGlue is the brains of the room.

  13. The E21 Maps Abstract Services into Plans
• Users request abstract services from the E21:
  • “I want to get a textual message to a system wizard”
• The E21 has many plans for how to render each abstract service:
  • “Locate a wizard, project on a wall near her”
  • “Locate a wizard, use a voice synthesizer and a speaker near her”
  • “Print the message and page the wizards to go to the printer”
• Each plan requires certain resources (and other abstract services).
  • Some resources are more valuable than others (higher cost).
  • Some resources are more useful for this plan than others (higher benefit).
  • The resources may be otherwise committed.
  • They may be preempted (but at a high cost).
• The resource manager picks a set of resources that is (nearly) optimal.
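The resource manager's trade-off might be sketched as below. The plan names, resources, numbers, and the simple additive scoring with a preemption penalty are all assumptions for illustration, not the E21's actual planner.

```python
# Sketch: score each plan for an abstract service by the benefit of its
# resources minus their cost, charging extra for preempting committed
# resources.  All names and numbers are hypothetical.
plans = {
    "project-on-wall": {"resources": ["projector", "wall-display"]},
    "speak-message":   {"resources": ["synthesizer", "speaker"]},
}
resources = {
    "projector":    {"cost": 3.0, "benefit": 9.0, "committed": True},
    "wall-display": {"cost": 1.0, "benefit": 4.0, "committed": False},
    "synthesizer":  {"cost": 2.0, "benefit": 6.0, "committed": False},
    "speaker":      {"cost": 1.0, "benefit": 5.0, "committed": False},
}
PREEMPTION_PENALTY = 5.0  # preemption is allowed, but at a high cost

def plan_score(plan):
    score = 0.0
    for name in plan["resources"]:
        r = resources[name]
        score += r["benefit"] - r["cost"]
        if r["committed"]:
            score -= PREEMPTION_PENALTY   # otherwise-committed resource
    return score

best = max(plans, key=lambda p: plan_score(plans[p]))
print(best)  # -> speak-message (the projector is committed elsewhere)
```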

  14. Each Method Requires Different Resources
[Diagram: a user requests an abstract service with certain control parameters. Each service can be provided by several methods; each method binds the control parameters in a different way and requires different resources. The binding of parameters has a value to the user (utility function); the resources used by a method have a cost (resource cost function). The system selects the method that maximizes net benefit.]

  15. Recovering from Failures
• The E21 renders services by translating them into plans involving physical resources.
• Physical resources have known failure modes.
• Each plan step accomplishes sub-goal conditions needed by succeeding steps.
• Each condition has some way of monitoring whether it has been accomplished.
  • These monitoring steps are also inserted into the plan.
• If a sub-goal fails to be accomplished, model-based diagnosis isolates and characterizes the failure.
• A recovery is chosen based on the diagnosis:
  • It might be as simple as “try it again; we had a network glitch.”
  • It might be “try it again, but with a different selection of resources.”
  • It might be as complex as “clean up and try a different plan.”
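The three recovery choices above can be sketched as a simple dispatch on the diagnosis; the category labels are assumed for illustration, not the E21's actual interface.

```python
# Sketch: mapping a diagnosis to one of the slide's three recovery
# strategies.  The diagnosis categories are hypothetical labels.
def choose_recovery(diagnosis: str) -> str:
    if diagnosis == "transient":          # e.g., a network glitch
        return "retry"
    if diagnosis == "resource-failure":   # a specific resource misbehaved
        return "retry-with-different-resources"
    # The plan itself is unworkable in the current environment.
    return "clean-up-and-try-different-plan"

print(choose_recovery("transient"))  # -> retry
```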

  16. Access Policies
[Diagram: the same method-selection picture as slide 14 — abstract services, methods, control parameters, the user's utility function, the resource cost function, and net benefit. Access policies naturally fit within the model.]

  17. Model-Based Troubleshooting for Trust Model Updating

  18. Model-Based Diagnosis for Survivable Systems
• An extension of previous work on model-based diagnosis (Shrobe & Davis; Williams & de Kleer).
• The focus is on diagnosing failures of computations in order to assess the health of the underlying resources.
• Given:
  • The plan structure of the computation, describing expected behavior including QoS
  • An observation of actual behavior that deviates from expectations
• Produce:
  • Localization: which component(s) failed
  • Characterization: what they did wrong
  • Inferences about the compromise state of the computational resources involved
  • Inferences about what attacks enabled the compromise to occur

  19. Ontology of the Diagnostic Task
• Computations utilize a set of resources (e.g., hosts, binary executable files, databases).
• Individual resources have vulnerabilities.
• Vulnerabilities enable attacks.
• An attack on an instance of a particular type of resource can cause that resource to enter a compromised state.
• A computation that utilizes a compromised resource may exhibit a misbehavior, i.e., it may behave in a manner other than would be predicted by its design.
• Misbehaviors are the symptoms that initiate diagnostic activity, leading to updated assessments of:
  • The compromised states of the resources used in the computation
  • The likelihood of attacks having succeeded
  • The likelihood that other resources have been compromised

  20. The Space of Intrusion Detection
[Diagram: a 2×2 space with one axis running from structural model/pattern to statistical profile, and the other from a model of expected behavior (discrepancy from good) to a match to bad. Quadrants include unsupervised learning from normal runs (anomaly), hand-coded structural models of attacks (violation), and supervised learning from attack runs; symptoms in between are merely suspicious. A symptom may indicate an attack or a compromise.]

  21. Model-Based Troubleshooting: Constraint Suspension
[Worked example: a network of multipliers and adders with observed inputs and outputs. Suspending the constraint of one suspect component at a time yields: a consistent diagnosis in which the broken component takes inputs 25 and 15 and produces output 35; a consistent diagnosis in which the broken component takes inputs 5 and 3 and produces output 10; and a candidate with no consistent diagnosis, because of a conflict between 25 and 20.]
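Constraint suspension can be sketched on a much smaller, invented network (two multipliers feeding one adder, not the slide's exact circuit): suspend one component at a time, re-derive the wire values the healthy components force, and keep the candidate only if every observation remains consistent.

```python
# Sketch of constraint suspension (invented example): M1 = a*b and
# M2 = c*d feed adder A; we observe M1's output wire and the final output.
def viable_single_fault(suspended: str) -> bool:
    a, b, c, d = 3, 5, 2, 3
    obs_m1, obs_out = 15, 25          # observed wire values
    checks = []
    if suspended != "M1":
        checks.append(obs_m1 == a * b)            # M1 behaves: 3*5 = 15
    # A healthy adder forces M2's output to obs_out - obs_m1 = 10;
    # if A itself is suspended, M2's output is just c*d.
    m2 = (c * d) if suspended == "A" else (obs_out - obs_m1)
    if suspended != "M2":
        checks.append(m2 == c * d)                # M2 behaves: 2*3 = 6
    if suspended != "A":
        checks.append(obs_m1 + m2 == obs_out)     # A behaves
    return all(checks)

for comp in ("M1", "M2", "A"):
    print(comp, viable_single_fault(comp))
```

Here suspending M1 still leaves a conflict (a healthy M2 and A predict an output of 21, not 25), so only M2 and A survive as single-fault diagnoses.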

  22. Multiple Faults and the General Diagnostic Engine (GDE)
• Each component is modeled by multi-directional constraints representing its normal behavior.
• As a value is propagated through a component model, it is labeled with the assumption that this component works.
  • The propagated label is the set union of the labels of the inputs to the model, plus a token for the current model.
• A conflict is detected at any place to which inconsistent values are propagated:
  • It is inconsistent to believe two contradictory values at once.
  • The union of the labels of these values implies that you should believe both.
  • Therefore at least one assumption in this union must be false.
• A nogood is the set union of the labels of the conflicting values.
• A diagnosis is a set of assumptions that forms a covering set of all nogoods (i.e., it includes at least one assumption from each nogood).
• The goal is to find all minimal diagnoses.
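The covering step can be sketched directly: a diagnosis is a minimal hitting set of the nogoods. The component names and nogood sets below are invented for illustration, not taken from the slides.

```python
# Sketch of GDE's diagnosis step (toy data): enumerate candidate
# assumption sets smallest-first; keep those that intersect every nogood
# and contain no smaller diagnosis already found.
from itertools import combinations

def minimal_diagnoses(components, nogoods):
    hits = []
    for size in range(1, len(components) + 1):
        for cand in combinations(components, size):
            s = set(cand)
            # Must intersect every nogood...
            if not all(s & ng for ng in nogoods):
                continue
            # ...and must not be a superset of a smaller diagnosis.
            if any(d <= s for d in hits):
                continue
            hits.append(s)
    return hits

components = ["M1", "M2", "M3", "A1", "A2"]
nogoods = [{"M1", "M2", "A1"}, {"M1", "M3", "A1", "A2"}]
print(minimal_diagnoses(components, nogoods))
# Minimal diagnoses: {M1}, {A1}, {M2, M3}, {M2, A2}
```

Brute-force enumeration is exponential; real engines use the label bookkeeping described above to prune candidates incrementally.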

  23. Model-Based Troubleshooting: GDE
[Worked example: the same multiplier/adder network, now with GDE labels and observed values. Conflicts: the blue or the violet component is broken. Diagnoses: green broken with red carrying a compensating fault; green broken with yellow carrying a masking fault.]

  24. Applying MBT to QoS Issues
[Worked example: a five-component dataflow network whose components carry delay intervals (e.g., Delay: 2,4; Delay: 1,3; Delay: 5,10; Delay: 3,4; Delay: 1,2) and whose wires carry propagated time intervals (e.g., Time: 9,15; Time: 9,17; Time: 5,9). Observed completion times of 27 and 6 conflict with the predictions. Diagnoses: blue broken; violet broken; red and yellow broken; red and green broken (“broken how?!”); green and yellow broken.]
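The interval arithmetic behind these predictions can be sketched with an invented three-stage chain: each component adds its [low, high] delay bounds, and an observation outside the propagated interval is a QoS symptom that triggers diagnosis.

```python
# Sketch (invented pipeline, not the slide's network): propagate
# [low, high] delay intervals through a chain and flag a discrepancy.
def propagate(start, stages):
    lo, hi = start
    for d_lo, d_hi in stages:        # each stage adds its delay interval
        lo, hi = lo + d_lo, hi + d_hi
    return lo, hi

stages = [(2, 4), (1, 3), (5, 10)]   # delay bounds for three components
lo, hi = propagate((0, 0), stages)
observed = 27
print((lo, hi))                       # -> (8, 17)
print(lo <= observed <= hi)           # -> False: a symptom; start diagnosis
```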

  25. Adding Failure Models
• In addition to modeling the normal behavior of each component, we can provide models of known abnormal behavior.
• Each model can have an associated probability.
• A “leak” model covering unknown failures/compromises absorbs the residual probability.
• The diagnostic task becomes finding the most likely set(s) of models (one model per component) consistent with the observations.
• The search process is best-first search with joint probability as the metric.
[Example component model — Normal: delay 2–4, probability 90%; Delayed: delay 4–∞, probability 9%; Accelerated: delay −∞–2, probability 1%.]
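The best-first search over model assignments might be sketched as follows. The components, modes, and probabilities are invented, and the consistency check is stubbed where the real engine would re-simulate the plan and compare against observations.

```python
# Sketch: best-first search over assignments of one behavioral model per
# component, ordered by joint prior probability.  All numbers are toys.
import heapq
from itertools import count

models = {                      # component -> {mode: prior probability}
    "A": {"normal": 0.90, "delayed": 0.09, "leak": 0.01},
    "B": {"normal": 0.80, "delayed": 0.15, "leak": 0.05},
}

def consistent(assignment):
    # Stub: pretend "everything normal" was ruled out by the observations.
    return assignment != {"A": "normal", "B": "normal"}

def most_likely_diagnosis():
    comps = list(models)
    tie = count()               # tie-breaker so dicts are never compared
    heap = [(-1.0, next(tie), {})]   # max-heap via negated probability
    while heap:
        neg_p, _, partial = heapq.heappop(heap)
        if len(partial) == len(comps):
            if consistent(partial):
                return partial, -neg_p
            continue            # a conflict: abandon this candidate
        comp = comps[len(partial)]
        for mode, prob in models[comp].items():
            heapq.heappush(heap, (neg_p * prob, next(tie),
                                  {**partial, comp: mode}))
    return None

print(most_likely_diagnosis())
```

Because candidates are popped in order of joint probability, the first consistent full assignment is guaranteed to be a most likely diagnosis.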

  26. Applying Failure Models
[Worked example: IN (time 0) feeds A; A's output MID feeds both B (producing OUT1) and C (producing OUT2). Models (delay Low, High; probability):
• A — Normal: 3, 6 (.7); Fast: −30, 2 (.1); Slow: 7, 30 (.2)
• B — Normal: 2, 4 (.9); Fast: −30, 1 (.04); Slow: 5, 30 (.06)
• C — Normal: 5, 10 (.8); Fast: −30, 4 (.03); Slow: 11, 30 (.07)
OUT1 is observed at 5 (predicted Low 5, High 10); OUT2 is observed at 17 (predicted Low 8, High 16). Consistent diagnoses (A/B/C modes, implied MID bounds, probability):
• Normal / Normal / Slow — MID 3, 3; p .04410 — C is delayed
• Slow / Fast / Normal — MID 7, 12; p .00640 — A slow, B masks (B's delay runs negative!)
• Fast / Normal / Slow — MID 1, 2; p .00630 — A fast, C slower
• Normal / Fast / Slow — MID 4, 6; p .00196 — B not too fast, C slow
• Fast / Slow / Slow — MID −30, 0; p .00042 — A fast, B masks, C slow
• Slow / Fast / Fast — MID 13, 30; p .00024 — A slow, B masks, C not masking fast]

  27. Modeling Underlying Resources
• The model can be augmented with another level of detail showing the dependence of computations on resources.
• Each resource has models of its state of compromise. These can be abstract:
  • A node is suffering cycle stealing.
  • A network segment is being overloaded.
• The modes of the resource models imply the modes of the computational models.
  • E.g., if a computation resides on a node that is losing cycles, then the computation model must be the delayed model.
[Diagram: Component 1 (models — Normal: delay 2–4; Delayed: delay 4–∞; Accelerated: delay −∞–2) is located on Node17 (models — Normal: 90%; Parasite: 9%; Other: 1%).]

  28. Moving to a Bayesian Framework
• The model has levels of detail specifying the computations, the underlying resources, and the mapping of computations to resources.
• Each resource has models of its state of compromise.
• The modes of the resource models are linked to the modes of the computational models by conditional probabilities.
• The model forms a Bayesian network.
[Diagram: as in slide 27, but each link from Node17's modes to Component 1's modes carries a conditional probability (e.g., .2, .4, .3).]

  29. Computational Models Are Coupled through Resource Models
[Worked example: the QoS network of slide 24, with its components mapped onto Node1 and Node2. Diagnoses: blue delayed; violet delayed. Candidates such as “red delayed, yellow at negative time,” “red delayed, green at negative time,” and “green delayed, yellow at negative time” are precluded, because physicality requires red, green, and yellow — which share a node — to be all delayed or all accelerated.]

  30. An Example System Description
[Tables: computations A–E each have behavioral modes whose probabilities are conditioned on whether their host is Normal (N) or Hacked (H):
• A — Normal: .6 / .15; Peak: .1 / .80; Off-Peak: .3 / .05
• B — Normal: .8 / .3; Slow: .2 / .7
• C — Normal: .60 / .05; Slow: .25 / .45; Slower: .15 / .50
• D — Normal: .50 / .05; Fast: .25 / .45; Slow: .25 / .50
• E — Normal: .50 / .05; Fast: .25 / .45; Slow: .25 / .50
Host priors — Host1: Normal .9, Hacked .1; Host2: Normal .85, Hacked .15; Host3: Normal .7, Hacked .3; Host4: Normal .8, Hacked .2.]

  31. Bayesian Networks
• Bayesian networks are a technique for representing complex problems involving evidential reasoning.
• They reduce the need to state an exponential number of conditional probabilities.
• The model involves nodes and links:
  • Nodes represent statistical variables.
  • Links represent conditional dependence between variables (i.e., causation).
  • Links not present represent independence.
• Bayesian solvers compute the joint probability of some nodes given the probabilities (or observation) of others.
[Example: an Alarm node with parents Quake and Burglar; P(Alarm | Quake, Burglar): T,T → .97; T,F → .65; F,T → .55; F,F → .03.]
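Inference on this network can be sketched by enumeration. The conditional table is the slide's; the priors for Quake and Burglar are invented, since the slide gives only the conditional probabilities.

```python
# Sketch: exact inference by enumeration on the alarm network.
# P(Alarm | Quake, Burglar) comes from the slide; the priors are assumed.
from itertools import product

p_quake, p_burglar = 0.01, 0.02          # assumed priors
p_alarm = {(True, True): 0.97, (True, False): 0.65,
           (False, True): 0.55, (False, False): 0.03}

def joint(q, b, a):
    pq = p_quake if q else 1 - p_quake
    pb = p_burglar if b else 1 - p_burglar
    pa = p_alarm[(q, b)] if a else 1 - p_alarm[(q, b)]
    return pq * pb * pa

# P(Burglar | Alarm) = P(Burglar, Alarm) / P(Alarm)
num = sum(joint(q, True, True) for q in (True, False))
den = sum(joint(q, b, True) for q, b in product((True, False), repeat=2))
print(round(num / den, 3))  # -> 0.238
```

Enumeration is exponential in the number of variables; the point of the network structure is that real solvers exploit the absent links to avoid it.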

  32. The System Description as a Bayesian Network
• The model can be viewed as a two-tiered Bayesian network:
  • Resources with modes
  • Computations with modes
  • Conditional probabilities linking the modes
[Diagram: the same host and computation tables as slide 30, drawn as a network.]

  33. The System Description as an MBT Model
• The model can also be viewed as an MBT model with multiple models per device.
• Each model has a behavioral description.
• Unlike classical MBT, the models carry conditional probabilities.
[Diagram: the computation tables of slide 30, without the hosts.]

  34. Integrating MBT and Bayesian Reasoning
• Start with each behavioral model in the “normal” state.
• Repeat: check the consistency of the current set of models.
  • If inconsistent:
    • Add a new node to the Bayesian network representing the logical AND of the nodes in the conflict, with its truth value pinned at FALSE.
    • Prune all candidate solutions that are a superset of the conflict set.
    • Pick another set of models from the remaining candidates.
  • If consistent, add the set to the possible diagnoses.
• Continue until all inconsistent sets of models are found.
• Solve the Bayesian network.
[Worked example: a discrepancy observed after C yields the conflict A = NORMAL, B = NORMAL, C = NORMAL; C = NORMAL is the least likely member of the conflict, and SLOW is its most likely alternative.]
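The effect of pinning a conflict's AND node to FALSE can be sketched in miniature on an invented two-component system (not the slide's A–E example): it is equivalent to zeroing out every world that contains the conflict and renormalizing.

```python
# Sketch (toy priors, invented conflict): fold a GDE-style conflict into
# posterior mode probabilities by enumeration over possible worlds.
from itertools import product

priors = {"A": {"normal": 0.9, "slow": 0.1},
          "B": {"normal": 0.8, "slow": 0.2}}
conflicts = [{("A", "normal"), ("B", "normal")}]   # pinned-FALSE conflict

def posterior(component, mode):
    comps = list(priors)
    num = den = 0.0
    for modes in product(*(priors[c] for c in comps)):
        world = dict(zip(comps, modes))
        if any(ng <= set(world.items()) for ng in conflicts):
            continue                      # world violates a pinned conflict
        p = 1.0
        for c, m in world.items():
            p *= priors[c][m]
        den += p
        if world[component] == mode:
            num += p
    return num / den

print(round(posterior("B", "slow"), 3))  # -> 0.714
```

Ruling out the “all normal” world (prior mass .72) pushes B's Slow mode from a prior of .2 to a posterior above .7 — the same shape of update the slides show for C.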

  35. Adding the Conflict to the Bayesian Network
[Diagram: the network of slide 30 gains a NoGood1 node for the conflict A = NORMAL ∧ B = NORMAL ∧ C = NORMAL, with its truth value pinned at FALSE. Its conditional probability table is a pure logical AND: P(T) = 1 only in the row where A=N, B=N, and C=N all hold; every other row has P(F) = 1.]

  36. Integrating MBT and Bayesian Reasoning (2)
• Repeat, finding all conflicts and adding them to the Bayesian network.
• Solve the network again.
• The posterior probabilities of the underlying resource models tell you how likely each model is.
  • These probabilities should inform the trust model, lead to updated priors, and guide resource selection.
• The posterior probabilities of the computational models tell you how likely each model is. These should guide recovery.
• All remaining non-conflicting combinations of models are possible diagnoses.
  • Create a conjunction node for each possible diagnosis and add it to the Bayesian network (call this a diagnosis node).
• Finding the most likely diagnoses: bias the selection of the next component model by the current model probabilities.

  37. The Final Bayesian Network
[Diagram: the network now contains NoGood1 (A = NORMAL, B = NORMAL, C = NORMAL) and NoGood2 (A = NORMAL, B = NORMAL, C = SLOW), both pinned FALSE, plus diagnosis nodes Diagnosis-1 through Diagnosis-50 (e.g., A = SLOW, B = SLOW, C = NORMAL, D = NORMAL, E = PEAK). Posteriors: A — Peak .541, Normal .432, Off-Peak .028; B — Slow .738, Normal .262; C — Slower .516, Slow .339, Normal .145; D — Slow .590, Normal .410, Fast .000; E — Slow .612, Normal .323, Fast .065; hosts Host1–Host4 with updated Hacked posteriors .324, .207, .450, .267.]

  38. Final Model Probabilities

Resource  Hacked posterior  Hacked prior  Normal posterior  Normal prior
Host1     .324              .300          .676              .700
Host2     .207              .200          .793              .800
Host3     .450              .150          .550              .850
Host4     .267              .100          .733              .900

Computation  Mode probabilities
A            Off-Peak .028, Peak .541, Normal .432
B            Slow .738, Normal .262
C            Slower .516, Slow .339, Normal .145
D            Slow .590, Fast .000, Normal .410
E            Slow .612, Fast .065, Normal .323

  39. Adding Attack Models
• An attack model specifies the set of attacks that are believed to be possible in the environment.
• Each resource has a set of vulnerabilities.
• Vulnerabilities enable attacks on that resource.
• We map (attack × resource-type) pairs to behavioral modes of the resource.
  • This is given as a set of conditional probabilities: if this attack succeeded on a resource of this type, then the likelihood that the resource is in mode X is P.
• This now forms a three-tier Bayesian network.
[Example: Host1 has-vulnerability Buffer-Overflow, which enables an Overflow-Attack; conditional probabilities (e.g., Normal .5, Slow .7 for a Unix-family resource type) link the attack's success to the resource's behavioral modes.]

  40. Three Tiered Model

  41. Example Final Data

  42. Effect of Attack Model

  43. Summary
• The diagnostic process goes from observations of computational behavior to underlying trust-model assessments.
• Three-tiered model:
  • Vulnerabilities and attacks
  • Compromised states of resources
  • Non-standard behavior of computations
• A new synthesis of Bayesian and model-based reasoning.
• Next steps:
  • A realistic ontology of attacks, compromise states, etc.
  • Resource selection in light of diagnosis
• Challenges:
  • Realistic attack models may swamp the Bayesian network computation
  • How to handle unknown attacks
