
Structured Models for Decision Making






Presentation Transcript


  1. Structured Models for Decision Making. Daphne Koller, Stanford University, koller@cs.Stanford.edu. MURI Program on Decision Making under Uncertainty, July 18, 2000

  2. Roadmap • Static: Bayes Nets, PRMs (encapsulation, reuse) • Dynamic: DBNs, Dynamic PRMs (dynamic encapsulation, approximation) • Decision problem: Factored MDPs, Relational MDPs (factored policy iteration, efficient PRM inference)

  3. Outline • Probabilistic Relational Models • Representing complex domains • Structural uncertainty • Temporal models • Decision making

  4. Basic units of knowledge: entities, properties (attributes), relations

  5. BNs are not suitable for representing complex, structured, flexible domains. So what? • The set of entities and the relations between them is fixed at BN design time • structure must be known in advance • hard to adapt to changes • BNs for complex domains are large & unstructured • hence very hard to build • No ability to generalize • across “similar” individuals • across related situations

  6. Probabilistic Relational Models • Combine advantages of predicate logic & BNs: • natural domain modeling: objects, properties, relations; • generalization over a variety of situations; • compact, natural probability models. • Integrate uncertainty with relational model: • properties of domain entities can depend on properties of related entities; • uncertainty over relational structure of domain.

  7. Real-World Case Study: Battlefield situation assessment for missile units • several locations • many units • each has a detailed model. Example object classes: Battalion, Battery, Vehicle, Location, Weather. Example relations: At-Location, Has-Weather, Sub-battery/In-battalion, Sub-vehicle/In-battery.

  8. Scud Battery: Simplified PRM (diagram with attributes At-Location, Under Fire, Status Report, Launcher, #(Launcher.status = ok), Next Mission)

  9. SCUD Battery Model

  10. Cargo Vehicle Group

  11. Original BN*: SCUD Battery Disadvantages • A lot more complex • must include relevant attributes of related objects • Hard to transfer information between different BN models *Built by IET, Inc.

  12. Situation Models • Complex situations can be described compactly by specifying objects and relations between them • Class model is instantiated for each object, with probabilistic dependencies induced by relations (diagram: Angel Island, Alcatraz, 3rd Scud Battalion, 17th Scud Battalion, Scud Batteries 1 to 3, Launcher 1)

  13. Example reasoning pattern (diagram over Scud-Battalion-Charlie, Battery1, Group-TLs and launchers TL1, TL2, with attributes under_fire, hit, damaged, hide-support, rep_damaged and #reported_damaged)

  14. Inference in PRMs: PRM + situation description induces a BN over attributes (diagram: ground network for the Angel Island / Alcatraz situation, with nodes such as Attack, Under Fire, B1.Launch, B2.Launch, B1.L1.Damaged, B2.L2.Damaged, B1.L1.Report, B2.L2.Report, B1.Success, B2.Success)
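A minimal sketch of how a class-level model plus a situation description could induce a ground BN, assuming a toy schema in which each Launcher's `Damaged` attribute depends on its battery's `UnderFire` attribute through the In-battery relation. The class names, attribute names and CPD numbers are hypothetical illustrations, not values from the case study; only the node-naming style (`B1.L1.Damaged`) follows the slide.

```python
from dataclasses import dataclass

# Hypothetical class-level CPD shared by every Launcher object:
# maps (value of the battery's UnderFire,) -> P(Launcher.Damaged = True).
DAMAGED_CPD = {(True,): 0.3, (False,): 0.05}
UNDER_FIRE_PRIOR = {(): 0.5}  # hypothetical prior, no parents


@dataclass
class Battery:
    name: str


@dataclass
class Launcher:
    name: str
    battery: Battery  # the In-battery relation


def ground_network(launchers):
    """Instantiate the class model once per object.

    Returns {node_name: (parent_names, cpd)}. The parent of each
    Launcher.Damaged node is found by following its In-battery relation.
    """
    nodes = {}
    for launcher in launchers:
        parent = f"{launcher.battery.name}.UnderFire"
        nodes.setdefault(parent, ((), UNDER_FIRE_PRIOR))
        child = f"{launcher.battery.name}.{launcher.name}.Damaged"
        nodes[child] = ((parent,), DAMAGED_CPD)  # same CPD object reused
    return nodes


b1, b2 = Battery("B1"), Battery("B2")
bn = ground_network([Launcher("L1", b1), Launcher("L2", b1), Launcher("L1", b2)])
for node, (parents, _cpd) in sorted(bn.items()):
    print(node, "<-", parents)
```

Every `Damaged` node points at the same CPD object, which is the class-level sharing that the later reuse argument relies on.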

  15. Exploit Structure for Inference • Encapsulation: objects interact in limited ways • Inference can be encapsulated within objects, with “communication” limited to interfaces • Reuse: objects from the same class have the same model • Inference from one can be reused for others
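Encapsulation and reuse can be sketched with a cache: each battery exposes only a summary of its internals (here, the distribution of #(Launcher.status = ok)), and batteries with identical class models share one computation. This assumes, hypothetically, that launchers are independently ok with a fixed probability; the function name and parameters are illustrative, not the PRM inference engine from the talk.

```python
from functools import lru_cache

# Counts how often the (hypothetical) internal inference actually runs.
CALLS = {"internal_inference": 0}


@lru_cache(maxsize=None)
def battery_interface(n_launchers: int, p_ok: float) -> tuple:
    """Distribution of #(Launcher.status = ok) for one battery.

    Encapsulation: only this summary crosses the object's interface.
    Reuse: batteries with the same class model and parameters hit the cache.
    Assumes launchers are independently ok with probability p_ok.
    """
    CALLS["internal_inference"] += 1
    dist = [1.0]  # before adding launchers, P(count = 0) = 1
    for _ in range(n_launchers):
        new = [0.0] * (len(dist) + 1)
        for k, pk in enumerate(dist):
            new[k] += pk * (1 - p_ok)   # this launcher is not ok
            new[k + 1] += pk * p_ok     # this launcher is ok
        dist = new
    return tuple(dist)


batteries = [("Battery1", 3, 0.9), ("Battery2", 3, 0.9), ("Battery3", 3, 0.9)]
summaries = {name: battery_interface(n, p) for name, n, p in batteries}
print(CALLS["internal_inference"])  # 1: inference ran once for three batteries
```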

  16. Effects of exploiting structure (plot: running time in seconds, 0 to 6000, vs. #vehicles of each type per battery, 1 to 10; curves for flat BN, no reuse, and with reuse)

  17. Extension: Structural Uncertainty • Uncertainty about model structure: • Set of objects: is that radar signal from a tank? • Relations between objects: location of SCUD-Battalion-C • Task 1: Seamless integration with the probabilistic model • structural variables can depend on other variables • Task 2: Efficient inference • Use approximate inference to simplify the model • variational methods to summarize multiple potential influences • MCMC for traversing possible relationships • Use structured inference (encapsulation/reuse) on the simplified model
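The "MCMC for traversing possible relationships" bullet can be illustrated with a Metropolis sampler over a single uncertain relation, here which battalion a battery belongs to. The candidate battalions come from the case-study labels, but the unnormalized scores are hypothetical, and a real model would score each structure by its fit to all evidence rather than by a fixed table.

```python
import math
import random

# Hypothetical unnormalized log-scores for the relation In-battalion(Battery3).
LOG_SCORE = {"3rd Scud Battalion": math.log(3.0),
             "17th Scud Battalion": math.log(1.0)}


def metropolis_relation(log_score, n_samples, seed=0):
    """Estimate P(relation value | evidence) by Metropolis sampling.

    Proposal: switch to a different candidate chosen uniformly, which is
    symmetric, so the acceptance ratio only needs the score difference.
    """
    rng = random.Random(seed)
    values = list(log_score)
    current = values[0]
    counts = {v: 0 for v in values}
    for _ in range(n_samples):
        proposal = rng.choice([v for v in values if v != current])
        accept = math.exp(min(0.0, log_score[proposal] - log_score[current]))
        if rng.random() < accept:
            current = proposal
        counts[current] += 1
    return {v: c / n_samples for v, c in counts.items()}


print(metropolis_relation(LOG_SCORE, 20000))
```

With these scores the exact posterior is 0.75 / 0.25, so the printed frequencies should land close to that.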

  18. Outline • Probabilistic Relational Models • Temporal models • Structured belief-state tracking • Dynamic PRMs: time, events and actions • Decision making

  19. Dynamic Bayesian Nets • Compact representation of system dynamics • discrete, continuous, hybrid • Generalization of Kalman filters (diagram: time slices t, t+1, t+2, ... with Action, Velocity, Position and Observed_pos in each slice)

  20. Tracking System State. Task: maintain the belief state, i.e., the distribution over the current state given the evidence so far (diagram: the DBN over Action, Velocity, Position and Observed_pos) • In discrete/hybrid systems, the belief state representation is exponential in the # of state variables • In hybrid systems, the # of distinct hypotheses grows exponentially over time
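The exponential cost of exact tracking is visible in a direct implementation: one filtering step predicts through the transition model and reweights by the observation, over a table with one entry per joint state. This is a minimal sketch for discrete variables only; the dynamics (each variable keeps its value with probability 0.9) and the sensor model are hypothetical.

```python
from itertools import product


def belief_update(belief, transition, obs_likelihood):
    """One exact filtering step over an explicitly enumerated state space.

    belief: dict state -> P(state | evidence so far)
    transition: function (s, s_next) -> P(s_next | s)
    obs_likelihood: function s_next -> P(observation | s_next)
    """
    states = list(belief)
    predicted = {s2: sum(belief[s] * transition(s, s2) for s in states) for s2 in states}
    unnorm = {s2: predicted[s2] * obs_likelihood(s2) for s2 in states}
    z = sum(unnorm.values())
    return {s2: p / z for s2, p in unnorm.items()}


def transition(s, s2):
    # Hypothetical dynamics: each variable keeps its value w.p. 0.9, independently.
    p = 1.0
    for a, b in zip(s, s2):
        p *= 0.9 if a == b else 0.1
    return p


def obs_likelihood(s2):
    # Hypothetical sensor on the first variable only, reporting 1.
    return 0.8 if s2[0] == 1 else 0.2


n = 3  # number of binary state variables; the table has 2**n entries
belief = {s: 1 / 2**n for s in product([0, 1], repeat=n)}
belief = belief_update(belief, transition, obs_likelihood)
print(len(belief), sum(p for s, p in belief.items() if s[0] == 1))
```

Each step touches every pair of joint states, so both memory and time blow up as n grows, which is what motivates the factored approximation on the next slide.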

  21. Approximate Tracking • Decompose belief state along “subsystem lines” • Maintain belief state as product of marginals • In hybrid systems, keep a mixture of hypotheses for every subsystem • Merge hypotheses associated with similar densities
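Both approximation steps can be sketched in a few lines: projecting a joint belief onto a product of subsystem marginals, and merging two Gaussian hypotheses. The projection follows the slide directly; collapsing hypotheses by moment matching (preserving total weight, mean and variance) is one common choice assumed here, and the slide does not say which merge rule was used.

```python
def project_to_marginals(joint, clusters):
    """Approximate a joint distribution by a product of cluster marginals.

    joint: dict mapping assignment tuples to probabilities.
    clusters: disjoint tuples of variable positions (the 'subsystems').
    Returns the marginals and a function evaluating the product approximation.
    """
    marginals = []
    for cluster in clusters:
        m = {}
        for assignment, p in joint.items():
            key = tuple(assignment[i] for i in cluster)
            m[key] = m.get(key, 0.0) + p
        marginals.append(m)

    def approx(assignment):
        p = 1.0
        for cluster, m in zip(clusters, marginals):
            p *= m.get(tuple(assignment[i] for i in cluster), 0.0)
        return p

    return marginals, approx


def merge_gaussians(w1, mu1, var1, w2, mu2, var2):
    """Collapse two weighted 1-D Gaussian hypotheses into one by moment matching."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    var = (w1 * (var1 + (mu1 - mu) ** 2) + w2 * (var2 + (mu2 - mu) ** 2)) / w
    return w, mu, var


# Two correlated binary subsystems: the product of marginals loses the correlation.
joint = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
marginals, approx = project_to_marginals(joint, [(0,), (1,)])
print(marginals, approx((0, 0)))
print(merge_gaussians(0.5, 0.0, 1.0, 0.5, 2.0, 1.0))
```

The example shows the price of the decomposition: the true P(0, 0) is 0.4, while the product of marginals gives 0.25.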

  22. Case Study: Diagnosis & Tracking for Five-Tank System • State space per time slice • eleven-dimensional continuous space • 227 discrete failure modes (diagram: tank system with observables F1o, F23, F5o)

  23. The doomsday scenario (plot over time steps 0 to 50: bursts and negative drifts in C12, C23 and C45, plus measurement errors in F23 and F5o)

  24. Algorithm Performance (two plots over time steps 0 to 50, each tracking C12, P5 and C45; one panel labeled Omniscient Kalman Filter)

  25. Dynamic PRMs • Goal: model complex structured systems • that evolve over time • where agents take compound structured actions, and construct an effective, scalable inference algorithm • Easy part: add a time relation to PRMs • Allows notion of current and previous state • Maintains notions of structured objects and relations • Challenges: • Appropriate representation for actions, events • Modeling changes in domain structure (objects, relations) • Effective inference that exploits structure

  26. Dynamic PRMs: Event Models. Events: discrete points where the system undergoes a discontinuous change • Events can be triggered externally, by an agent’s action, or by system dynamics • e.g., a unit reaches its destination • Events can influence the system structure • discrete change in continuous dynamics • truck velocity goes to 0 when destination is reached • modification of relational structure • aircraft taking off is no longer on aircraft carrier • creation / deletion of objects • units entering/leaving battlespace
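The truck example can be sketched as a deterministic simulation in which an "arrived" event, triggered by the dynamics themselves, both zeroes the velocity and rewrites the truck's At-Location relation. The `Truck` class and `step` function are hypothetical, and the sketch ignores the uncertainty a dynamic PRM would attach to both the dynamics and the event.

```python
from dataclasses import dataclass


@dataclass
class Truck:
    position: float
    velocity: float
    destination: float
    at_location: str = "en route"  # relational attribute (At-Location)


def step(truck, dt, destination_name):
    """Advance continuous dynamics by dt; fire an 'arrived' event at the destination."""
    events = []
    if truck.velocity > 0:
        new_position = truck.position + truck.velocity * dt
        if new_position >= truck.destination:
            # Event: discrete change in the continuous dynamics ...
            truck.position = truck.destination
            truck.velocity = 0.0
            # ... and a modification of the relational structure.
            truck.at_location = destination_name
            events.append("arrived")
        else:
            truck.position = new_position
    return events


truck = Truck(position=0.0, velocity=2.0, destination=5.0)
log = [step(truck, 1.0, "Angel Island") for _ in range(4)]
print(log, truck)
```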

  27. Dynamic PRMs: Adding Actions • Use relational / hierarchical action representation • class hierarchy for Move action • an instantiation of a particular action is related to the object moving, road taken, origin, destination • Actions can depend on and influence attributes of related objects • duration of Move action may depend on road condition, influence status of moving objects • Actions, like events, can change domain structure • Complex actions can be composed of simpler ones: • Effects of a complex action are derived from those of its subactions
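A minimal sketch of composing actions, assuming a primitive `Move` whose duration depends on the road condition and a hypothetical `Sequence` composite whose effects (final location, total duration) are derived by chaining its subactions. The road factors and location names are invented for illustration, and effects are deterministic here, unlike in a probabilistic model.

```python
from dataclasses import dataclass

# Hypothetical slow-down factors: how road condition affects a Move's duration.
ROAD_FACTOR = {"good": 1.0, "muddy": 2.0}


@dataclass
class Move:
    """Primitive action, related to the mover, road, origin and destination."""
    mover: str
    road_condition: str
    origin: str
    destination: str
    base_duration: float

    def effects(self, state):
        duration = self.base_duration * ROAD_FACTOR[self.road_condition]
        new_state = dict(state)
        new_state[self.mover] = self.destination  # changes At-Location
        return new_state, duration


@dataclass
class Sequence:
    """Composite action: its effects are derived from those of its subactions."""
    subactions: list

    def effects(self, state):
        total = 0.0
        for sub in self.subactions:
            state, duration = sub.effects(state)
            total += duration
        return state, total


plan = Sequence([Move("Battery1", "good", "A", "B", 1.0),
                 Move("Battery1", "muddy", "B", "C", 1.5)])
final_state, duration = plan.effects({"Battery1": "A"})
print(final_state, duration)
```

Because `Sequence` only calls `effects` on its children, a composite can itself appear inside another composite, which mirrors the hierarchical decomposition the slide describes.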

  28. Inference in Dynamic Systems • Main tasks: • situation monitoring • prediction • Goal: Exploit structure as we did in PRMs • First step: Encapsulation • Exploit structure of weakly interacting subsystems • Applied successfully to Dynamic Bayesian Nets

  29. Tracking in Dynamic PRMs • Use relational structure to guide belief state approximation • direct dependencies only between related objects • Deal with dynamic structure: • relations and even domain objects change over time • want to adjust our approximation to context • structural uncertainty critical • Event-driven tracking • no reason to use fine-grained model of “boring bits” • but “fast forward” requires ability to propagate dynamics over variable-length segments

  30. Outline • Probabilistic Relational Models • Temporal models • Decision making • Planning in factored MDPs • Planning in relational MDPs

  31. What is a Markov Decision Process? • An MDP is a controlled dynamic process • Stochastic transition between states • Actions affect system dynamics • Rewards or costs are associated with states • Objective: Drive process to regions of high reward • MDP solutions are policies • Policies assign an action to every state

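  32. MDP Policies & Value Functions. Suppose an expert told you the “value” of each state: V(s1) = 10, V(s2) = 5 (diagram: Action 1 and Action 2 from the current state; one leads to s1 with probability 0.7 and to s2 with 0.3, the other to s1 and s2 with probability 0.5 each)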

  33. Greedy Policy Construction. Pick the action with the highest expected future value, an expectation over next-state values: $\pi(s) = \arg\max_a \sum_{s'} P(s' \mid s, a)\, V(s')$
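Using the numbers from slide 32 (V(s1) = 10, V(s2) = 5, one action reaching s1/s2 with probabilities 0.7/0.3 and the other with 0.5/0.5), greedy construction reduces to comparing expected next-state values. The action labels below are placeholders, since the extracted diagram does not make clear which distribution belongs to Action 1.

```python
# Value estimates supplied by the "expert".
V = {"s1": 10.0, "s2": 5.0}

# Next-state distributions of the two actions; the labels are placeholders.
ACTIONS = {
    "action_a": {"s1": 0.7, "s2": 0.3},
    "action_b": {"s1": 0.5, "s2": 0.5},
}


def expected_value(dist, V):
    """Expectation of V over the next-state distribution."""
    return sum(p * V[s] for s, p in dist.items())


def greedy(actions, V):
    """Pick the action with the highest expected next-state value."""
    return max(actions, key=lambda a: expected_value(actions[a], V))


print({a: expected_value(d, V) for a, d in ACTIONS.items()}, greedy(ACTIONS, V))
```

The 0.7/0.3 action wins: 0.7·10 + 0.3·5 = 8.5 against 0.5·10 + 0.5·5 = 7.5.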

  34. Bootstrapping: Policy Iteration. Idea: greedy selection is useful even with a suboptimal V. Guess V; repeat until the policy doesn’t change: π = greedy(V); V = value of acting on π. Guaranteed to find the globally optimal policy if V is defined over explicit states, i.e., if the representation of V is exponential in the number of state variables • Exploit structure with Factored Policy Iteration
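The loop above can be sketched on a two-state MDP with explicit states. The discount factor, the iterative (rather than exact linear-solve) policy evaluation, and the example MDP itself are assumptions of this sketch; the slide specifies none of them.

```python
def policy_iteration(states, actions, P, R, gamma=0.9, eval_sweeps=500):
    """Policy iteration over an explicitly enumerated state space.

    P[s][a] maps next states to probabilities; R[s] is the reward for state s.
    The discount gamma and iterative (not exact) evaluation are assumptions.
    """
    V = {s: 0.0 for s in states}  # initial guess of V
    policy = None
    while True:
        # pi = greedy(V): maximize expected next-state value in every state
        new_policy = {
            s: max(actions, key=lambda a: sum(p * V[t] for t, p in P[s][a].items()))
            for s in states
        }
        if new_policy == policy:  # policy did not change: stop
            return policy, V
        policy = new_policy
        # V = value of acting on pi, by repeated Bellman backups for the fixed policy
        for _ in range(eval_sweeps):
            V = {s: R[s] + gamma * sum(p * V[t] for t, p in P[s][policy[s]].items())
                 for s in states}


# Hypothetical MDP: reward 1 in s1, 0 in s2.
STATES = ["s1", "s2"]
ACTIONS = ["stay", "switch"]
P = {
    "s1": {"stay": {"s1": 0.9, "s2": 0.1}, "switch": {"s2": 1.0}},
    "s2": {"stay": {"s2": 1.0}, "switch": {"s1": 0.8, "s2": 0.2}},
}
R = {"s1": 1.0, "s2": 0.0}
policy, V = policy_iteration(STATES, ACTIONS, P, R)
print(policy)
```

The dictionaries `V` and `P` are indexed by explicit states, so their size is exponential in the number of state variables; that is the cost the factored version avoids.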

  35. Factored MDPs: DBNs + Rewards • Rewards have small sets of parent variables too • Total reward adds sub-rewards: R = R1 + R2 (diagram: two-slice DBN over X, Y, Z at t and t+1, with reward nodes R1 and R2)
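An additive reward can be sketched as a list of small tables, each read on its own scope. The scopes (R1 on X, R2 on Y and Z) and the reward values are hypothetical, since the diagram's arcs did not survive extraction.

```python
from itertools import product

# Hypothetical sub-rewards, each defined on a small scope of state variables.
R1 = {"scope": ("X",), "table": {(0,): 0.0, (1,): 2.0}}
R2 = {"scope": ("Y", "Z"),
      "table": {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}}


def total_reward(state, components=(R1, R2)):
    """R(state) = R1 + R2, each sub-reward reading only its own scope."""
    return sum(c["table"][tuple(state[v] for v in c["scope"])] for c in components)


# The equivalent flat table needs one entry per joint state (2**3 here).
flat = {xyz: total_reward(dict(zip("XYZ", xyz))) for xyz in product([0, 1], repeat=3)}
print(total_reward({"X": 1, "Y": 1, "Z": 0}), len(flat))
```

The factored form stores 2 + 4 entries; the flat table grows as 2**n with the number of binary variables, while the factored one grows with the sizes of the individual scopes.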

  36. Linearly Decomposable Value Functions. Approximate a high-dimensional value function with a combination of lower-dimensional functions: $V(\mathbf{x}) \approx \sum_{i=1}^{k} w_i\, h_i(\mathbf{x})$. Note: overlapping is allowed! Motivation: multi-attribute utility theory (Keeney & Raiffa)

  37. Decomposable Value Functions • Each basis function h_i is the status of some small part(s) of a complex system • status of a machine • inventory of a store • status of a subgoal • Linear combination of restricted domain functions
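A linearly decomposable value function can be sketched as (scope, function) pairs with weights, where each basis function reads only a few variables and scopes may overlap (both of the first two below read `machine1`). The variable names echo the slide's examples, but the basis functions and weights are hypothetical.

```python
# Each basis function h_i reads a small subset of variables; scopes may overlap.
BASIS = [
    (("machine1",), lambda m1: 1.0 if m1 == "ok" else 0.0),
    (("machine1", "machine2"), lambda m1, m2: 1.0 if m1 == m2 == "ok" else 0.0),
    (("inventory",), lambda inv: min(inv, 10) / 10.0),
]
WEIGHTS = [4.0, 2.0, 1.5]  # hypothetical weights w_i


def value(state, basis=BASIS, weights=WEIGHTS):
    """V(x) ~= sum_i w_i * h_i(x restricted to scope_i)."""
    return sum(w * h(*(state[v] for v in scope)) for w, (scope, h) in zip(weights, basis))


print(value({"machine1": "ok", "machine2": "ok", "inventory": 5}))
```

Here V = 4·1 + 2·1 + 1.5·0.5 = 6.75; in policy iteration the weights would be fitted rather than fixed.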

  38. Exploiting Structure. Key operation: backprojection of a basis function through a DBN transition (diagram: X, Y, Z). Structure allows us to consider operations over small subsets of variables, not the entire state space.
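Backprojection computes g(x) = E[h(scope at t+1) | x at t] for a basis function h. When next-slice variables are independent given the previous slice (no intra-slice arcs, an assumption here), g depends only on the parents of h's scope, so it can be tabulated over that small set. The DBN structure and CPD numbers below are hypothetical.

```python
from itertools import product

# Hypothetical two-slice DBN over binary X, Y, Z for one fixed action.
# PARENTS[v] lists the time-t variables that v at t+1 depends on.
PARENTS = {"X": ("X",), "Y": ("X", "Y"), "Z": ("Z",)}


def p_true(var, parent_values):
    """Hypothetical CPDs: P(var at t+1 = 1 | its parents at time t)."""
    if var == "X":
        return 0.9 if parent_values[0] else 0.2
    if var == "Y":
        x, y = parent_values
        return 0.8 if (x and y) else (0.5 if (x or y) else 0.1)
    return 0.7 if parent_values[0] else 0.3  # Z


def backproject(scope, h):
    """Tabulate g over the parents of `scope` only, not the full state space."""
    parent_scope = sorted({p for v in scope for p in PARENTS[v]})
    table = {}
    for pa in product([0, 1], repeat=len(parent_scope)):
        assign = dict(zip(parent_scope, pa))
        total = 0.0
        for nxt in product([0, 1], repeat=len(scope)):
            prob = 1.0
            for v, val in zip(scope, nxt):
                pt = p_true(v, tuple(assign[p] for p in PARENTS[v]))
                prob *= pt if val else 1 - pt
            total += prob * h(*nxt)
        table[pa] = total
    return tuple(parent_scope), table


print(backproject(("Y",), lambda y: float(y)))
```

For a basis function on Y alone, the result is a 4-entry table over (X, Y) instead of an 8-entry table over all three variables; the savings grow with the number of variables outside the parent set.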

  39. Policy Format. Factored value functions and compact action effect descriptions: sorted result values form a decision list: If … then action 1 else if … then action 2 else if … then action 1 (diagram values: +11, +1, +7, +4, +12, +8)
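A decision-list policy can be sketched as rules of the form (condition on a few variables, action, value), scanned in order of decreasing value until the first condition holds. The conditions and values below are hypothetical, since the slide's conditions were lost, and an empty condition serves as the default rule.

```python
# Hypothetical rules: (condition on a few variables, action, value).
RULES = [
    ({"X": 1, "Y": 1}, "action 1", 9.0),
    ({"X": 1}, "action 2", 6.0),
    ({"Z": 0}, "action 1", 3.0),
    ({}, "action 2", 0.0),  # default rule: the empty condition always matches
]


def decision_list_action(state, rules=RULES):
    """Return the action of the highest-valued rule whose condition holds."""
    for condition, action, _value in sorted(rules, key=lambda r: -r[2]):
        if all(state[v] == val for v, val in condition.items()):
            return action
    raise ValueError("no rule matched")


print(decision_list_action({"X": 1, "Y": 0, "Z": 0}))
```

Each rule tests only a few variables, so the policy stays compact even though it assigns an action to every state.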

  40. Factored Policy Iteration: Summary. Structure induces a decision-list policy. Guess V; repeat: π = greedy(V); V = value of acting on π. Key operations isomorphic to BN inference • Time per iteration reduced from $O((2^n)^3)$ to $O(C_b k^3)$ • $C_b$ = cost of Bayes net inference (function of structure) • $k$ = number of basis functions ($k \ll 2^n$)

  41. Run Times (plot: CPU seconds and number of states, 0 to 70000, vs. number of state variables, 4 to 16, with a 3n^3 reference curve). Note: Nearly optimal policy found in all cases ( 6).

  42. Planning in Relational MDPs • Replace DBN transition model with dynamic PRM • Generalize factored policy iteration • Define basis functions via relational formulas • Replace BN inference with PRM inference as key step • Exploit hierarchical structure of complex actions by encapsulating decision making along the hierarchy • Potential benefits: • Tractable approximate planning in relational domains • Unification of classical and stochastic planning

  43. Conclusions: Past & Present • PRMs compactly represent complex systems with multiple interacting objects: • coherent (probabilistic) semantics; • structured representation: modularity & reuse. • Scalable inference that exploits structure • Tracking algorithms for DBNs that exploit system decomposition • Planning algorithms in MDPs that exploit structure of system and of value functions Theme: Representation & inference scale up, if we exploit structure

  44. Conclusions: Future • Better inference for densely connected PRMs • Extending PRMs with time, events, actions • Exploit structure for inference in dynamic PRMs: • system decomposition into subsystems • relational context • varying time granularity • Planning in dynamic PRMs: • extend factored policy iteration to PRMs • exploit hierarchical action decomposition

  45. Acknowledgements. Students & postdocs: Nir Friedman (Hebrew U.), Dirk Ormoneit, Ron Parr (Duke), Xavier Boyen, Urszula Chajewska, Lise Getoor, Carlos Guestrin, Uri Lerner, Uri Nodelman, Avi Pfeffer (Harvard), Eran Segal, Benjamin Taskar, Simon Tong, Brian Milch (Berkeley), Ken Takusagawa (MIT). Support: PECASE Award via ONR YIP; DARPA’s HPKB Program; MURI Program “Integrated Approach to Intelligent Systems”; Sloan Faculty Fellowship; DARPA’s IA Program under subcontract to SRI International; DARPA’s DMIF Program under subcontract to IET Inc.; ONR grant. http://robotics.stanford.edu/~koller/
