
Hierarchical POMDP Planning and Execution


Presentation Transcript


  1. Hierarchical POMDP Planning and Execution Joelle Pineau Machine Learning Lunch November 20, 2000

  2. Partially Observable MDP [Diagram: three states S1, S2, S3 linked by transitions] • POMDPs are characterized by: • States: s ∈ S • Actions: a ∈ A • Observations: o ∈ O • Transition probabilities: T(s,a,s') = Pr(s'|s,a) • Observation probabilities: O(s',a,o) = Pr(o|s',a) • Rewards: R(s,a) • Beliefs: b(s_t) = Pr(s_t | o_t, a_t, …, o_0, a_0)
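
A minimal sketch of the belief update these quantities define, b'(s') ∝ Pr(o|s',a) · Σ_s Pr(s'|s,a) · b(s), written in Python; the dictionary layouts T[s][a][s'], Z[s'][a][o] and the function name are illustrative assumptions, not from the talk:

  def belief_update(b, a, o, states, T, Z):
      """One step of POMDP belief tracking: condition on action a and observation o."""
      new_b = {}
      for s2 in states:
          # Pr(o | s', a) * sum_s Pr(s' | s, a) * b(s)
          new_b[s2] = Z[s2][a][o] * sum(T[s][a][s2] * b[s] for s in states)
      norm = sum(new_b.values())
      if norm == 0:
          raise ValueError("observation has zero probability under the current belief")
      return {s: p / norm for s, p in new_b.items()}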

  3. The problem • How can we find good policies for complex POMDPs? • Is there a principled way to provide near-optimal policies?

  4. Proposed Approach • Exploit structure in the problem domain. • What type of structure? • Action set partitioning [Diagram: action set partitioning tree over Act, InvestigateHealth, Move, Navigate, AskWhere, CheckPulse, CheckMeds, Left, Right, Up, Down]

  5. Hierarchical POMDP Planning • What do we start with? • A full POMDP model: {S_o, A_o, O_o, M_o}. • An action set partitioning graph. • Key idea: • Break the problem into many "related" POMDPs. • Each smaller POMDP has only a subset of A_o, which imposes a policy constraint. • But why? • POMDP value iteration has exponential run-time per iteration: O(|A| · |V_{t-1}|^|O|).
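
A minimal sketch of the decomposition step in Python, assuming the action-partition graph is encoded as nested dicts (internal node maps to its children, None marks a primitive action); the encoding and function name are illustrative assumptions: each internal node becomes one smaller POMDP whose action set is just its children.

  def subtask_action_sets(tree):
      """Map each internal node of the partition tree to the action subset
      of the sub-POMDP (controller) it induces."""
      controllers = {}
      for name, children in tree.items():
          if children is not None:                  # internal node -> one controller
              controllers[name] = list(children)    # its actions are its children
              controllers.update(subtask_action_sets(children))
      return controllers

  # Each controller then gets the restricted model {S_o, controllers[name], O_o, M_o}.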

  6. Example POMDP [Diagram: state-transition graph over MedsState, KitchenState, BedroomState with arcs labeled 0.8 and 0.1 for the actions, and the corresponding value function] • S_o = {Meds, Kitchen, Bedroom} • A_o = {ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom} • O_o = {Noise, Meds, Kitchen, Bedroom}
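
The same model written as plain Python data, in the layout the belief-update sketch after slide 2 assumes; the single transition row shown follows the 0.8 / 0.1 pattern of the diagram and is illustrative, not an exact figure from the talk:

  S_o = ["Meds", "Kitchen", "Bedroom"]
  A_o = ["ClarifyTask", "CheckMeds", "GoToKitchen", "GoToBedroom"]
  O_o = ["Noise", "Meds", "Kitchen", "Bedroom"]

  # One illustrative transition row: from Meds, GoToKitchen mostly succeeds.
  T = {"Meds": {"GoToKitchen": {"Kitchen": 0.8, "Meds": 0.1, "Bedroom": 0.1}}}
  # The remaining (state, action) rows and the observation model Z[s'][a][o]
  # would be filled in the same way before running belief_update.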

  7. Hierarchical POMDP • Action Partitioning: • Act: CheckMeds, ClarifyTask, Move • Move: ClarifyTask, GoToKitchen, GoToBedroom
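
The same partition, written in the nested-dict encoding assumed by the subtask_action_sets sketch after slide 5 (the encoding itself is an assumption for illustration):

  partition = {
      "Act": {
          "CheckMeds": None,                        # primitive
          "ClarifyTask": None,                      # primitive
          "Move": {                                 # abstract action -> its own controller
              "ClarifyTask": None,
              "GoToKitchen": None,
              "GoToBedroom": None,
          },
      }
  }

  # subtask_action_sets(partition) ->
  #   {"Act": ["CheckMeds", "ClarifyTask", "Move"],
  #    "Move": ["ClarifyTask", "GoToKitchen", "GoToBedroom"]}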

  8. Local Value Function and Policy - Move Controller [Figure: value function over the belief space spanned by KitchenState, MedsState, BedroomState, with policy regions labeled GoToKitchen, GoToBedroom, ClarifyTask]

  9. Modeling Abstract Actions • Problem: Need parameters for the abstract action Move. • Solution: Use the local policy of the corresponding low-level controller. • General form: Pr(s_j | s_i, a_k^abstract) = Pr(s_j | s_i, Policy(a_k^abstract, s_i)) • Example: Pr(s_j | MedsState, Move) = Pr(s_j | MedsState, ClarifyTask) [Figure: Policy(Move, s_i) over the belief space spanned by KitchenState, MedsState, BedroomState, with regions labeled GoToKitchen, GoToBedroom, ClarifyTask]
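
A minimal sketch of that substitution in Python, assuming the nested-dict transition model used above and a local_policy table mapping each state to the primitive action the low-level controller would choose there; the names are illustrative:

  def abstract_transition(T, local_policy, s_i, s_j):
      """Pr(s_j | s_i, a_abstract): borrow the row of the primitive action that
      the abstract action's local controller would execute in s_i."""
      primitive = local_policy[s_i]     # e.g. local_policy["MedsState"] == "ClarifyTask"
      return T[s_i][primitive][s_j]

  # abstract_transition(T, move_policy, "MedsState", s_j) == T["MedsState"]["ClarifyTask"][s_j]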

  10. Local Value Function and Policy - Act Controller [Figure: value function over the belief space spanned by KitchenState, MedsState, BedroomState, with policy regions labeled CheckMeds and Move]

  11. Comparing Policies [Figure: hierarchical policy vs. optimal policy over the belief space; legend: ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom]

  12. Bounding the value of the approximation • The value function of the top-level controller is an upper bound on the value of the approximation. • Why? We were optimistic when modeling the abstract action. • Similarly, we can find a lower bound. • How? We can take a "worst-case" view when modeling the abstract action. • If we partition the action set differently, we get different bounds.
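
A rough sketch of the two abstract models this suggests; choosing the "best" or "worst" primitive by one-step expected value under a state-value table V is only one illustrative reading of the optimistic / worst-case idea, not necessarily the construction used in the talk:

  def abstract_transition_bound(T, V, subtask_actions, s_i, s_j, optimistic=True):
      """Model the abstract action with the primitive in its subtask that V ranks
      best (for the upper bound) or worst (for the lower bound) in state s_i."""
      pick = max if optimistic else min
      primitive = pick(subtask_actions,
                       key=lambda a: sum(T[s_i][a][s2] * V[s2] for s2 in V))
      return T[s_i][primitive][s_j]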

  13. A real dialogue management example • Act: CheckHealth, CheckWeather, Greet, Move, DoMeds, Phone, SayTime • CheckHealth: AskHealth, OfferHelp • CheckWeather: AskWeatherTime, SayCurrent, SayToday, SayTomorrow • Greet: GreetGeneral, GreetMorning, GreetNight, RespondThanks • Move: AskGoWhere, GoToRoom, GoToKitchen, GoToFollow, VerifyRoom, VerifyKitchen, VerifyFollow • DoMeds: StartMeds, NextMeds, ForceMeds, QuitMeds • Phone: AskCallWho, CallHelp, CallNurse, CallRelative, VerifyHelp, VerifyNurse, VerifyRelative

  14. Results:

  15. Final words • We presented: • a general framework to exploit structure in POMDPs; • Future work: • automatic generation of good action partitioning; • conditions for additional observation abstraction; • bigger problems!
