
Reinforcement Learning of Context Models for Ubiquitous Computing

Sofia Zaidenberg, Laboratoire d’Informatique de Grenoble, Prima Group. Under the supervision of Patrick Reignier and James L. Crowley.


Presentation Transcript


  1. Reinforcement Learning of Context Models for Ubiquitous Computing
  Sofia Zaidenberg
  Laboratoire d’Informatique de Grenoble • Prima Group
  Under the supervision of Patrick Reignier and James L. Crowley
  PULSAR Research Meeting

  2. Ambient Computing [Weiser, 1991] [Weiser, 1994] [Weiser and Brown, 1996]
  • Ubiquitous Computing (ubicomp)

  3. (image-only slide)

  4. (image-only slide)

  5. Ambient Computing
  • “Autistic” devices
    • Independent
    • Heterogeneous
    • Unaware
  • Ubiquitous systems
    • Accompany without imposing
    • In the periphery of attention
    • Invisible
    • Calm computing

  6. 15 years later…
  • Ubiquitous computing is already here [Bell and Dourish, 2007]
  • It is not exactly as we expected
    • “Singapore, the intelligent island”
    • “U-Korea”
  • Not seamless but messy
  • Not invisible but flashy
  • Characterized by improvisation and appropriation
  • Engaging user experiences [Rogers, 2006]

  7. New directions
  • Study people’s habits and ways of living and create technologies based on them (and not the opposite) [Barton and Pierce, 2006; Pascoe et al., 2007; Taylor et al., 2007; Jose, 2008]
  • Redefine smart technologies [Rogers, 2006]
  • Our goal: provide an AmI application assisting the user in everyday activities

  8. Our goal
  • Context-aware computing + personalization
  • Situation + user → action
  • Perception → decision

  9. Proposed solution: personalization by learning

  10. Outline
  • Problem statement
  • Learning in ubiquitous systems
  • User study
  • Ubiquitous system
  • Reinforcement learning of a context model
  • Experiments and results
  • Conclusion

  11. Proposed system
  • A virtual assistant embodying the ubiquitous system
  • The assistant
    • Perceives the context using its sensors
    • Executes actions using its actuators
    • Receives user feedback for training
    • Adapts its behavior to this feedback (learning)

  12. Constraints
  • Simple training
  • Fast learning
  • Initial behavior consistency
  • Lifelong learning: the system adapts to changes in the environment and in user preferences
  • User trust
  • Transparency [Bellotti and Edwards, 2001]
    • Intelligibility: system behavior understood by humans
    • Accountability: system able to explain itself

  13. Example
  (figure: a “Reminder!” notification; labels: hyperion, J109, J120)

  14. Outline
  • Problem statement
  • Learning in ubiquitous systems
  • User study
  • Ubiquitous system
  • Reinforcement learning of a context model
  • Experiments and results
  • Conclusion

  15. User study
  • Why a general-public user study?
    • Weiser’s original vision of calm computing revealed itself to be unsuited to current user needs
  • Objective
    • Evaluate the expectations and needs vis-à-vis “ambient computing” and its usages

  16. Terms of the study
  • 26 interviewed subjects
  • Non-experts
  • Distributed as follows: (demographics chart omitted)

  17. Terms of the study
  • 1-hour interviews with open discussion and the support of an interactive model
  • Questions about the advantages and drawbacks of a ubiquitous assistant
  • User ideas about interesting and useful usages

  18. Results
  • 44 % of subjects interested, 13 % conquered
  • Profiles of interested subjects:
    • Very busy people
    • Experiencing cognitive overload
  • Learning considered a plus
    • More reliable system
    • Gradual training vs. heavy configuration
    • Simple and pleasant training (“one click”)

  19. Results
  • Short learning phase
  • Explanations are essential
  • Interactions
    • Depending on the subject
    • Optional debriefing phase
  • Mistakes accepted as long as the consequences are not critical
  • Use control
  • Reveals subconscious customs
  • Worry about dependence

  20. Conclusions
  • Constraints
    • Not a black box: intelligibility and accountability [Bellotti and Edwards, 2001]
    • Simple, non-intrusive training
    • Short training period, fast re-adaptation to preference changes
    • Coherent initial behavior
  • Build a ubiquitous assistant based on these constraints

  21. Outline
  • Problem statement
  • Learning in ubiquitous systems
  • User study
  • Ubiquitous system
  • Reinforcement learning of a context model
  • Experiments and results
  • Conclusion

  22. Ubiquitous system
  • Uses the existing devices (heterogeneous, scattered)
  • Multiplatform, distributed system
  • OMiSCID [Emonet et al., 2006]
    • Communication protocol
    • Dynamic service discovery
  • Easily deployable: OSGi

  23. Module interconnection
  • Sensor applications: keyboard activity, emails, localization, presence
  • Actuator applications: emails, text-to-speech, remote control

  24. Example of message exchange
  • Query OMiSCID: is Text2Speech running on protee? Yes! / No…
  • If not: install and start Text2Speech from the bundle repository
  (figure labels: hyperion, protee, J109, J120)

  25. Database
  • Contains
    • Static knowledge
    • History of events and actions: to provide explanations
  • Centralized: simplifies queries
  • Queried and fed by all modules on all devices

  26. Outline
  • Problem statement
  • Learning in ubiquitous systems
  • User study
  • Ubiquitous system
  • Reinforcement learning of a context model
    • Reinforcement learning
    • Applying reinforcement learning
  • Experiments and results
  • Conclusion

  27. Reminder: our constraints
  • Simple training
  • Fast learning
  • Initial behavior consistency
  • Lifelong learning
  • Explanations
  • Supervised learning [Brdiczka et al., 2007]

  28. Reinforcement learning (RL)
  • Markov property: the state at time t depends only on the state at time t−1
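Stated formally (a standard textbook formulation, not taken from the slide): in the Markov decision process underlying RL, the distribution of the next state given the full history reduces to its distribution given only the previous state and action:

```latex
P(s_t \mid s_{t-1}, a_{t-1}, s_{t-2}, a_{t-2}, \dots, s_0, a_0)
  = P(s_t \mid s_{t-1}, a_{t-1})
```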

  29. Standard algorithm
  • Q-Learning [Watkins, 1989]
  • Updates Q-values on each new experience {state, action, next state, reward}
  • Slow, because the policy evolves only when something happens in the real world
  • Needs a lot of examples to learn a behavior
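The update on a single experience can be sketched as follows; the state and action names, and the learning rate alpha = 0.1 and discount gamma = 0.9, are illustrative assumptions, not values from the talk.

```python
def q_update(q, state, action, next_state, reward, actions,
             alpha=0.1, gamma=0.9):
    """One Q-Learning step on the experience
    {state, action, next state, reward}."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the Q-value toward reward + discounted best future value.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Toy example: the assistant is rewarded for notifying on a new email.
q = {}
q_update(q, "newEmail", "notify", "emailRead", 1.0,
         actions=["notify", "doNothing"])
```

Because each update needs one real experience, many interactions are required before the behavior converges, which is exactly the slowness the slide points out.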

  30. Dyna architecture [Sutton, 1991]
  (diagram: a switch routes the agent’s actions either to the world or to a world model; state and reward flow back to the agent)

  31. Dyna architecture
  (diagram: real interactions with the environment update the world model and the policy; the world model is also used to update the policy)
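A minimal Dyna-Q sketch of this loop. The deterministic one-entry world model, the state and action names, and the parameters are illustrative assumptions; the real system learns a much richer model.

```python
import random

def dyna_q_step(q, model, state, action, next_state, reward, actions,
                n_planning=10, alpha=0.1, gamma=0.9):
    """One Dyna step: learn from the real experience, update the world
    model, then replay simulated experiences drawn from the model."""
    def q_update(s, a, s2, r):
        best = max(q.get((s2, b), 0.0) for b in actions)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (r + gamma * best - old)

    q_update(state, action, next_state, reward)    # real interaction
    model[(state, action)] = (next_state, reward)  # update world model
    for _ in range(n_planning):                    # simulated interactions
        (s, a), (s2, r) = random.choice(list(model.items()))
        q_update(s, a, s2, r)

q, model = {}, {}
dyna_q_step(q, model, "musicPlaying", "pause", "silence", 1.0,
            actions=["pause", "doNothing"])
```

The planning replays push the Q-value far closer to its target than the single real experience alone, which is how Dyna compensates for the scarcity of real interactions.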

  32. Global system
  (diagram: perception of the environment produces a state; the policy selects an action; {state, action, reward} examples are stored in the database and used to update the world model and the policy)

  33. Modeling of the problem
  • Components:
    • States
    • Actions
    • Transition model
    • Reward model

  34. State space
  • States defined by predicates
    • Understandable by humans (explanations)
  • Examples:
    • newEmail ( from = Marc, to = Bob )
    • isInOffice ( John )
  • State–action example:
    • entrance( ) → pause music
  • System predicates and environment predicates
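One possible encoding of such predicate-based states, as a sketch: `sender` replaces the slide’s `from` (a reserved word in Python), and `who` is an invented keyword for the slide’s positional `isInOffice ( John )`.

```python
def predicate(name, **args):
    """A predicate such as newEmail(sender=Marc, to=Bob), stored in a
    hashable, human-readable form (useful for explanations)."""
    return (name, tuple(sorted(args.items())))

# A state is the set of predicates that currently hold.
state = frozenset({
    predicate("newEmail", sender="Marc", to="Bob"),
    predicate("isInOffice", who="John"),
})

def pretty(state):
    """Render a state back into readable predicate strings."""
    out = []
    for name, args in sorted(state):
        inner = ", ".join("%s=%s" % kv for kv in args)
        out.append("%s(%s)" % (name, inner))
    return out
```

Keeping states as sets of named predicates is what lets the assistant later explain its decisions in terms a human can read.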

  35. State space
  • State split
    • newEmail( from = directeur, to = <+> ) → notify
    • newEmail( from = newsletter, to = <+> ) → do not notify

  36. Action space
  • Possible actions:
    • Forward a reminder to the user
    • Notify of a new email
    • Lock a computer screen
    • Unlock a computer screen
    • Pause the music playing on a computer
    • Un-pause the music playing on a computer
    • Do nothing

  37. Reward
  • Explicit reward
    • Through a non-intrusive user interface
    • Problems with user rewards
  • Implicit reward
    • Gathered from clues (numerical value of lower amplitude)
  • Smoothing of the model
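One way to merge the two reward sources, with the lower amplitude of implicit clues expressed as a scale factor. The 0.2 value and the precedence rule (explicit feedback wins when present) are assumptions for illustration, not the talk’s actual scheme.

```python
def combined_reward(explicit, implicit, implicit_scale=0.2):
    """Prefer explicit user feedback when present; otherwise fall back
    to implicit clues, damped to a lower amplitude."""
    if explicit is not None:
        return explicit
    return implicit_scale * implicit
```

Damping the implicit signal keeps noisy environmental clues from overriding the rare but reliable clicks the user gives deliberately.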

  38. World model
  • Built using supervised learning from real examples
  • Initialized using common sense
    • Functional system from the beginning
    • Initial model vs. initial Q-values [Kaelbling, 2004]
  • Extensibility
  • Components: transition model, reward model

  39. Transition model
  (diagram: starting states s1, plus an action or event, lead to modified states s2 with an associated probability)

  40. Supervised learning of the transition model
  • Database of examples {previous state, action, next state}
  (diagram: transitions t1, t2, …, tn+1 from state s to state s’)
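Counting over such an example database gives a maximum-likelihood transition model. The states and action below are illustrative assumptions, not examples from the talk.

```python
from collections import Counter, defaultdict

def learn_transition_model(examples):
    """Estimate P(next state | previous state, action) by counting over
    a database of {previous state, action, next state} examples."""
    counts = defaultdict(Counter)
    for prev_state, action, next_state in examples:
        counts[(prev_state, action)][next_state] += 1
    return {sa: {s2: n / sum(c.values()) for s2, n in c.items()}
            for sa, c in counts.items()}

examples = [
    ("working", "lockScreen", "locked"),
    ("working", "lockScreen", "locked"),
    ("working", "lockScreen", "working"),  # e.g. the user unlocked at once
]
model = learn_transition_model(examples)
```

As more real interactions accumulate in the database, the estimated probabilities sharpen, which is what makes the simulated Dyna experiences trustworthy.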

  41. Global system (same diagram as slide 32, recalled)

  42. Episode
  • Episode steps have 2 stages:
    • Select an event that modifies the state
    • Select an action to react to that event
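The two-stage step can be sketched as a loop over simulated episode steps. Every name and the toy dynamics below are assumptions for illustration only.

```python
import random

def run_episode(state, events, policy, apply_event, apply_action, steps=5):
    """One simulated episode: each step first draws an event that
    modifies the state, then lets the policy react to it."""
    trace = []
    for _ in range(steps):
        event = random.choice(events)   # stage 1: an event occurs
        state = apply_event(state, event)
        action = policy(state)          # stage 2: the agent reacts
        trace.append((event, state, action))
        state = apply_action(state, action)
    return trace

# Toy dynamics: a single possible event and a hand-written policy.
trace = run_episode(
    state="idle",
    events=["newEmail"],
    policy=lambda s: "notify" if s == "newEmail" else "doNothing",
    apply_event=lambda s, e: e,
    apply_action=lambda s, a: "idle",
)
```

Separating event selection from action selection mirrors the slide: the world model proposes what happens, the policy decides how to react.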

  43. Episode
  (diagram: the RL agent runs Q-Learning updates on experiences drawn either from the environment or from the world model learned from real interactions stored in the database, yielding the policy)

  44. Outline
  • Problem statement
  • Learning in ubiquitous systems
  • User study
  • Ubiquitous system
  • Reinforcement learning of a context model
  • Experiments and results
  • Conclusion

  45. Experiments
  • General-public survey → qualitative evaluation
  • Quantitative evaluations in 2 steps:
    • Evaluation of the initial phase
    • Evaluation of the system during normal functioning

  46. Evaluation 1: “about initial learning”
  (results figure omitted)

  47. Evaluation 1: “about initial learning”
  (figure: number of iterations per episode)

  48. Evaluation 2: “interactions and learning”
  (results figure omitted)

  49. Evaluation 2: “interactions and learning”
  (results figure omitted)

  50. Outline
  • Problem statement
  • Learning in ubiquitous systems
  • User study
  • Ubiquitous system
  • Reinforcement learning of a context model
  • Experiments and results
  • Conclusion
