Learning from how dogs learn



Presentation Transcript


  1. Learning from how dogs learn. Prof. Bruce Blumberg, The Media Lab, MIT, bruce@media.mit.edu, www.media.mit.edu/~bruce

  2. About me…

  3. About me…

  4. Practical & compelling real-time learning • Easy for interactive characters to learn what they ought to be able to learn • Easy for a human trainer to guide the learning process • A compelling user experience • Provide heuristics and practical design principles

  5. My bias & focus • Learning occurs within an innate structure that biases: • Attention • Motivation • The innate frequency, form and organization of behavior • When certain things are most easily learned • What are the catalytic components of the scaffolding that make learning possible?

  6. sheep|dog:trial by eire See sheep|dog video on my website

  7. Object persistence See object persistence video on my website

  8. Temporal representation See temporal representation (aka Goatzilla) video on my website

  9. Alpha Wolf See alpha wolf video on my website

  10. Rover@home See rover@home video on my website or go to Scientific American Frontiers website

  11. Dobie T. Coyote Goes to School See Dobie video on my website

  12. Why look at dog training? • Interactive characters pose unique challenges: • State, action and state-action spaces are often continuous and far too big to search exhaustively • To be compelling, characters must: • Quickly learn “obvious” contingencies between state, actions and consequences • Be easy to train without visibility into the character’s internal state • Learning is only one of the things they have to do • Dogs and their trainers seem to solve these problems easily

  13. Invaluable resources • Doing it, and talking to people who do it. • Wilkes, Pryor, Ramirez • Lindsay, Burch & Bailey, Mackintosh • Lorenz, Leyhausen, Coppinger & Coppinger

  14. The problem facing dogs (real and synthetic): What do I do, when, in order to best satisfy my motivational goals? (Figure: the set of all motivational goals, the set of all possible stimuli and the set of all possible actions.)

  15. The space of possible stimuli is wicked big. (Figure: state space of all possible stimuli; the axes are modality of stimuli, e.g. smells, sounds, dog sounds, motion, speech and whistles, and time of occurrence.)

  16. The space of possible actions is also very big. (Figure: action space of all possible actions; the axes are action, e.g. left ear twitch, shake, high-5, low shake, down, beg and figure-8, and time of performance.)

  17. Who gets credit for good things happening? (Figure: stimuli by modality, e.g. sounds, dog sounds, motion, speech and whistles, and actions, e.g. left ear twitch, shake, high-5, low shake, down, beg and figure-8, all preceding a reward, “Yumm..”.)

  18. Who gets credit for good things happening? (Figure: the elements orient, eye, stalk, chase, grab-bite and kill-bite laid out along a time axis, ending in a reward, “Yumm..”.)

  19. Conventional idea: back propagation from the goal. Credit flows backward in time from the reward (“Yumm..”) through kill-bite, grab-bite, chase, stalk, eye and orient.

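To make the conventional idea concrete, here is a minimal sketch of credit flowing backward from a single reward at the end of the sequence. It assumes the elements run in Coppinger’s order (orient, eye, stalk, chase, grab-bite, kill-bite) and uses a hypothetical discount factor `gamma`. It is a generic illustration of backward credit propagation, not the system described in the talk.

```python
# Sketch: credit flows backward from the final reward ("Yumm..") through
# the sequence. The order and the discount factor gamma are assumptions.
SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]


def backpropagate_credit(sequence, reward, gamma=0.9):
    """Return credit per element: reward * gamma**(steps before the goal)."""
    credit = {}
    value = reward
    # Walk from the last element (closest to the goal) back to the first.
    for element in reversed(sequence):
        credit[element] = value
        value *= gamma
    return credit


if __name__ == "__main__":
    for element, c in backpropagate_credit(SEQUENCE, reward=1.0).items():
        print(f"{element:10s} {c:.3f}")
```

With `gamma=0.9`, kill-bite receives full credit and orient only about 0.59. In this scheme no element is credited at all until the whole sequence has reached the goal.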

  22. The problem • If each of the six elements in the sequence has 3 variants, there are 3^6 = 729 possible combinations, of which only 1 may work (ignoring stimuli) • If there are 12 possible stimuli, there are (12 × 3)^6 = 2,176,782,336 possible combinations of stimulus-action pairs to explore • The dog doesn’t know whether it is the right sequence until the goal is reached • What happens if the “variant” itself needs to be learned?
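The counts follow from simple exponentiation. This sketch assumes six elements (orient, eye, stalk, chase, grab-bite, kill-bite), three variants per element, and that any one of 12 stimuli may precede each element. Under those assumptions each step offers 12 × 3 = 36 stimulus-action choices.

```python
# Sketch: size of the search space, assuming six elements, three variants
# per element, and one of 12 stimuli preceding each element.
ELEMENTS = 6
VARIANTS = 3
STIMULI = 12

action_sequences = VARIANTS ** ELEMENTS                        # 3**6
stimulus_action_sequences = (STIMULI * VARIANTS) ** ELEMENTS   # 36**6

if __name__ == "__main__":
    print(f"{action_sequences:,}")           # 729
    print(f"{stimulus_action_sequences:,}")  # 2,176,782,336
```

Whatever exact numbers are chosen, the space grows exponentially with sequence length. That is why exhaustive search is hopeless.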

  23. Leyhausen’s suggestion… Each element is innately self-motivating and has an innate reward metric. (Figure: orient, eye, stalk, chase, grab-bite and kill-bite along a time axis, each with its own “motivation & reward”.)

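Leyhausen’s suggestion can be sketched by giving each element its own local reward metric, so that an element can learn which variant works without waiting for the functional goal. This is a minimal sketch. The `Element` class and both reward functions are hypothetical illustrations, not metrics from the talk: chase is rewarded for closing the gap to the prey, and stalk for not alerting it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Situation = Dict[str, float]


@dataclass
class Element:
    """One element of the sequence with its own (hypothetical) local reward metric."""
    name: str
    local_reward: Callable[[Situation], float]
    # Learned value of each variant, driven only by this element's own reward.
    values: Dict[str, float] = field(default_factory=dict)

    def learn(self, variant: str, situation: Situation, rate: float = 0.5) -> None:
        """Move the variant's value toward the local reward it just earned."""
        reward = self.local_reward(situation)
        old = self.values.get(variant, 0.0)
        self.values[variant] = old + rate * (reward - old)


# Hypothetical local metrics: chasing is "good" when the gap to the prey
# shrinks; stalking is "good" when the prey is not alerted (1.0 = alerted).
chase = Element("chase", lambda s: s["gap_before"] - s["gap_after"])
stalk = Element("stalk", lambda s: 1.0 - 2.0 * s["prey_alerted"])

if __name__ == "__main__":
    chase.learn("fast", {"gap_before": 5.0, "gap_after": 3.0})
    stalk.learn("low-crouch", {"prey_alerted": 0.0})
    print(chase.values, stalk.values)  # {'fast': 1.0} {'low-crouch': 0.5}
```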

  25. Coppinger’s suggestion… Each element has a varying innate tendency to be followed by the “next” element in the sequence. (Figure: orient, eye, stalk, chase, grab-bite and kill-bite along a time axis.)
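Coppinger’s suggestion can be sketched as a chain in which each element triggers the next with some innate probability. The probabilities in `P_NEXT` are hypothetical. Varying them changes how often the full sequence runs to completion, and that happens independently of whether the sequence ever pays off.

```python
import random
from typing import Dict, List

SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]

# Hypothetical innate tendencies: probability that each element is
# followed by the next one in the sequence (otherwise the sequence stops).
P_NEXT: Dict[str, float] = {
    "orient": 0.9,
    "eye": 0.8,
    "stalk": 0.7,
    "chase": 0.6,
    "grab-bite": 0.5,
}


def run_sequence(rng: random.Random, p_next: Dict[str, float] = P_NEXT) -> List[str]:
    """Perform elements in order, continuing with probability p_next[current]."""
    performed = [SEQUENCE[0]]
    for current, nxt in zip(SEQUENCE, SEQUENCE[1:]):
        if rng.random() >= p_next.get(current, 0.0):
            break  # the innate tendency to continue was not expressed
        performed.append(nxt)
    return performed


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(" > ".join(run_sequence(rng)))
```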

  26. Functional goal plays an incidental role. Value propagated back from the functional goal (“Yumm..”) plays only an incidental role. (Figure: orient, eye, stalk, chase, grab-bite and kill-bite along a time axis, ending in “Yumm..”.)

  27. Big idea: innate biases make learning possible • Biases include… • Temporal proximity implies causality • Attending more readily to certain classes of stimuli than to others (motion vs. speech) • Lazy discovery (pay attention once you have a reason to pay attention) • Elements may be “innately” self-motivating and have a local metric of “goodness”
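One way to picture an innate attentional bias is a fixed salience weight per stimulus modality, with only the most salient stimuli admitted to attention. This is a minimal sketch. The weights in `INNATE_SALIENCE`, the `capacity` limit and the tuple format are all assumptions, chosen to echo the motion-vs.-speech example.

```python
from typing import Dict, List, Tuple

# Hypothetical innate salience per stimulus modality: motion grabs
# attention more readily than speech. Values are illustrative only.
INNATE_SALIENCE: Dict[str, float] = {
    "motion": 1.0,
    "smell": 0.8,
    "whistle": 0.6,
    "speech": 0.3,
}


def attend(stimuli: List[Tuple[str, str, float]], capacity: int = 2) -> List[str]:
    """Return labels of the `capacity` most salient stimuli.

    Each stimulus is (modality, label, intensity). Salience is the innate
    modality weight times intensity; unknown modalities get weight 0.
    """
    ranked = sorted(
        stimuli,
        key=lambda s: INNATE_SALIENCE.get(s[0], 0.0) * s[2],
        reverse=True,
    )
    return [label for _, label, _ in ranked[:capacity]]


if __name__ == "__main__":
    scene = [
        ("speech", "sit-utterance", 1.0),
        ("motion", "hand-wave", 0.5),
        ("whistle", "whistle", 1.0),
    ]
    print(attend(scene))  # ['whistle', 'hand-wave']
```

In this toy example the spoken cue loses out to the whistle and the hand wave, which is the kind of bias the slide describes.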

  28. Good trainers actively guide the dog’s exploration • Behavioral • Train the behavior, then add the cue • Differential rewards encourage variability • Motor • Shaping: rewarding successive approximations • Luring into a pose, e.g. “down”, or through a trajectory, e.g. “figure-8”
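Shaping, or rewarding successive approximations, can be sketched with a one-dimensional behavior and a trainer who tightens the criterion after every reward. Everything here is an assumption made for illustration. The behavior is a single number (hypothetically, how close to a full “down” the dog gets), the learner’s variability is Gaussian, and `shrink` controls how quickly the trainer asks for a closer approximation.

```python
import random
from typing import Tuple


def shape(target: float, attempts: int, rng: random.Random,
          start_tolerance: float = 2.0, shrink: float = 0.9,
          min_tolerance: float = 0.1, noise: float = 0.3) -> Tuple[float, float]:
    """Toy shaping loop; returns (learner's typical form, final tolerance)."""
    typical = 0.0                # learner's current typical performance
    tolerance = start_tolerance  # how far from target the trainer still rewards
    for _ in range(attempts):
        # Natural variability: each attempt differs a little from the typical form.
        attempt = typical + rng.gauss(0.0, noise)
        if abs(attempt - target) <= tolerance:
            # Reward: the learner's typical form moves toward the rewarded attempt.
            typical += 0.5 * (attempt - typical)
            # Raise the bar: the next reward requires a closer approximation.
            tolerance = max(min_tolerance, tolerance * shrink)
    return typical, tolerance


if __name__ == "__main__":
    typical, tolerance = shape(target=1.0, attempts=500, rng=random.Random(7))
    print(f"typical form {typical:.2f}, final tolerance {tolerance:.2f}")
```

With `noise=0.0` the learner never produces a closer approximation. Rewards stop once the tolerance drops below the starting error, and shaping stalls. Variability is the raw material shaping works with.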

  29. Dogs constrain the search for causal agents. Attention window: the cue is given immediately before, or as, the dog is moving into the desired pose. Consequences window: the trainer “clicks”, signaling that a reward is coming, and later the reward is actually received. (Figure: timeline of Sit, Approach, Eat.) Dogs make the problem tractable by constraining the search for causal agents to narrow temporal windows.
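The two windows can be sketched as a filter over a timestamped event log. Only actions followed by a click within the consequences window trigger a search for causes, and only stimuli near the onset of such an action are considered. The `Event` record and the window widths are assumptions, since the talk gives no numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    time: float  # seconds since the start of the session
    kind: str    # "stimulus", "action" or "click"
    label: str


# Hypothetical window widths in seconds (the talk gives no values).
ATTENTION_BEFORE = 1.0  # cue may start up to this long before the action...
ATTENTION_AFTER = 0.5   # ...or this long after it, while the dog is moving
CONSEQUENCES = 2.0      # click must follow the action within this time


def candidate_pairs(events: List[Event]) -> List[Tuple[str, str]]:
    """Return (stimulus, action) pairs that fall inside both windows."""
    stimuli = [e for e in events if e.kind == "stimulus"]
    actions = [e for e in events if e.kind == "action"]
    clicks = [e for e in events if e.kind == "click"]
    pairs = []
    for act in actions:
        # Only a rewarded (clicked) action triggers a search for its causes.
        if not any(0.0 <= c.time - act.time <= CONSEQUENCES for c in clicks):
            continue
        for s in stimuli:
            # Keep stimuli that arrive just before, or as, the action starts.
            if -ATTENTION_AFTER <= act.time - s.time <= ATTENTION_BEFORE:
                pairs.append((s.label, act.label))
    return pairs


if __name__ == "__main__":
    log = [
        Event(0.0, "stimulus", "whistle"),
        Event(4.2, "stimulus", "sit-utterance"),
        Event(5.0, "action", "sit"),
        Event(6.0, "click", "click"),
        Event(9.0, "action", "approach"),
    ]
    print(candidate_pairs(log))  # [('sit-utterance', 'sit')]
```

The whistle is ignored because it came long before the sit. The approach is ignored because no click followed it.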

  30. Dogs use implicit feedback to guide perceptual learning. (Figure: timeline of Sit, Approach, Eat; the “sit-utterance” is perceived, the dog decides to sit, then the “click” is perceived, and the dog builds and updates its perceptual model of the “sit-utterance”.) Dogs use the rewarded action to identify potentially promising states to explore and to guide the formation of perceptual models.
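Using the rewarded action as a filter for perceptual examples can be sketched with a simple prototype model: the running mean of feature vectors for utterances heard just before a clicked Sit. The `PrototypeModel` class, the feature vectors and the distance threshold are hypothetical. A fuller system would likely use richer acoustic features and a better classifier.

```python
import math
from typing import List, Optional, Sequence


class PrototypeModel:
    """Running-mean prototype of a cue, built only from rewarded contexts.

    Features are hypothetical (e.g. a few acoustic measurements of an
    utterance); any fixed-length numeric vector works.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.mean: Optional[List[float]] = None
        self.count = 0

    def update(self, features: Sequence[float]) -> None:
        """Fold in one example heard just before a rewarded action."""
        self.count += 1
        if self.mean is None:
            self.mean = list(features)
            return
        for i, x in enumerate(features):
            # Incremental mean: mean += (x - mean) / n
            self.mean[i] += (x - self.mean[i]) / self.count

    def matches(self, features: Sequence[float]) -> bool:
        """True if `features` lie within `threshold` of the prototype."""
        if self.mean is None:
            return False
        return math.dist(self.mean, features) <= self.threshold


if __name__ == "__main__":
    sit_model = PrototypeModel(threshold=1.0)
    # Only utterances that preceded a clicked Sit are used as examples.
    for example in ([1.0, 2.0], [1.2, 1.8], [0.8, 2.2]):
        sit_model.update(example)
    print(sit_model.mean)  # approximately [1.0, 2.0]
    print(sit_model.matches([1.1, 2.1]), sit_model.matches([5.0, 5.0]))
```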

  31. Dogs give credit where credit is due… • The trainer repeatedly lures the dog through a trajectory or into a pose • Eventually, the dog performs the behavior spontaneously • Implication • The dog associates the reward with the resulting body configuration or trajectory, and not just with “follow-your-nose”

  32. Observation: dogs give credit where credit is due. (Figure: timeline of Sit, Approach, Eat; the “sit-utterance” is perceived, the dog decides to sit, the “click” is perceived; the dog credits sitting in the presence of the “sit-utterance” and builds and updates its perceptual model of the “sit-utterance”.)

  33. D.L.: Take Advantage of Predictable Regularities • Constrain the search for causal agents by taking advantage of temporal proximity and the natural hierarchy of state spaces • Use consequences to bias the choice of action • But vary performance and attend to differences • Explore state and action spaces on an “as-needed” basis • Build models on demand
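The advice to use consequences to bias the choice of action, but still vary performance, can be sketched with softmax selection over variants of one action. Better-rewarded variants are chosen more often, but every variant keeps some chance of being tried, so differences in outcome stay observable. This is a standard softmax action-selection rule swapped in for illustration, not necessarily the mechanism used in the talk. The variant names and the trainer’s reward rule are hypothetical.

```python
import math
import random
from typing import Dict


def softmax_choice(values: Dict[str, float], temperature: float,
                   rng: random.Random) -> str:
    """Pick a variant with probability proportional to exp(value / temperature)."""
    keys = list(values)
    top = max(values.values())
    # Subtracting the maximum keeps exp() from overflowing; it does not
    # change the resulting probabilities.
    weights = [math.exp((values[k] - top) / temperature) for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def update(values: Dict[str, float], variant: str, reward: float,
           rate: float = 0.2) -> None:
    """Move the chosen variant's value a step toward the reward it earned."""
    values[variant] += rate * (reward - values[variant])


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical trainer who rewards only tight figure-8s.
    values = {"wide figure-8": 0.0, "tight figure-8": 0.0}
    for _ in range(200):
        variant = softmax_choice(values, temperature=0.2, rng=rng)
        update(values, variant, reward=1.0 if variant == "tight figure-8" else 0.0)
    print(values)
```

At `temperature=0.2`, once the tight variant’s value nears 1, the wide variant is still tried a little under 1% of the time.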

  34. D.L.: Make Use of All Feedback: Explicit & Implicit • Use the rewarded action as context for identifying: • Promising state and action spaces to explore • Good examples from which to construct perceptual models, e.g., a good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit.

  35. D.L.: Make Them Easy to Train • Respond quickly to “obvious” contingencies • Support luring and shaping • Techniques to prompt infrequently expressed or novel motor actions • “Trainer-friendly” credit assignment • Assign credit to the candidate that matches the trainer’s expectation
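One hypothetical reading of “trainer-friendly” credit assignment ties in with the luring observation on slide 31. When several actions fall inside the consequences window, prefer the most recently completed action that the trainer is actually teaching. Scaffolding such as lure-following is credited only if nothing else qualifies. The `SCAFFOLD_ACTIONS` set, the window width and the `(end_time, label)` format are assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical labels for behaviors a trainer uses as scaffolding
# (e.g. following a food lure) rather than as the behavior being taught.
SCAFFOLD_ACTIONS = {"follow-your-nose"}


def trainer_friendly_credit(candidates: List[Tuple[float, str]],
                            click_time: float,
                            window: float = 2.0) -> Optional[str]:
    """Pick the action to credit for a click.

    candidates: (end_time, action_label) for actions the dog performed.
    Prefers the most recently completed non-scaffold action inside the
    window; falls back to scaffold actions only if nothing else qualifies.
    """
    in_window = [(t, a) for t, a in candidates if 0.0 <= click_time - t <= window]
    if not in_window:
        return None
    preferred = [c for c in in_window if c[1] not in SCAFFOLD_ACTIONS]
    pool = preferred or in_window
    return max(pool)[1]  # the latest end_time wins


if __name__ == "__main__":
    # Dog lured into a down: it is still following the lure when clicked.
    print(trainer_friendly_credit([(3.4, "follow-your-nose"), (3.2, "down")],
                                  click_time=3.5))  # down
```

A plain "most recent action" rule would credit follow-your-nose here. The preference steers credit toward the pose the trainer expects.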

  36. The System

  37. Dobie T. Coyote… See dobie video on my website

  38. Limitations and Future Work • Important extensions • Other kinds of learning (e.g., social or spatial) • Generalization • Sequences • Expectation-based emotion system • How will the system scale?

  39. Useful Insights • Use temporal proximity to limit search • Use hierarchical representations of state, action and state-action spaces, and use implicit feedback to guide exploration • Use “trainer-friendly” credit assignment • Luring and shaping are essential

  40. Acknowledgements • Members of the Synthetic Characters Group, past, present & future • Gary Wilkes • Funded by the Digital Life Consortium
