
Learning Distinctions and Rules in a Continuous World through Active Exploration


Presentation Transcript


  1. Learning Distinctions and Rules in a Continuous World through Active Exploration Paper by Jonathan Mugan & Benjamin Kuipers Presented by Daniel Hough

  2. The Challenge To build a robot which learns its environment like children do. Piaget [1952] theorised that children construct this knowledge in stages. Cohen [2002] proposed that children have a domain-general information-processing system for bootstrapping knowledge.

  3. Foundations
  • The focus of the work: how a developing agent can learn temporal contingencies in the form of predictive rules over events.
  • Watson [2001] proposed a model of contingencies based on his observations of infant behaviour:
    – Prospective temporal contingency: Event B tends to follow Event A with a likelihood greater than chance.
    – Retrospective temporal contingency: Event A tends to come before Event B more often than chance.
  • Distinctions must be found to determine when an event has occurred.

  4. Foundations Drescher [1991] proposed a model inspired by Piaget in which contingencies (here called schemas) are found using marginal attribution. Results that follow actions are identified, much as in Watson's model. For each schema (an action + result pair), the algorithm searches for a context (situation) that makes the result more likely to follow that action.

  5. The Method: Introduction Here, prospective contingencies, as well as contingencies in which events occur simultaneously, are represented using predictive rules. These rules are learned using a method inspired by marginal attribution. The difference from Drescher is continuous variables: this raises the issue of determining when events occur, so distinctions must be found.

  6. The Method: Introduction The motor babbling method from last week was used to learn distinctions and contingencies. Because it was undirected, it does not scale to larger problems – too much effort is wasted on uninteresting portions of the state space.

  7. The Method: Introduction
  • In this algorithm, the agent receives as input the values of time-varying continuous variables, but can only represent, reason about and construct knowledge using discrete values.
  • Continuous values are discretised using distinctions in the form of landmarks:
    – There is a discrete value v(t) for each continuous variable v'(t);
    – If, for landmarks v1 and v2, v1 < v'(t) < v2, then v(t) has the open interval between v1 and v2 as its value: v = (v1, v2).
  • This association means the agent can focus on changes of v, i.e. events (a code sketch follows below).
  • The agent greedily learns rules that use one event to predict another.
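
  As a concrete illustration, here is a minimal Python sketch of landmark discretisation; this is not code from the paper, and the function and argument names are made up:

    # Hypothetical sketch of landmark discretisation (not the authors' code).
    def discretise(v_cont, landmarks, eps=1e-6):
        """Map a continuous value v'(t) to its qualitative value v(t)."""
        lo = float('-inf')
        for v_land in landmarks:          # landmarks sorted ascending
            if abs(v_cont - v_land) <= eps:
                return v_land             # exactly at a landmark
            if v_cont < v_land:
                return (lo, v_land)       # open interval (lo, v_land)
            lo = v_land
        return (lo, float('inf'))         # above the last landmark

  For example, discretise(0.4, [0.0, 1.0]) returns the interval (0.0, 1.0), while discretise(0.0, [0.0, 1.0]) returns the landmark value 0.0 itself.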

  8. The Method: How It's Evaluated The method is evaluated using a simulated robot based on the situation of a baby sitting in a high chair. [Fig. 1: Adorable. Fig. 2: Less adorable.]

  9. The Method: Knowledge Representation & Learning The goal is for the agent to learn to identify landmark values from its own experience. The importance of a qualitative distinction is estimated from the reliability of the rules that can be learned given that distinction. The qualitative representation is based on QSIM [Kuipers, 1994].

  10. The Method: Knowledge Representation & Learning A continuous variable x'(t) ranges over some subset of the real number line (-∞, +∞). It is represented by a discrete variable x(t) for magnitude and a discrete variable x''(t) for the direction of change of x'(t). In QSIM, magnitude is abstracted to a discrete variable x(t) that ranges over a quantity space Q(x) of qualitative values:

  Q(x) = L(x) ∪ I(x)

  where L(x) = {x1, ..., xn} is the set of landmark values and I(x) = {(-∞,x1), (x1,x2), ..., (xn,+∞)} is the set of mutually disjoint open intervals between them.

  11. The Method: Knowledge Representation & Learning A quantity space with two landmarks might be described by its landmark list (x1, x2), which implies five distinct qualitative values:

  Q(x) = {(-∞,x1), x1, (x1,x2), x2, (x2,+∞)}

  The discrete variable x''(t) for the direction of change of x'(t) has a single intrinsic landmark at 0, so its initial quantity space is

  Q(x'') = {(-∞,0), 0, (0,+∞)}

  12. The Method: Knowledge Representation & Learning: Events If a is a qualitative value of a discrete variable A, meaning a ∈ Q(A), then the event At → a is defined by

  A(t−1) ≠ a and A(t) = a

  That is, an event takes place when the discrete variable A changes to value a at time t, from some other value.
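
  A small sketch of this definition in Python (hypothetical code, operating on qualitative values like those returned by the discretise sketch above):

    def events(trace):
        """Yield (t, a) for every event At -> a in a discrete-value trace.

        trace: a list of qualitative values A(0), A(1), ...
        An event occurs at time t when A(t-1) != A(t) = a.
        """
        for t in range(1, len(trace)):
            if trace[t - 1] != trace[t]:
                yield (t, trace[t])

  For example, list(events(['low', 'low', 'high'])) yields [(2, 'high')].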

  13. The Method: Knowledge Representation & Learning: Predictive Rules
  • This is how temporal contingencies are described.
  • There are two types of predictive rule (a sketch of one possible representation follows below):
    – Causal: one event occurs after another, later in time.
    – Functional: the events are linked by a function, so they occur at the same time.
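
  One plausible way to represent such rules (illustrative only; the field names and reliability statistic are assumptions, not the paper's data structures):

    from dataclasses import dataclass, field

    @dataclass
    class PredictiveRule:
        antecedent: tuple   # event (variable, value) that predicts...
        consequent: tuple   # ...this event
        kind: str           # 'causal' (consequent follows later) or
                            # 'functional' (the events co-occur)
        context: dict = field(default_factory=dict)  # conditions added later
        successes: int = 0  # times the consequent followed the antecedent
        trials: int = 0     # times the antecedent fired

        @property
        def reliability(self):
            return self.successes / self.trials if self.trials else 0.0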

  14. The Method: Learning a Predictive Rule
  • The agent wants to learn a rule which predicts a certain event h.
  • It looks at the other events; if it finds that one of them, u, is followed by h more often than chance, it creates a rule with that event as the antecedent (see the sketch below).
  • It does so by starting from an initial rule with no context.
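
  A rough sketch of that antecedent search, assuming occurrence counts have already been gathered; the counting scheme and the margin over chance are assumptions, not details from the paper:

    def best_antecedent(h, event_counts, pair_counts, total_steps, margin=1.5):
        """Pick the event u that makes h most likely, if clearly above chance.

        event_counts[u]    : timesteps at which event u occurred
        pair_counts[(u,h)] : times h occurred soon after u
        """
        base_rate = event_counts.get(h, 0) / total_steps   # P(h) by chance
        best_u, best_p = None, margin * base_rate
        for u, n_u in event_counts.items():
            if u == h or n_u == 0:
                continue
            p = pair_counts.get((u, h), 0) / n_u           # estimate of P(h | u)
            if p > best_p:
                best_u, best_p = u, p
        return best_u   # None if no event beats chance by the margin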

  15. The Method: Landmarks When a new landmark is inserted into Q(x), we replace one interval with two intervals and the dividing landmark; e.g. for a new landmark x* inserted into (xi, xi+1) we have (xi,x*), x*, (x*,xi+1). Whenever a new landmark is inserted, statistics about the previous state space are thrown out and new ones are built up. This means the reliability of each rule must be checked again.
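
  The interval split can be sketched in a few lines (a hypothetical helper, not the authors' code):

    import bisect

    def insert_landmark(landmarks, x_star):
        """Insert x* into a sorted landmark list, splitting one interval.

        The interval (xi, xi+1) containing x* becomes (xi, x*), x*, (x*, xi+1).
        The caller must discard statistics keyed on the old quantity space.
        """
        i = bisect.bisect_left(landmarks, x_star)
        if i < len(landmarks) and landmarks[i] == x_star:
            return landmarks                    # already a landmark
        return landmarks[:i] + [x_star] + landmarks[i:]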

  16. The Method: The Learning Process
  Repeat 7 times (a sketch of this loop follows below):
  1. Actively explore the world for 1000 timesteps, with a set of candidate goals coming from the discrete variables in M.
  2. Learn new causal and functional rules.
  3. Learn new landmarks by examining the statistics stored in rules and events.
  4. Gather 3000 more timesteps of experience to solidify the learned rules.
  5. Update the strata.
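
  Restated as code, this is only a structural sketch; each step is a callable supplied by the agent, since the slides give no implementation details:

    def learning_process(explore, learn_rules, learn_landmarks,
                         gather_experience, update_strata, iterations=7):
        """Outer learning loop, following the steps listed above."""
        for _ in range(iterations):
            explore(steps=1000)       # active exploration toward candidate goals
            learn_rules()             # new causal and functional rules
            learn_landmarks()         # from statistics stored in rules and events
            gather_experience(steps=3000)   # solidify the learned rules
            update_strata()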

  17. Evaluation: Experimental Setup The robot has two motor variables, one for each of its degrees of freedom. A perceptual system creates variables for each of the two tracked objects in the environment: the hand and the block. There are too many variables to reasonably explain here; each has various constraints. During learning, if the block is knocked off the tray, or if it is not moved for 300 timesteps, it is put back on the tray in a random position within reach of the agent.

  18. Evaluation: Experimental Results The algorithm was evaluated on the simple task of moving the block in a specified direction. It was run five times using passive learning and five times using active learning; each run lasted 120,000 timesteps. Each active run of the algorithm resulted in an average of 62 predictive rules. The agent gains proficiency as it learns, plateauing at approximately 70,000 timesteps in both conditions.

  19. Evaluation: Experimental Results Active exploration clearly does better: at 40,000 timesteps, active learning reaches the level of performance that passive learning only achieves at 60,000 timesteps.

  20. The Complexity of Space and Time The storage required to learn new rules is O(e²), where e is the number of events, as is the number of possible rules – but only a small number are actually learned by the agent. Using marginal attribution, each rule requires O(e) storage, although for simplicity statistics for all pairs of events are stored.

  21. Conclusion At first, the agent could only determine the direction of movement of an object. Through active exploration of its environment – using rules to learn distinctions, then using those distinctions to learn more rules – the agent progressed from a very simple representation towards a representation that is aligned with the natural "joints" of its environment.
