Biologically-inspired robot spatial cognition based on rat neurophysiological studies. Alejandra Barrera and Alfredo Weitzenfeld, Auton Robot 2008. Rakesh Gosangi, PRISM lab, Department of Computer Science and Engineering, Texas A&M University
Outline • Introduction • Related work • Biologically inspired spatial cognition • Experimental results • Conclusion and Discussion
Introduction • SLAM: the problem of a mobile robot acquiring a map of its environment while localizing itself in that map • Challenges in SLAM • Data association: deciding whether two features observed at different times correspond to the same object • Perceptual ambiguity: distinguishing between places that provide similar or equivalent visual patterns
Spatial cognition in rats • Data association, or place recognition, in rats is based on cognitive maps generated in the hippocampus • Cognitive maps are created from visual and kinesthetic feedback information • Rats can learn and unlearn reward locations in goal-oriented tasks
Contribution of the paper • Neural-network-based spatial cognition model for a mobile robot, inspired by the rat's brain structure • Build a holistic topological map of the environment • Recognize previously visited places • Learn and unlearn reward locations • Perform goal-directed navigation • Use kinesthetic and visual cues from the environment
Comparison with Milford (2006) - RatSLAM • The two models agree on mapping and map adaptation but differ in goal-directed navigation • Milford et al. use a topological map of experiences, where each experience codifies location and orientation • Transitions are associated with locomotion • In this paper, the nodes correspond to visual information patterns and path integration signals • Transitions correspond to the orientation and locomotion of the rat
Experimental basis • Morris’ experiment (1981) • Two types of rats • Normal rats • Rats with hippocampal lesions • Two experimental situations • Visible platform • Submerged platform with visual cues around the arena • Normal rats relate their position with respect to visual cues and recognize target location
Image borrowed from - Morris, R. G. M. (1981). Spatial localization does not require the presence of local cues. Learning and Motivation, 12, 239–260.
Experimental basis • O’Keefe’s experiment (1983) • A reversal task on a T-maze • Rats with hippocampal lesions • Learned to turn to the right arm in a T-maze • Gradually changed their orientation from the left arm to the right arm in an 8-arm maze • Their behavior was based on the goal location relative to the body • Normal rats • Learned to turn to the right arm in a T-maze • The shift from left to right was not gradual in an 8-arm maze • Their behavior was based on a spatial map constructed in the hippocampus
Biologically inspired spatial cognition • Biological background • Affordance processing • Rat’s motivation • Path integration • Landmark processing • Place representation and recognition • Learning • Action selection
Affordance processing • Affordances are coded as a linear array of cells called the affordance perceptual schema • An affordance corresponds to a 45° turn relative to the rat’s head • Each affordance is represented as a Gaussian distribution over the array, which determines the activation of neuron i
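The Gaussian coding of affordances can be sketched as follows. This is a minimal illustration, not the paper's equation: the array size (`n_cells`), the bump `height` and `width`, and the linear mapping from turn angle to array position are all assumptions made for the example.

```python
import numpy as np

def affordance_schema(available_turns_deg, n_cells=81, height=1.0, width=3.0):
    """Encode available turns as Gaussian bumps over a linear array of cells.

    Turns are multiples of 45 degrees in [-180, 180]. n_cells, height and
    width are illustrative values, not taken from the paper.
    """
    cells = np.arange(n_cells)
    activation = np.zeros(n_cells)
    for turn in available_turns_deg:
        # Assumed layout: map the turn angle linearly onto an array position.
        center = (turn + 180.0) / 360.0 * (n_cells - 1)
        activation += height * np.exp(-((cells - center) ** 2) / (2.0 * width ** 2))
    return activation

# Example: at the junction of a T-maze the rat can turn left or right.
aps = affordance_schema([-90, 90])
```

With these assumed parameters, cell 20 and cell 60 carry the two peaks, and every other cell stays close to zero.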
Motivation • The rat’s motivation is related to its hunger drive • The rat obtains a reward r(t) in the presence of food
Path Integration • Process of updating the position of the point of departure each time the animal performs a motion • Path integration helps an animal return home • Path integration uses kinesthetic information • Magnitude of rotation • Magnitude of translation • Path integration module is composed of two neural network layers • Dynamic Remapping Layer (DRL) • Path Integration Feature Detector Layer (PIFDL)
Dynamic Remapping Layer • 2-D array of neurons • The activation of neuron (i, j) is a Gaussian function of its distance from the anchor position (x, y) • (x, y) codifies the anchor relative to the initial coordinates in the plane • The anchor position is displaced each time the rat moves, by the same magnitude but in the opposite direction
The anchor position is updated by convolving the DR layer with a mask M, which yields C • The DR layer is then updated by centering the Gaussian at (r, c), the position of the maximum value of C
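The remapping step can be sketched in a few lines. The slides do not give the mask M, so this example assumes the simplest choice: a mask with a single 1 offset from its center by the desired displacement, which makes the convolution a pure shift. The layer size, `sigma`, and the function names are hypothetical.

```python
import numpy as np

def gaussian_layer(shape, center, sigma=1.5):
    """2-D Gaussian bump centered on the anchor position."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def convolve_same(layer, mask):
    """Plain 2-D convolution, zero padded, output the same size as layer."""
    mr, mc = mask.shape
    kr, kc = mr // 2, mc // 2
    padded = np.pad(layer, ((kr, kr), (kc, kc)))
    out = np.zeros_like(layer)
    for di in range(mr):
        for dj in range(mc):
            # Convolution flips the mask, hence the reversed indices.
            out += mask[mr - 1 - di, mc - 1 - dj] * padded[
                di:di + layer.shape[0], dj:dj + layer.shape[1]]
    return out

def remap(layer, move, sigma=1.5):
    """Displace the anchor opposite to the robot's move (d_row, d_col)."""
    d_row, d_col = -move[0], -move[1]
    reach = max(abs(d_row), abs(d_col))
    mask = np.zeros((2 * reach + 1, 2 * reach + 1))
    mask[reach + d_row, reach + d_col] = 1.0      # assumed form of M
    C = convolve_same(layer, mask)
    r, c = np.unravel_index(np.argmax(C), C.shape)
    anchor = (int(r), int(c))
    return gaussian_layer(layer.shape, anchor, sigma), anchor

layer = gaussian_layer((15, 15), (7, 7))
layer, anchor = remap(layer, move=(2, -1))
```

Here the robot moves two cells down and one cell left in layer coordinates, so the anchor moves two cells up and one cell right of its starting position (7, 7).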
Path Integration Feature Detector Layer • The PIFDL is also a 2-D array of neurons • Every neuron in the DRL is randomly connected to 50% of the neurons in the PIFDL • The weights between the two layers are learned through Hebbian learning
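A minimal sketch of sparse random connectivity with a Hebbian update is shown below. The layer sizes, learning rate, and the column normalization that keeps weights bounded are assumptions for illustration; the slides state only the 50% connectivity and the use of Hebbian learning.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_connectivity(n_pre, n_post, fraction=0.5):
    """Boolean matrix: each presynaptic neuron reaches `fraction` of the postsynaptic ones."""
    conn = np.zeros((n_pre, n_post), dtype=bool)
    k = int(round(fraction * n_post))
    for i in range(n_pre):
        conn[i, rng.choice(n_post, size=k, replace=False)] = True
    return conn

def hebbian_step(weights, conn, pre, post, lr=0.01):
    """Strengthen existing connections in proportion to pre * post activity."""
    weights = weights + lr * np.outer(pre, post) * conn
    # Assumed stabilizer: normalize each postsynaptic neuron's incoming weights.
    norms = np.linalg.norm(weights, axis=0, keepdims=True)
    return weights / np.maximum(norms, 1e-12)

n_drl, n_pifdl = 25, 16                      # flattened layer sizes (hypothetical)
conn = random_connectivity(n_drl, n_pifdl)
weights = rng.random((n_drl, n_pifdl)) * conn
pre = rng.random(n_drl)                      # DRL activity
post = pre @ weights                         # PIFDL activity
weights = hebbian_step(weights, conn, pre, post)
```

The same pattern would apply to the other randomly connected layer pairs in the model, with the connectivity mask fixed at creation and only the weights changing.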
Landmark Processing • The distance and orientation of each landmark are represented as a linear array of cells (LPS) • Each LPS is connected to a 2-D array of neurons called the Landmark Feature Detector Layer (LFDL) • The connecting weights are learned through Hebbian learning • All the LFDLs are combined into a single Landmark Layer (LL) • The visual information pattern is stored in an array called LP
Place representation and recognition • The Place Cell Layer (PCL) is a 2-D layer of neurons • Every neuron in the PIFDL is randomly connected to 50% of the neurons in the PCL • Every neuron in the LL (Landmark Layer) is connected to 50% of the neurons in the PCL • The synaptic efficacy between the layers is learned through Hebbian learning • The PCL encodes the kinesthetic and visual information sensed by the rat at a given location and a given orientation
World Graph Layer • The nodes in the map represent different places • Arcs between the nodes represent • The direction of the rat’s head • Number of steps taken by the rat to move from one node to the other • Every node can be connected to eight actor units, one for each direction • Place recognition • SD is the similarity degree, N is the number of cells
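The world graph described above can be sketched as a small data structure. The similarity degree SD is not recoverable from the slides, so `similarity` below uses cosine similarity purely as a stand-in; the `threshold` value, the scalar actor units, and all class and field names are likewise hypothetical.

```python
from dataclasses import dataclass, field
import numpy as np

DIRECTIONS = (0, 45, 90, 135, 180, 225, 270, 315)   # eight head directions

def similarity(a, b):
    """Stand-in for the similarity degree SD (cosine similarity, an assumption)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

@dataclass
class Node:
    pattern: np.ndarray                               # place-cell pattern of this place
    actors: dict = field(default_factory=lambda: {d: 0.0 for d in DIRECTIONS})
    arcs: dict = field(default_factory=dict)          # direction -> (target node, steps)

class WorldGraph:
    def __init__(self, threshold=0.9):
        self.nodes = []
        self.threshold = threshold

    def recognize(self, pattern):
        """Index of the best-matching node, or None if nothing is similar enough."""
        best, best_sd = None, self.threshold
        for idx, node in enumerate(self.nodes):
            sd = similarity(pattern, node.pattern)
            if sd >= best_sd:
                best, best_sd = idx, sd
        return best

    def visit(self, pattern, prev=None, direction=None, steps=0):
        """Recognize or create a node, then link it from the previous node."""
        idx = self.recognize(pattern)
        if idx is None:
            self.nodes.append(Node(pattern=np.asarray(pattern, float)))
            idx = len(self.nodes) - 1
        if prev is not None:
            self.nodes[prev].arcs[direction] = (idx, steps)
        return idx

graph = WorldGraph()
start = graph.visit([1.0, 0.0, 0.0])
junction = graph.visit([0.0, 1.0, 0.0], prev=start, direction=90, steps=3)
again = graph.visit([1.0, 0.0, 0.01])    # nearly the same pattern as the start
```

In this example the third visit is recognized as the starting place rather than creating a new node, which is the behavior place recognition is meant to provide.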
Learning • Learn and unlearn reward locations by reinforcement learning through an Actor-Critic architecture • The Adaptive Critic (AC) unit contains a Predictive Unit (PU), which estimates future rewards for every place • Every neuron in the PCL (Place Cell Layer) is connected to the PU, and every connection has • A weight w • An eligibility trace e • P(t) is the expected reward at time t • r’(t) is the effective reinforcement signal
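The critic's role can be illustrated with a standard temporal-difference update with eligibility traces. This is the textbook form, not necessarily the paper's exact equations; the learning rate, discount `gamma`, trace `decay`, the one-hot place-cell patterns, and the three-place corridor are assumptions for the example.

```python
import numpy as np

class PredictiveUnit:
    """Linear reward predictor over place-cell activity (textbook TD with traces)."""

    def __init__(self, n_place_cells, lr=0.1, gamma=0.9, decay=0.8):
        self.w = np.zeros(n_place_cells)     # one weight per place-cell connection
        self.e = np.zeros(n_place_cells)     # one eligibility trace per connection
        self.lr, self.gamma, self.decay = lr, gamma, decay
        self.prev_P = 0.0

    def predict(self, pc):
        return float(self.w @ pc)            # P(t): expected reward

    def start_trial(self):
        self.e[:] = 0.0
        self.prev_P = 0.0

    def step(self, pc, r):
        P = self.predict(pc)
        r_eff = r + self.gamma * P - self.prev_P   # r'(t): effective reinforcement
        self.w += self.lr * r_eff * self.e         # credit recently visited places
        self.e = self.decay * self.e + pc          # refresh the traces
        self.prev_P = P
        return r_eff

def run_trials(pu, n_trials, reward_at_goal):
    places = np.eye(3)                       # corridor of three places, goal is last
    for _ in range(n_trials):
        pu.start_trial()
        for k, pc in enumerate(places):
            pu.step(pc, reward_at_goal if k == 2 else 0.0)

pu = PredictiveUnit(3)
run_trials(pu, 200, reward_at_goal=1.0)      # training: food at the goal
learned = pu.predict(np.eye(3)[1])
run_trials(pu, 200, reward_at_goal=0.0)      # testing: food removed
unlearned = pu.predict(np.eye(3)[1])
```

After training, the place next to the goal predicts a high reward; once the food is removed, the negative effective reinforcement drives that prediction back toward zero, which is the learn-unlearn behavior described above.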
Action Selection • Action selection is based on four signals • Available affordances at time t (AF) • Random rotations between available affordances (RPS) • Unexplored rotations from the current location (CPS) • Global Expectation of Maximum Reward (EMR) • Representation • Each affordance in AF is represented as a Gaussian • RPS is one Gaussian centered at a random array position • CPS captures the animal’s curiosity, with as many Gaussians as unexecuted rotations at that location
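One way to combine the four signals is to sum them over the same linear array and pick the available turn with the highest total. The summation rule, the relative heights of the signals, and the array layout are assumptions for this sketch; the slides name the signals but not how they are weighted.

```python
import numpy as np

N_CELLS = 81   # illustrative array size

def center_of(turn_deg):
    """Assumed layout: map a turn in [-180, 180] degrees to an array index."""
    return int(round((turn_deg + 180.0) / 360.0 * (N_CELLS - 1)))

def bump(turn_deg, height=1.0, width=3.0):
    cells = np.arange(N_CELLS)
    return height * np.exp(-((cells - center_of(turn_deg)) ** 2) / (2.0 * width ** 2))

def select_turn(available, unexplored, reward_expectation, rng):
    """Pick the available turn with the largest summed activation."""
    zero = np.zeros(N_CELLS)
    af = sum((bump(t) for t in available), zero)                  # affordances
    rps = bump(rng.choice(available), height=0.3)                 # random rotation
    cps = sum((bump(t, height=0.5) for t in unexplored), zero)    # curiosity
    emr = sum((bump(t, height=h) for t, h in reward_expectation.items()), zero)
    total = af + rps + cps + emr
    return max(available, key=lambda t: total[center_of(t)])

rng = np.random.default_rng(1)
# No reward known yet: curiosity favors the unexplored straight-ahead turn.
explore = select_turn([-90, 0, 90], unexplored=[0], reward_expectation={}, rng=rng)
# A learned reward expectation toward +90 degrees dominates the other signals.
exploit = select_turn([-90, 0, 90], unexplored=[], reward_expectation={90: 2.0}, rng=rng)
```

With these assumed heights the random signal can only break ties between otherwise equal turns, so exploration is driven by curiosity until a reward expectation has been learned.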
Experiments • Hardware • Sony AIBO ERS-210 four-legged robot • 1.8 GHz P4 processor • A local camera with a 50° horizontal and 40° vertical field of view • At a given time step the robot takes three non-overlapping snapshots (0°, +90°, -90°) • Visual processing analyzes the number of colored pixels in the images • Kinesthetic information is obtained from the external motor control; there is no odometer • Four experimental conditions
Experiment 1 – T-maze • The departure point is the base of the maze • During the training phase the goal is set at the end of the left arm • During the testing phase the goal is shifted to the right arm • Results • The robot takes 16 trials to completely unlearn the previously correct hypothesis • When the expectation of reward exceeds noise, the robot starts visiting the right arm • In O’Keefe’s experiments (1983), the rats chose the right arm 90% of the time by the 24th trial
Experiment 2 – 8-arm radial maze • The goal is set at the -90° arm during the training phase • During the testing phase the goal is set at the +90° arm • Results • When the expectation of reward for the -90° arm is smaller than noise, the robot visits other arms randomly • By the 12th trial the robot starts choosing the +90° arm • In O’Keefe’s experiments (1983), the rats chose the correct arm by the 20th trial
Experiment 3 – Multiple T-maze • The robot departs from the base of the vertical T-maze • During the training phase the goal is placed at the right arm (90°) of the left horizontal T-maze • During the testing phase the goal is placed at the right arm (270°) of the right horizontal T-maze • Results • If the robot reaches the goal at the end of a path, the path is positively reinforced • If a path does not lead the robot to a goal, it is negatively reinforced, and the path is thus unlearned • The robot completely unlearns the previous goal by the 20th trial
Experiment 4 – Maze with landmarks • Three colored cylinders were placed outside the maze as landmarks • During testing the robot was placed at different starting locations • Results • The robots use place recognition to find goals • All the robots found the goal successfully from all starting positions
Discussion and conclusions • The proposed model captures some behavioral aspects of rats • Abilities • Build a holistic topological map in real time • Learn and unlearn goal locations • Exploit the cognitive map to recognize visited places • Very simplistic perceptual system • The current model cannot deal with real environments • Affordance space and landmark space are discrete • Computationally expensive to process continuous spaces