
Visual Attention and Recognition Through Neuromorphic Modeling of “Where” and “What” Pathways


Presentation Transcript


  1. Visual Attention and Recognition Through Neuromorphic Modeling of “Where” and “What” Pathways
  Zhengping Ji
  Embodied Intelligence Laboratory, Computer Science and Engineering
  Michigan State University, Lansing, USA

  2. Outline
  • Attention and recognition: a chicken-and-egg problem
  • Motivation: brain-inspired, neuromorphic modeling of the brain’s visual pathways
  • Saliency-based attention
  • Where-What Network (WWN):
    • how to integrate saliency-based attention and top-down attention control
    • how attention and recognition help each other
  • Conclusions and future work

  3. What is attention?

  4. Bottom-up Attention (Saliency)

  5. Bottom-up Attention (Saliency)

  6. Attention Shifting

  7. Attention Shifting

  8. Attention Shifting

  9. Attention Shifting

  10. Spatial Top-down Attention Control

  11. Spatial Top-down Attention Control e.g. pay attention to the center

  12. Object-based Top-down Attention Control

  13. Object-based Top-down Attention Control e.g. pay attention to the square

  14. The Chicken-and-Egg Problem
  • Without attention, recognition cannot do well:
    • recognition requires attended areas for further processing.
  • Without recognition, attention is limited:
    • attention needs not only bottom-up saliency-based cues, but also top-down object-dependent signals and top-down spatial controls.

  15. Problem

  16. Challenge
  • High-dimensional input space
  • Background noise
  • Large variance:
    • scale
    • shape
    • illumination
    • viewpoint
    • …

  17. Saliency-based Attention (I): the naïve way — attention windows by guessing
  [Figure: candidate attention windows Win1–Win6 over the input image. A boundary-detection part maps two visual images to the correct road-boundary type for each sub-window via an IHDR tree (reinforcement learning); an action-generation part maps the road-boundary type to the correct heading direction along the desired path via an IHDR tree (supervised learning).]

  18. Saliency-based Attention (II): Low-level Image Processing (Itti, Koch, et al., 1998)
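Below is a minimal sketch of this style of center-surround saliency, restricted to a single intensity channel. The pyramid scales and the combination across scales are illustrative assumptions, not the exact Itti-Koch pipeline (which also uses color and orientation channels):

```python
# Center-surround saliency sketch (intensity channel only), in the
# spirit of Itti & Koch. Scales and weights here are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_map(image: np.ndarray) -> np.ndarray:
    """Grayscale image in -> saliency map in [0, 1] out."""
    img = image.astype(float)
    # Stand-in for a Gaussian pyramid: progressively larger blurs.
    scales = [gaussian_filter(img, sigma=2.0 ** s) for s in range(1, 5)]
    sal = np.zeros_like(img)
    # Center-surround differences: fine scale minus coarser scale.
    for center in (0, 1):
        for surround in (center + 1, center + 2):
            sal += np.abs(scales[center] - scales[surround])
    # Normalize to [0, 1] so maps from several channels could be combined.
    rng = sal.max() - sal.min()
    return (sal - sal.min()) / rng if rng > 0 else sal
```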

  19. Review
  • Attention and recognition: a chicken-and-egg problem
  • Motivation: brain-inspired, neuromorphic modeling of the brain’s visual pathways
  • Saliency-based attention
  • Where-What Network (WWN):
    • how to integrate saliency-based attention and top-down attention control
    • how attention and recognition help each other
  • Conclusions and future work

  20. Biological Motivations

  21. Challenge: Foreground Teaching
  • How does a neuron separate a foreground from a complex background?
  • No teacher is needed to hand-segment the foreground:
    • fixed foreground, changing background (e.g., during a baby’s object tracking)
    • the background weights are averaged out (no effect during neuronal competition)

  22. Novelty
  • Bottom-up attention: Koch & Ullman 1985; Itti, Koch, et al. 1998; Baker et al. 2001; etc.
  • Position-based top-down control: Olshausen et al. 1993; Tsotsos et al. 1995; Mozer et al. 1996; Schill et al. 2001; Rao et al. 2004; etc.
  • Object-based top-down control: Deco & Rolls 2004 (no performance evaluation); etc.
  • Our work:
    • saliency arises from developed features
    • both bottom-up and top-down control
    • top-down control by object, by position, or none
    • attention and recognition form a single process

  23. ICDL Architecture
  [Figure: network diagram. A 40×40 input image feeds V1 (11×11 receptive fields) and then V2 (11×11 and 21×21 receptive fields); V2 projects globally to a pixel-based (r, c) “where” motor (40×40, foreground size fixed at 20×20) and globally to a “what” motor.]
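For orientation, here is a hypothetical summary of that connectivity as read from the slide; the dictionary keys and structure are mine, not the paper's:

```python
# Illustrative connectivity summary of the ICDL architecture, as read
# from the slide diagram. Key names are assumptions for readability.
icdl_wwn = {
    "image":       {"size": (40, 40)},
    "V1":          {"rf_from_image": (11, 11)},
    "V2":          {"rf_from_V1": (11, 11), "lateral": (21, 21)},
    "where_motor": {"input": "global from V2",
                    "codes": "(r, c), 40x40 pixel-based, foreground fixed at 20x20"},
    "what_motor":  {"input": "global from V2", "codes": "object class"},
}
```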

  24. Multi-level Receptive Fields

  25. Layer Computation
  • Compute the pre-response of cell (i, j) at time t.
  • Sort: z_1 ≥ z_2 ≥ … ≥ z_k ≥ … ≥ z_m.
  • Only the top-k neurons respond, to keep selectiveness and long-term memory.
  • The response range is normalized.
  • Update the local winners (see the sketch below).
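A minimal sketch of this top-k competition, assuming the pre-response is a normalized inner product between each neuron's bottom-up weight vector and its input (the network's actual pre-response may also include lateral and top-down terms):

```python
# Top-k spatial competition sketch: sort pre-responses, let only the
# top-k neurons fire, and normalize their response range to (0, 1].
import numpy as np

def layer_response(weights: np.ndarray, x: np.ndarray, k: int = 3) -> np.ndarray:
    """weights: (m, d) bottom-up synaptic vectors; x: (d,) input patch."""
    # Pre-response z_i: normalized inner product (a cosine-like match).
    norms = np.linalg.norm(weights, axis=1) * np.linalg.norm(x) + 1e-12
    z = weights @ x / norms
    # Sort z_1 >= z_2 >= ... >= z_m; only the top-k neurons respond.
    order = np.argsort(z)[::-1]
    floor = z[order[k]] if k < len(z) else z.min()   # (k+1)-th pre-response
    y = np.zeros_like(z)
    # Winners' responses span (0, 1]; all other neurons stay at 0.
    y[order[:k]] = (z[order[:k]] - floor) / (z[order[0]] - floor + 1e-12)
    return y
```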

  26. In-place Learning Rule
  • Do not use back-propagation:
    • not biologically plausible
    • does not give long-term memory
  • Do not use any distribution model (e.g., a Gaussian mixture):
    • avoids the high complexity of covariance matrices
  • New Hebbian-like rule (sketched below):
    • automatic plasticity scheduling: only winners update
    • minimum error toward the target at every incremental estimation stage (local first principal component)
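A minimal sketch of such a winners-only Hebbian-like update with a per-neuron plasticity schedule. The simple 1/age learning rate below is an assumption; it makes each weight vector an incremental, minimum-error estimate of the mean response-weighted input:

```python
# In-place Hebbian-like update sketch: only winner neurons change, and
# each neuron's plasticity is scheduled by its own firing age.
import numpy as np

def update_winners(weights: np.ndarray, ages: np.ndarray,
                   x: np.ndarray, y: np.ndarray, winners) -> None:
    """weights: (m, d); ages: (m,); x: (d,) input; y: (m,) responses."""
    for i in winners:
        ages[i] += 1                 # per-neuron firing age
        lr = 1.0 / ages[i]           # plasticity schedule from age (assumed)
        # Incremental, minimum-error estimate of the response-weighted
        # input mean (a local first-principal-component direction).
        weights[i] = (1.0 - lr) * weights[i] + lr * y[i] * x
```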

  27. Top-down Attention
  [Figure: top-down connections recruit and identify class-invariant features and position-invariant features.]

  28. Experiment (sample composition sketched below)
  • Foreground objects (20×20), defined by the “what” motor
  • Attended areas, defined by the “where” motor
  • Randomly selected background patches (40×40)
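A hypothetical sketch of how one such training sample could be composed; the function name and motor encodings are illustrative assumptions:

```python
# Hypothetical sample composition: a 20x20 foreground object pasted at
# a random location inside a 40x40 background patch. The "where" motor
# encodes the paste position, the "what" motor the object class.
import numpy as np

def make_sample(foreground, label, backgrounds, n_classes, rng):
    """foreground: (20, 20); backgrounds: list of (40, 40) patches."""
    img = backgrounds[rng.integers(len(backgrounds))].copy()
    r = rng.integers(0, 40 - 20 + 1)        # top-left row of the paste
    c = rng.integers(0, 40 - 20 + 1)        # top-left column
    img[r:r + 20, c:c + 20] = foreground    # fixed foreground, changing background
    where = (r, c)                          # supervised "where" motor target
    what = np.eye(n_classes)[label]         # one-hot "what" motor target
    return img, where, what

# Usage: rng = np.random.default_rng(0); make_sample(obj, 2, patches, 5, rng)
```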

  29. Developed Layer 1 Bottom-up synaptic weights of neurons in Layer 1, developed through randomly selected patches from natural images.

  30. Developed Layer 2 Bottom-up synaptic weights of neurons in Layer 2 (not intuitive to interpret directly).

  31. Response-Weighted Stimuli for Layer 2

  32. Experimental Result I Recognition rate with incremental learning

  33. Experimental Result II (a) Examples of input images; (b) responses of the attention (“where”) motor when supervised by the “what” motor; (c) responses of the attention (“where”) motor when “what” supervision is not available.

  34. Summary
  • The “what” motor helps direct the network’s attention to features of a particular object.
  • The “where” motor helps direct attention to positional information (from 45% to 100% accuracy when “where” information is present).
  • Saliency-based bottom-up attention, location-based top-down attention, and object-based top-down attention are integrated in the top-k spatial competition rule.

  35. Problems
  • The accuracy of the “where” motor is not good: 45.53%.
  • Layer 1 was developed offline.
  • More layers are needed to handle more positions.
  • The “where” motor should be given externally, instead of as a retina-based representation.
  • No internal iterations, especially when the number of hidden layers is larger than one.
  • No cross-level projections.

  36. Fully Implemented WWN (Original Design)
  [Figure: network diagram. A 40×40 image feeds a ventral stream V1 (40×40), V2 (40×40), V4 (40×40), and IT (40×40), with receptive fields of 11×11, 11×11, 21×21, and 31×31 and global connections at the top; IT drives a “what” motor of 4 objects. A dorsal stream through MT, V3, LIP, and PP drives a fixed-size “where” motor encoding (r, c) over 25 centers.]

  37. Problems
  • The accuracies of the “where” and “what” motors are not good: 25.53% for the “what” motor and 4.15% for the “where” motor.
  • Too many parameters to be tuned.
  • Training is extremely slow.
  • How to do the internal iterations:
    • the “sweeping” way: always use the most recently updated weights and responses;
    • or always use the weights and responses from iteration p − 1, where p is the current iteration count.
  • The response should not be normalized in each lateral inhibition neighborhood.

  38. Modified Simple Architecture
  [Figure: network diagram. A 40×40 image feeds V1 (11×11 receptive fields) and V2 (11×11 and 21×21 receptive fields); V2 projects globally to a retina-based, supervised “where” motor ((r, c), 5 centers, foreground size fixed at 20×20) and globally to a “what” motor of 5 objects.]

  39. Advantages
  • Internal iterations are not necessary.
  • The network runs much faster.
  • It is easier to track neural representations and evaluate performance.
  • Performance evaluation:
    • the “what” motor reaches 100% accuracy on the disjoint test;
    • the “where” motor reaches 41.09% accuracy on the disjoint test.

  40. Problem: Dominance by Top-down Projection
  [Figure: bottom-up responses plus the top-down projection from the motor yield total responses that are dominated by the top-down responses.]

  41. Solution
  • Sparsify the bottom-up responses by keeping only the local top-k winners of the bottom-up responses (see the sketch below).
  • The performance of the “where” motor increases from around 40% to 91%.
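A minimal sketch of this sparsification, assuming a square lateral neighborhood; the radius and k below are illustrative choices:

```python
# Keep a bottom-up response only if it is a local top-k winner within
# its lateral neighborhood, then add the top-down projection. This
# prevents top-down input from dominating weakly responding regions.
import numpy as np

def sparse_total_response(bottom_up, top_down, k=1, radius=2):
    """bottom_up, top_down: (H, W) response maps -> (H, W) total response."""
    H, W = bottom_up.shape
    sparse = np.zeros_like(bottom_up)
    for i in range(H):
        for j in range(W):
            r0, r1 = max(0, i - radius), min(H, i + radius + 1)
            c0, c1 = max(0, j - radius), min(W, j + radius + 1)
            hood = bottom_up[r0:r1, c0:c1]
            kth = np.sort(hood, axis=None)[-min(k, hood.size)]
            if bottom_up[i, j] >= kth:       # local top-k winner survives
                sparse[i, j] = bottom_up[i, j]
    return sparse + top_down
```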

  42. Fully Implemented WWN (Latest)
  [Figure: network diagram. A 40×40 image feeds V1 (40×35), V2 (40×40), and V4 (40×40), with receptive fields of 11×11, 11×11, and 21×21; each cortex uses the modified ADAST. MT drives a “where” motor ((r, c), 3×3 centers, foreground size fixed at 20×20, a 40×40 map smoothed by a Gaussian); the “what” motor covers 5 objects (smoothed by a Gaussian).]

  43. Modified ADAST
  [Figure: laminar circuit within a cortex, with layers L4 and L2/3; input arrives from the previous cortex’s L2/3 and output goes to the next cortex’s L2/3, while layers L5 and L6 perform ranking.]

  44. Other Improvements
  • Smooth the external motors using a Gaussian function (see the sketch below).
  • “Where” motors are evaluated by regression errors.
  • The local top-k is adaptive to neuron positions.
  • The network does not converge through internal iterations:
    • the learning rate for top-down excitation is made adaptive over the internal iterations.
  • Context information is used.
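A small sketch of the Gaussian smoothing of an external “where” motor: instead of a single active unit at the target (r, c), nearby motor units receive graded supervision (sigma is an assumed width):

```python
# Gaussian-smoothed "where" motor: graded supervision peaking at the
# target position. The width sigma is an illustrative choice.
import numpy as np

def gaussian_where_motor(shape, target, sigma=1.0):
    """shape: (H, W) motor grid; target: (r, c) -> smoothed motor map."""
    rows, cols = np.indices(shape)
    d2 = (rows - target[0]) ** 2 + (cols - target[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))  # equals 1 at the target
```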

  45. Layer 1 – Bottom-up Weights

  46. Layer 2 – Response-weighted Stimuli

  47. Layer 3 (Where) – Top-down Weights

  48. Layer 3 (What) – Top-down Weights

  49. Test Samples
  [Figure columns: input; “where” motor (ground truth); “what” motor (ground truth); “what” output (“where”-supervised); “where” output (saliency-based); “where” output (“what”-supervised); “what” output (saliency-based).]

  50. Performance Evaluation Average error for “where” and “what” motors (250 test samples)
