Graphical Models in Vision. Alan L. Yuille. UCLA. Dept. Statistics.
The Purpose of Vision. • “To Know What is Where by Looking”. Aristotle. (384-322 BC). • Information Processing: receive a signal by light rays and decode its information. • Vision appears deceptively simple, but there is more to Vision than meets the Eye.
What are Humans Ideal for? • Clearly humans are not good at determining the size of objects in images – at least for these types of stimuli. • But they are good at determining context and taking contextual cues into account – i.e. use perspective cues to estimate depth and make adjustments. • What reasoning/statistical tasks are humans ideal for?
Visual Illusions • The perceived brightness of a surface, • or the perceived length of a line, • depends on context, • not on basic measurements like: • the number of photons that reach the eye • or the length of the line in the image.
Vision is ill-posed. • Vision is ill-posed – the data in the retina is not sufficient to unambiguously determine the visual scene. • Vision is possible because we have prior knowledge about visual scenes. • Even simple perception is an act of creation.
Perception as Inference • Helmholtz. 1821-1894. • “Perception as Unconscious Inference”.
How Hard is Vision? • The Human Brain devotes an enormous amount of resources to vision. • (I) The optic nerve is one of the largest nerves in the body by fiber count, with roughly a million fibers. • (II) Roughly half of the neurons in the cortex are involved in vision (van Essen). • If intelligence is proportional to neural activity, then vision requires more intelligence than mathematics or chess.
Vision and Artificial Intelligence • The hardness of vision became clearer in the 1960s, when the Artificial Intelligence community tried to design computer programs to do vision. • AI workers thought that vision was “low-level” and easy. • Prof. Marvin Minsky (pioneer of AI) asked a student to solve vision as a summer project.
Chess and Face Detection • The Artificial Intelligence community preferred Chess to Vision. • By the late 1990s a Chess program (Deep Blue, 1997) had beaten the world champion Kasparov. • But computers could not find faces in images.
Man and Machine. • David Marr (1945-1980) • Three Levels of explanation: 1. Computation Level/Information Processing 2. Algorithmic Level 3. Hardware: Neurons versus silicon chips. Claim: Man and Machine are similar at Level 1.
Vision as Probabilistic Inference • Represent the World by S. • Represent the Image by I. • Goal: decode I and infer S. • Model image formation by a likelihood function (a generative model), P(I|S). • Model our knowledge of the world by a prior, P(S).
Bayes’ Theorem • Bayes’ Theorem then states that we should infer the world S from the image I by • P(S|I) = P(I|S)P(S)/P(I). • Rev. T. Bayes. 1702-1761
Bayes to Infer S from I • P(I|S): likelihood function. • P(S): prior.
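To make the roles of the two terms explicit, the formula can be written out with the normalizer expanded. The maximum a posteriori estimate S* is notation introduced here for illustration. It is one common way to pick a single interpretation and is not stated on the slides. The sum assumes a discrete set of candidate worlds S'.

```latex
\begin{align}
P(S \mid I) &= \frac{P(I \mid S)\,P(S)}{P(I)},
\qquad P(I) = \sum_{S'} P(I \mid S')\,P(S'), \\
S^{*} &= \arg\max_{S} P(S \mid I) = \arg\max_{S} P(I \mid S)\,P(S).
\end{align}
```

Because P(I) does not depend on S, it can be dropped when only the best-scoring S is needed.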
Ambiguity and Complexity of Images. • Similar objects give rise to very different images. Different objects can cause similar images.
Ideal Observers The Image of a cylinder is consistent with multiple objects and viewpoints. • The likelihood is ambiguous (concave or convex). • The prior resolves the ambiguity by biasing towards convex objects viewed from above.
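A minimal sketch of how a prior can break the tie, assuming just two hypotheses and made-up probabilities (the numbers 0.5 and 0.8 are illustrative assumptions, not values from the talk):

```python
# Toy ideal observer with two hypotheses about a shaded patch.
# All numbers below are assumed for illustration only.
likelihood = {"convex": 0.5, "concave": 0.5}  # P(I|S): the image fits both equally
prior = {"convex": 0.8, "concave": 0.2}       # P(S): bias towards convex shapes


def posterior(likelihood, prior):
    """Return P(S|I) for every hypothesis S via Bayes' theorem."""
    unnormalized = {s: likelihood[s] * prior[s] for s in prior}
    evidence = sum(unnormalized.values())     # P(I), the normalizer
    return {s: p / evidence for s, p in unnormalized.items()}


post = posterior(likelihood, prior)
best = max(post, key=post.get)                # most probable interpretation
print(post, best)
```

With a flat likelihood the posterior simply equals the prior, so the observer reports the convex interpretation.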
Influence Graphs and Visual Tasks
A Simple Taxonomy of Graphs • A taxonomy of graphs (figure panels B, C, D).
Examples of Vision Tasks • Visual Inference: (1) Estimating Shape. (2) Segmenting Images. (3) Detecting Faces. (4) Detecting and Reading Text. (5) Parsing the full image – detect and recognize all objects in the image, understand the viewed scene.
Analysis by Synthesis • Invert generation process to parse the image. • Probabilistic Grammars for image generation (week 2).
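The idea of inverting the generation process can be sketched on a toy problem. The example below assumes a one-dimensional "image" containing a single bright bar, Gaussian pixel noise, and a mild prior favoring narrow bars. The functions `render`, `log_likelihood`, `log_prior`, and `parse` are hypothetical names made up for this illustration, and exhaustive search stands in for whatever inference algorithm a real system would use.

```python
def render(position, width, length=12):
    """Generative model: a bright bar of `width` pixels starting at `position`."""
    return [1.0 if position <= x < position + width else 0.0 for x in range(length)]


def log_likelihood(image, scene, sigma=0.3):
    """log P(I|S) up to a constant, assuming independent Gaussian pixel noise."""
    synthesized = render(*scene, length=len(image))
    squared_error = sum((i - s) ** 2 for i, s in zip(image, synthesized))
    return -squared_error / (2 * sigma ** 2)


def log_prior(scene):
    """log P(S) up to a constant: an assumed mild preference for narrow bars."""
    _, width = scene
    return -0.1 * width


def parse(image):
    """Analysis by synthesis: synthesize each candidate scene, keep the best one."""
    n = len(image)
    candidates = [(p, w) for p in range(n) for w in range(1, n - p + 1)]
    return max(candidates, key=lambda s: log_likelihood(image, s) + log_prior(s))


# A noisy observation of a bar that starts at pixel 2 and is 3 pixels wide.
observed = [0.1, 0.0, 0.9, 1.1, 0.8, 0.1, 0.0, 0.05, 0.0, 0.1, 0.0, 0.0]
scene = parse(observed)
print(scene)
```

The parser never processes the image bottom-up. It only generates candidate images and scores how well each one explains the observation.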
Probabilistic Grammars for Images • (I) Images are generated by composing visual patterns. • (II) Parse an image by decomposing it into patterns.
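A minimal sketch of step (I), assuming a tiny hand-written grammar. The symbols, rules, and probabilities below are hypothetical and chosen only to show the mechanics. They are not the grammar used in the talk, and the sketch covers generation and scoring of a derivation rather than full parsing.

```python
import random

# Hypothetical toy grammar: each symbol maps to (probability, expansion) rules.
GRAMMAR = {
    "Scene":  [(0.6, ["Object"]), (0.4, ["Object", "Scene"])],
    "Object": [(0.5, ["Face"]), (0.5, ["Text"])],
    "Face":   [(1.0, ["eyes", "nose", "mouth"])],
    "Text":   [(0.7, ["letter"]), (0.3, ["letter", "Text"])],
}


def generate(symbol, rng):
    """Generate terminal patterns by recursively expanding `symbol`."""
    if symbol not in GRAMMAR:                 # terminal: an elementary pattern
        return [symbol]
    rules = GRAMMAR[symbol]
    weights = [p for p, _ in rules]
    _, expansion = rng.choices(rules, weights=weights, k=1)[0]
    patterns = []
    for child in expansion:
        patterns.extend(generate(child, rng))
    return patterns


def derivation_probability(rule_choices):
    """Probability of a derivation, given the (symbol, rule index) pairs it used."""
    prob = 1.0
    for symbol, index in rule_choices:
        prob *= GRAMMAR[symbol][index][0]
    return prob


rng = random.Random(0)
patterns = generate("Scene", rng)
print(patterns)

# Derivation of a scene with one face: Scene -> Object -> Face -> eyes nose mouth.
p_face = derivation_probability([("Scene", 0), ("Object", 0), ("Face", 0)])
print(p_face)
```

Parsing, step (II), would search for the derivation with the highest probability that reproduces the observed patterns.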
Generative Models for Patterns • Examples of images synthesized from generative models (MCMC).
Towards Full Image Parsing • The image genome project (Zhu). • Attempt to determine the grammar for images by interactive parsing of images. • Thereby learn the statistical regularities of images – the priors and the representations.
Back to the Brain • Top level: compare human performance to Ideal Observers. • Explain human perceptual biases (visual illusions) as strategies that are “statistically effective”.
Brain Architecture • The Bayesian models have interesting analogies to the brain. • Generative models and analysis by synthesis. • This is consistent with top-down processing? (Kersten’s talk next week).
Conclusion • Vision is unconscious inference. • The Bayesian approach leads to vision as analysis by synthesis: inverting the image generation process. • This requires “sophisticated” priors about the statistics of natural images. • This can be formulated mathematically in terms of Probabilistic Grammars for image formation. • These grammars can be learnt by analysing the “sophisticated” statistics of natural images.