This presentation explores how humans interpret visual changes rather than raw intensity values. It discusses the limitations of pixel-based images and observes that our brains infer, rather than directly sense, intensity, color, and other attributes. Drawing on retinal mechanisms and examples from vision science, the author argues that human perception is far better at detecting changes than static intensities, and that pixel representations should be rethought to encode the changes we actually perceive.
CS 395/495-25, Spring 2004. IBMR: Poisson Solvers Can Reconstruct Images from their Changes. Jack Tumblin, jet@cs.northwestern.edu
Do pixels describe what we see? [Figures: What We Want vs. What We Get]
What do you see? Regions A and B: what part has constant intensity?
What do you see? Humans don't sense intensities reliably, but infer them from changes. B's intensity is constant; A is darker on the right.
What do you see? Humans don't sense intensities reliably, but infer them from changes. (Tol'djah!)
What do you see? Regions X and Y: what part has constant intensity?
What do you see? Humans don't sense intensities reliably, but infer them from changes. What part has constant intensity? NEITHER!
What do you see? Humans don't sense intensities reliably, but infer them from changes. Example: aren't all the dots white? (http://udel.edu/~jgephart/fun2.htm)
Why Pixels Could Be Improved: • People see (or think they see) changes: finite features that may have infinite bandwidth, such as occlusion, depth, collision time, trajectory changes, corners, cone tips, boundaries, edges, shadow details, contact points, velocity and direction changes... • Pixels only approximate changes, and approximate discontinuous changes (object boundaries, silhouettes, etc.) poorly; they force indirect estimation.
How? Retinal Receptive Fields... [Figure: receptive field with '+' center and '-' surround] • 130M photoreceptors feed only ~1M optic nerve fibers • Center-Surround Antagonism: Output = Center - (average Surround) • Complementary ON-center and OFF-center types • Center responds quickly; surround responds more slowly • Output signals 'recent local change'
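As a rough sketch of the "Output = Center - (average Surround)" rule above, in NumPy; the window radius, and including the center pixel in the average, are simplifying assumptions rather than physiology:

```python
import numpy as np

def on_center_response(img, r=2):
    """ON-center cell sketch: each output is the center pixel minus the
    average of a (2r+1)x(2r+1) neighborhood (center included, for
    simplicity). r is illustrative. OFF-center is just the negation."""
    img = np.asarray(img, dtype=float)
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = img[y - r:y + r + 1, x - r:x + r + 1]
            out[y, x] = img[y, x] - window.mean()
    return out

# A uniform image yields ~zero response everywhere: no change, no signal.
print(np.abs(on_center_response(np.full((16, 16), 0.5))).max())  # ~0.0
```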
Complementary Receptive Fields [Figure: firing rate (Hz) vs. center/surround ratio for ON-center and OFF-center cells] • Retina is ~differential for small signals • Better SNR • Can signal ambiguity (eyes closed, etc.) • Allows quality/fault detection
Yarbus (1950s): Pioneer of Retinal Stabilization Experiments (inspired a flood of others…)
Strongly Implies 'Filling In': requires nystagmus for temporal transients... [Figure annotations: 'mm, nothing much (green-ish?)'; 'Not much to see (pink-ish?)'; 'BUT HERE is a big ring of VERY strong change!']
What 'Changes' do we Sense? • Intensity (luminance) vs. local position • Color (chrominance) vs. local position • Intensity vs. time ('flicker') • Color vs. time • VERY weak, low-res: overall intensity • Inertial changes: movement, velocity... compensated eye moves (saccade, glissade, smooth pursuit...) • Higher-level attributes? Umm, er, uh...
'Digital' Image: a 2D Grid of Numbers • NO intrinsic meaning; use it for anything: reflectance, transparency, illumination, normal direction, material, velocity. BUT usually 'intensity'.
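A minimal NumPy illustration of this point (the shape and values are arbitrary):

```python
import numpy as np

# A 'digital' image is just a 2D grid of numbers; the meaning is ours to assign.
grid = np.zeros((480, 640))   # could hold intensity, reflectance, depth,
grid[240, 320] = 1.0          # normals, velocity... nothing in the data says which.
```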
2D Images Described by Change? • Image intensity as height field f(x,y) • 1st derivative, the Gradient: the 'uphill' vector at point (x,y): $\nabla f(x,y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$
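As a sketch, the continuous partials above become neighbor differences on a discrete image; forward differences are one common convention among several (the function name is ours):

```python
import numpy as np

def gradient(f):
    """Forward-difference approximation of the gradient (df/dx, df/dy)
    of a 2D image f, treating intensity as a height field."""
    f = np.asarray(f, dtype=float)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:, :-1] = f[:, 1:] - f[:, :-1]   # df/dx: change toward right neighbor
    gy[:-1, :] = f[1:, :] - f[:-1, :]   # df/dy: change toward lower neighbor
    return gx, gy
```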
2D Images Described by Change? • Image intensity as height field f(x,y) • 2nd derivative, the Laplacian: the divergence of the gradient at point (x,y): $\nabla^2 f(x,y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$
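A matching sketch for the second derivative: the standard 5-point stencil for the discrete Laplacian (zeroing the 1-pixel border is a simplifying assumption):

```python
import numpy as np

def laplacian(f):
    """5-point stencil for the 2D Laplacian of image f: sum of the four
    neighbors minus four times the center (border left at zero)."""
    f = np.asarray(f, dtype=float)
    lap = np.zeros_like(f)
    lap[1:-1, 1:-1] = (f[1:-1, :-2] + f[1:-1, 2:] +
                       f[:-2, 1:-1] + f[2:, 1:-1] -
                       4.0 * f[1:-1, 1:-1])
    return lap
```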
Review: Div, Grad and Curl • Formalized, computable ‘Local Change’
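To connect div and grad to the title's claim, here is a deliberately naive Jacobi iteration that solves the Poisson equation $\nabla^2 f = L$, recovering an image from its Laplacian plus its border values; the fixed (Dirichlet) border and the iteration count are illustrative choices, not the course's recommended solver:

```python
import numpy as np

def poisson_reconstruct(lap, border, iters=5000):
    """Recover f from its Laplacian `lap` by Jacobi iteration, pinning
    f to `border` on the image edges. Naive and slow, but it shows an
    image being rebuilt from its changes alone."""
    f = np.zeros_like(lap, dtype=float)
    f[0, :], f[-1, :] = border[0, :], border[-1, :]
    f[:, 0], f[:, -1] = border[:, 0], border[:, -1]
    for _ in range(iters):
        # Jacobi update: each interior pixel becomes the average of its
        # four neighbors, corrected by the target Laplacian.
        f[1:-1, 1:-1] = 0.25 * (f[1:-1, :-2] + f[1:-1, 2:] +
                                f[:-2, 1:-1] + f[2:, 1:-1] -
                                lap[1:-1, 1:-1])
    return f

# Round trip: take an image's Laplacian, then rebuild the image from it.
img = np.random.rand(32, 32)
lap = np.zeros_like(img)
lap[1:-1, 1:-1] = (img[1:-1, :-2] + img[1:-1, 2:] + img[:-2, 1:-1] +
                   img[2:, 1:-1] - 4.0 * img[1:-1, 1:-1])
rec = poisson_reconstruct(lap, img)
print(np.abs(rec - img).max())  # near zero: the changes determine the image
```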