Understanding Gestalt Cues and Ecological Statistics using a Database of Human Segmented Images

Charless Fowlkes, David Martin and Jitendra Malik
Department of Computer Sciences, University of California, Berkeley
{fowlkes,dmartin,malik}@eecs.berkeley.edu

Egon Brunswik first suggested nearly 50 years ago that the various Gestalt factors of grouping made sense because they reflected the statistics of natural scenes [1]. For example, if points that are similar in color or luminosity are more likely to belong to the same object, then it is appropriate to group them.

We are building a collection of human-generated segmentations of natural images that is useful in quantifying the nature of cues such as similarity, proximity, and convexity. Having ground truth allows us to empirically observe the probability that two points in the image plane should belong to the same segment, conditioned on some photometric property of the image such as the similarity in local intensity.

We looked at two measures for quantifying the relative “power” of different segmentation cues. If we consider the classification task of deciding whether two pixels belong in the same or different segments, we can compute the Bayes risk associated with the optimal threshold for a given cue. Unfortunately, risk is uninformative when the Bayes optimal strategy is to declare all points as lying in different segments: the risk then equals the prior probability that two points share a segment, whatever the cue. We therefore also compute the mutual information between the same-segment indicator and a given cue.

Here we show the risk and mutual information conditioned on distance between the two points. Two points that are next to each other in the image plane are almost always members of the same segment, but extended regions such as sky mean that the distribution has a heavy tail.

We operationalize convexity as the ratio between a segment’s area and the area of its convex hull. Convex regions are far more prevalent since images tend to consist of several convex foreground objects in front of a background segment.
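The two cue measures described above can be sketched on pairwise samples. This is a minimal illustration, not the authors' code: it assumes each sample is a pixel pair with a scalar cue value (for example, image-plane distance) and a ground-truth same-segment flag, that the decision rule is "same segment if the cue is at most t", and that the cue has already been discretized into bins for the mutual information estimate. The function names `threshold_risk` and `mutual_information` and the toy data are hypothetical.

```python
import math
from collections import Counter


def threshold_risk(cue, same):
    """Lowest empirical error rate of the rule 'same segment iff cue <= t'.

    The search includes the two degenerate rules (declare every pair
    different, declare every pair same).
    """
    n = len(cue)
    pairs = sorted(zip(cue, same))
    # Threshold below every sample: all pairs are declared 'different',
    # so the errors are exactly the pairs that truly share a segment.
    errors = sum(1 for s in same if s)
    best = errors
    i = 0
    while i < n:
        # Raise the threshold past every pair that has this cue value.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            # A truly-same pair becomes correct, a truly-different one wrong.
            errors += -1 if pairs[j][1] else 1
            j += 1
        best = min(best, errors)
        i = j
    return best / n


def mutual_information(cue_bins, same):
    """Plug-in estimate (in bits) of I(cue bin; same-segment indicator)."""
    n = len(same)
    joint = Counter(zip(cue_bins, same))
    count_bin = Counter(cue_bins)
    count_same = Counter(same)
    mi = 0.0
    for (b, s), c in joint.items():
        # p(b, s) / (p(b) p(s)) simplifies to c * n / (count_b * count_s).
        mi += (c / n) * math.log2(c * n / (count_bin[b] * count_same[s]))
    return mi


# Toy data (made up): nearby pairs mostly share a segment, distant ones do not.
distance = [1, 1, 2, 2, 3, 3, 4, 4]
same = [True, True, True, False, False, True, False, False]

print("risk:", threshold_risk(distance, same))
print("mutual information (bits):", mutual_information(distance, same))
```

When same-segment pairs are rare and the cue is weak, `threshold_risk` returns the fraction of same-segment pairs (the "declare everything different" rule), which is why the mutual information is the more informative of the two numbers in that regime.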
Since natural images contain texture and shading, identical luminance does not imply segment membership.

Figure: probability of lying in the same segment conditioned on both image-plane and color-space distance.

Due to the self-similar nature of images, we expect the distribution of region sizes to follow a power law over some range of scales. Our result here agrees with previous work in the area. Segmentations of color images tend to contain more small segments.

[1] E. Brunswik, J. Kamiya, “Ecological validity of proximity and other Gestalt factors,” American Journal of Psychology, pp. 20-32, 1953.
[2] D. Mumford, B. Gidas, “Stochastic Models for Generic Images,” Technical Report, Division of Applied Mathematics, Brown University, 1998.
[3] D. L. Ruderman, “The Statistics of Natural Images,” Network, 5(4):517-548, 1994.
[4] L. Alvarez, Y. Gousseau, J. Morel, “Scales in Natural Images and a Consequence on Their BV Norm,” Scale-Space Theories in Computer Vision, 1999.
[5] W. Geisler, J. Perry, B. Super, D. Gallogly, “Edge Co-occurrence in Natural Images Predicts Contour Grouping Performance,” Vision Research, 41:711-724, 2001.
[6] J. August, S. Zucker, “The Curve Indicator Random Field: Curve Organization via Edge Correlation,” in Perceptual Organization in Artificial Vision Systems, Boyer and Sarkar (eds.), Kluwer, pp. 265-287, 2000.
[7] S. C. Zhu, “Embedding Gestalt Laws in Markov Random Fields,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 21(11), Nov. 1999.
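One way to check the power-law claim for region sizes is to estimate the exponent from a list of segment areas. The poster does not say how the fit was done, so the sketch below is an assumption: it uses the standard continuous maximum-likelihood estimator for p(a) proportional to a^(-alpha) on a >= a_min, treating pixel-count areas as continuous. The names `power_law_exponent` and `sample_power_law`, the cutoff `a_min`, and the synthetic data are hypothetical.

```python
import math
import random


def power_law_exponent(areas, a_min):
    """Continuous maximum-likelihood estimate of alpha for p(a) ~ a**(-alpha).

    Only areas at or above a_min (the assumed start of the power-law range)
    are used: alpha = 1 + n / sum(log(a / a_min)).
    """
    tail = [a for a in areas if a >= a_min]
    log_sum = sum(math.log(a / a_min) for a in tail)
    if log_sum == 0:
        raise ValueError("need at least one area strictly above a_min")
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, a_min, n, rng):
    """Draw n synthetic areas from a continuous power law (inverse transform)."""
    return [a_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Synthetic check: the estimator should land close to the exponent used
# to generate the data. Real input would be areas of human-marked segments.
rng = random.Random(0)
areas = sample_power_law(alpha=2.5, a_min=10.0, n=20000, rng=rng)
print("estimated exponent:", power_law_exponent(areas, a_min=10.0))
```

Because a power law is only expected "over some range of scales", the choice of `a_min` (and, for real data, an upper cutoff) matters; comparing the estimate across several cutoffs is a simple sanity check.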
