Human-Computer Interaction: Segmentation • Hanyang University • Jong-Il Park
Why segment images? • Form large chunks of pixels that can be dealt with together • for efficiency • because these might represent objects • Join up image tokens that together convey information
Grouping • Humans interpret image information collectively • in “groups” • e.g. the Müller-Lyer illusion
Applications • Shot boundary detection • summarize video by finding shot boundaries and obtaining the “most representative” frame of each shot • Background subtraction • find “interesting bits” of image by subtracting known background • e.g. find a person in an office • e.g. find cars on a road • Interactive segmentation • user marks some foreground/background pixels • system cuts object out of image • useful for image editing, etc.
Technique: Shot Boundary Detection • Find the shots in a sequence of video • shot boundaries cause big differences between succeeding frames • Strategy: • compute interframe distances • declare a boundary where these are big • Possible distances • frame differences • histogram differences • block comparisons • edge differences
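A minimal sketch of the strategy above, using histogram differences as the interframe distance. The frame representation (a flat list of 8-bit intensities), the bin count, the L1 distance, and the threshold are all illustrative assumptions, not values from the slides.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of a frame (flat list of ints in 0..255)."""
    counts = [0] * bins
    for v in frame:
        counts[min(v * bins // max_value, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def histogram_distance(a, b):
    """L1 distance between two normalized histograms (ranges from 0 to 2)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def shot_boundaries(frames, threshold=0.5):
    """Indices i where the distance between frame i-1 and frame i is 'big'."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


# Toy video (hypothetical data): three dark frames, then three bright frames.
dark = [10, 20, 30, 15]
bright = [200, 220, 240, 210]
frames = [dark, dark, dark, bright, bright, bright]
boundaries = shot_boundaries(frames)  # boundary declared before frame 3
```

The threshold is the main tuning knob: too low and camera motion triggers false boundaries, too high and gradual transitions are missed.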
Technique: Background Subtraction • If we know the background, easy to find “interesting bits” • Approach: • use a moving average to estimate background image • subtract from current frame • large absolute values are interesting pixels • trick: use morphological operations to clean up pixels
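The approach above can be sketched as follows. This assumes an exponential moving average as the particular form of moving average, frames stored as flat lists of intensities, and hand-picked values for `alpha` and `threshold`; the morphological clean-up step is not shown.

```python
def update_background(background, frame, alpha=0.1):
    """Exponential moving average: B <- (1 - alpha) * B + alpha * frame."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """True where the absolute difference from the background is large."""
    return [abs(f - b) > threshold for b, f in zip(background, frame)]


# Toy sequence (hypothetical data): a static scene, then an object appears
# at pixel index 2.
empty_scene = [50, 50, 50, 50, 50]
background = [float(v) for v in empty_scene]
for _ in range(5):
    background = update_background(background, empty_scene)

current = [50, 50, 200, 50, 50]
mask = foreground_mask(background, current)
```

A small `alpha` makes the background estimate adapt slowly, so briefly stationary objects are not absorbed into it.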
Interactive segmentation • Goals • User cuts an object out of one image to paste into another • User forms a matte • weights between 0 and 1 to mix pixels with background • to cope with, say, hair • Interactions • mark some foreground, background pixels with strokes • put a box around foreground • Technical problem • allocate pixels to foreground/background class • consistent with interaction information • segments are internally coherent and different from one another
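To make the role of the matte concrete, here is a small sketch of how matte weights mix foreground pixels with a new background. The function name and the flat-list pixel representation are assumptions made for illustration.

```python
def composite(foreground, background, matte):
    """Per-pixel blend: matte * foreground + (1 - matte) * background."""
    return [a * f + (1 - a) * b
            for f, b, a in zip(foreground, background, matte)]


# Hypothetical data: weight 1 keeps the foreground, weight 0 keeps the
# background, and 0.5 (say, a pixel partly covered by hair) mixes them.
fg = [200, 200, 200]
bg = [0, 0, 0]
matte = [1.0, 0.5, 0.0]
result = composite(fg, bg, matte)
```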
Superpixels • Pixels are too small and too detailed a representation • for recognition • for some kinds of reconstruction • Replace with superpixels • small groups of pixels that are • clumpy • like one another • a reasonable representation of the underlying pixels
Segmentation as clustering • Cluster together (pixels, tokens, etc.) that belong together • Agglomerative clustering • attach each item to the cluster it is closest to • repeat • Divisive clustering • split cluster along best boundary • repeat
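A minimal sketch of the agglomerative variant on 1-D data. The single-link distance (closest pair of members) and the stopping rule (a target number of clusters) are assumptions; the slides do not fix either choice.

```python
def agglomerative(points, num_clusters):
    """Repeatedly merge the two closest clusters until num_clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point

    def distance(a, b):
        # Single-link distance: closest pair of members (an assumption).
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > num_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: distance(clusters[ij[0]],
                                                  clusters[ij[1]]))
        clusters[i] += clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters


# Hypothetical 1-D feature values forming two obvious groups.
result = agglomerative([1, 2, 10, 11, 12], num_clusters=2)
```

Divisive clustering runs the other way: start with everything in one cluster and split repeatedly.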
The watershed algorithm • An agglomerative clusterer with a special metric
Clustering pixels • Natural to use k-means • represent pixels with • intensity vector; color vector; vector of nearby filter responses • perhaps position
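A minimal k-means sketch over pixel feature vectors. Initial centers are passed in explicitly so the example is deterministic, and the iteration count is fixed; both are simplifying assumptions. Each feature vector here holds intensity only, but color or position components could be appended.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points and recomputing cluster means."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


# Hypothetical pixels: three dark and three bright intensities.
pixels = [(10,), (12,), (11,), (200,), (205,), (198,)]
centers, labels = kmeans(pixels, centers=[(0,), (255,)])
```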
The Mean Shift Algorithm • Originally intended to find modes in scattered data • Strategy • start at a promising estimate of mode • iterate until the estimate stops changing: • fit a model of probability density to some points near estimate • move the estimate to the peak of this model • Model • smoothing kernel • the update takes a special form • shift the mode to a weighted mean of the nearby points • hence the name.
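The iteration can be sketched in one dimension as below. A Gaussian smoothing kernel is assumed (the slides say only "smoothing kernel"), and `bandwidth`, `tol`, and `max_iter` are illustrative parameters.

```python
import math


def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=100):
    """Shift an estimate to the kernel-weighted mean until it stops moving."""
    x = start
    for _ in range(max_iter):
        # Gaussian weights: points near the current estimate count most.
        weights = [math.exp(-((p - x) / bandwidth) ** 2 / 2) for p in points]
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical scattered data with two clumps, near 1 and near 5.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
mode_a = mean_shift_mode(data, start=0.5)  # converges near 1
mode_b = mean_shift_mode(data, start=6.0)  # converges near 5
```

The bandwidth controls how many modes survive: a very large value smooths the two clumps into a single mode.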
Clustering with Mean Shift • Model data points as samples from a probability model • clusters are associated with modes • but it might be hard to find one mode per cluster • if there’s more than one mode per cluster, they should be close together • Apply mean shift to find modes • modes should form small, widely separated clusters • Now cluster the modes with (say) agglomerative clusterer • easy, because there are small, widely separated clusters • Point belongs to cluster that its closest mode belongs to
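A sketch of the last two steps, assuming mean shift has already produced one mode per data point. The greedy grouping below is a simple stand-in for the agglomerative clusterer mentioned above, and `merge_distance` is a hypothetical parameter; it works only because the modes form small, widely separated clumps.

```python
def cluster_modes(modes, merge_distance):
    """Group nearby modes; each point takes the label of its mode's group."""
    representatives = []  # one representative mode per cluster
    labels = []
    for m in modes:
        for k, r in enumerate(representatives):
            if abs(m - r) <= merge_distance:
                labels.append(k)
                break
        else:
            # No existing cluster is close enough: start a new one.
            representatives.append(m)
            labels.append(len(representatives) - 1)
    return representatives, labels


# Hypothetical modes reached from six data points (1-D for simplicity).
modes = [1.001, 1.0, 0.999, 5.0, 5.002, 4.998]
representatives, labels = cluster_modes(modes, merge_distance=0.1)
```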
Mean Shift Segmentation • Cluster pixels using mean shift • each cluster is a segment • Represent each pixel with color and position • important: color distances are not comparable to position distances • so choose a separate scale for each
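One way to apply a separate scale to color and to position is to divide each part of the feature vector by its own scale before clustering. The function below is a sketch under that assumption; the image layout (rows of RGB tuples) and the parameter names are illustrative.

```python
def pixel_features(image, color_scale, position_scale):
    """Build (r, g, b, x, y) features with color and position scaled apart."""
    features = []
    for y, row in enumerate(image):
        for x, color in enumerate(row):
            scaled_color = tuple(c / color_scale for c in color)
            scaled_position = (x / position_scale, y / position_scale)
            features.append(scaled_color + scaled_position)
    return features


# Hypothetical 1x2 image: a red pixel next to a blue pixel.
image = [[(255, 0, 0), (0, 0, 255)]]
features = pixel_features(image, color_scale=255, position_scale=2)
```

A larger `position_scale` shrinks spatial distances, so segments may spread further across the image; a larger `color_scale` tolerates more color variation within a segment.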
Evaluating Segmenters • Collect “correct” segmentations • from human labellers • these may not be perfect, but ... • Now apply your segmenter • Count • % human boundary pixels close to your boundary pixels -- Recall • % of your boundary pixels close to human boundary pixels -- Precision
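The two counts can be sketched as follows. Boundary pixels are given as lists of (x, y) coordinates, and "close" is taken to mean within a Euclidean distance `tolerance`; both are assumptions made for illustration.

```python
def boundary_precision_recall(predicted, human, tolerance=1.0):
    """Precision and recall of predicted boundary pixels against human ones."""
    def near(p, others):
        return any((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= tolerance ** 2
                   for q in others)

    # Precision: fraction of your boundary pixels close to a human one.
    precision = sum(near(p, human) for p in predicted) / len(predicted)
    # Recall: fraction of human boundary pixels close to one of yours.
    recall = sum(near(h, predicted) for h in human) / len(human)
    return precision, recall


# Hypothetical boundaries: the segmenter finds half of the human boundary
# and adds one spurious pixel far away.
human = [(0, 0), (1, 0), (2, 0), (3, 0)]
predicted = [(0, 1), (1, 1), (9, 9)]
precision, recall = boundary_precision_recall(predicted, human)
```

Here two of the three predicted pixels are close to the human boundary, and two of the four human pixels are recovered.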