
Conditional Random Fields for Image Labeling


Presentation Transcript


  1. Conditional Random Fields for Image Labeling Yilin Wang 11/5/2009

  2. Background • Labeling Problem • Labeling: given an observed data set (X) and a label set (L), infer the labels of the data points • Most vision problems can be posed as labeling problems • Stereo matching • Image segmentation • Image restoration

  3. Examples of Labeling Problem • Stereo Matching • For a pixel in image 1, where is the corresponding pixel in image 2? Label set: Differences (disparities) between corresponding pixels Picture source: S. Lazebnik

  4. Examples of Labeling Problem • Image Segmentation • To partition an image into multiple disjoint regions. Label set: Region IDs Picture source: http://mmlab.ie.cuhk.edu.hk

  5. Examples of Labeling Problem • Image Restoration • To "compensate for" or "undo" defects which degrade an image. Label set: Restored Intensities Picture source: http://www.photorestoration.co.nz

  6. Background • Image Labeling • Given an image, the system should automatically partition it into semantically meaningful areas, each labeled with a specific object class (e.g., cow, sky, lawn, building, tree, plane)

  7. Image Labeling Problem • Given • X = {x_i}: the observed data from an input image, where x_i is the data from site i (pixel or block) of the site set S • A pre-defined label set • Let L = {l_i} be the corresponding labels at the image sites; we want to find the L that maximizes the conditional probability P(L | X): L* = arg max_L P(L | X)
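A minimal sketch of what this formulation means as a search problem: enumerate every label configuration of a tiny site set and keep the highest-scoring one. The sites, label names, and scoring function below are made-up placeholders, not part of the slides; the point is only that the search space grows as |label set|^|sites|.

```python
# Brute-force L* = argmax_L P(L | X) over a tiny site set (illustrative only).
import itertools

sites = ["s0", "s1", "s2"]
label_set = ["sky", "building", "lawn"]

def score(labeling, X=None):
    # placeholder for P(L | X): reward "sky" at s0 and agreement between s1 and s2
    s = 1.0 if labeling[0] == "sky" else 0.1
    s *= 2.0 if labeling[1] == labeling[2] else 1.0
    return s

best = max(itertools.product(label_set, repeat=len(sites)), key=score)
print(best)  # the search space grows as |label_set|^|sites|, hence the need for MRF/CRF machinery
```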

  8. Features for Image Labeling • Which kinds of information can be used for labeling? • Features from individual sites: intensity, color, texture, … (e.g., a patch that looks like vegetation) • Interactions with neighboring sites: contextual information (e.g., sky or building?)

  9. Contextual Information • Two types of interactions • Interaction with neighboring labels (spatial smoothness of labels): neighboring sites tend to have similar labels (except at discontinuities) • Interaction with neighboring observed data

  10. Information for Image Labeling • Let l_i be the label of site i of the site set S, and N_i be the neighboring sites of site i • Three kinds of information for image labeling • Features from the local site • Interaction with neighboring labels • Interaction with neighboring observed data Picture source: S. Xiang

  11. Markov Random Fields (MRF) • Markov Random Fields (MRFs) are the most popular models to incorporate local contextual constraints in labeling problems • Let l_i be the label of site i of the site set S, and N_i be the neighboring sites of site i. The label field L = {l_i} is said to be an MRF on S w.r.t. a neighborhood N iff the following condition is satisfied • Markov property: P(l_i | l_{S-{i}}) = P(l_i | l_{N_i}) • Maintain global spatial consistency by only considering relatively local dependencies!

  12. Markov-Gibbs Equivalence • Let l be a realization of the MRF; then P(l) has an explicit formulation (Gibbs distribution): P(l) = (1/Z) exp(-U(l)/T), where U(l) = Σ_{c∈C} V_c(l) is the energy function, Z = Σ_l exp(-U(l)/T) is a normalizing factor called the partition function, and T is a constant • Clique C_k = {{i, i', i'', …} | i, i', i'', … are neighbors to one another} • The potential functions V_c represent a priori knowledge of interactions between labels of neighboring sites
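A small sketch of the Gibbs distribution on a tiny binary label field, with the partition function Z computed by brute-force enumeration. The energy function and temperature below are illustrative choices, not from the slides; the enumeration also shows why Z becomes intractable as the field grows.

```python
# Gibbs distribution P(l) = exp(-U(l)/T) / Z on a 3x3 binary label field (illustrative).
import itertools
import numpy as np

H, W, T = 3, 3, 1.0

def energy(labels):
    """Pairwise (4-neighborhood) energy: cost 1 per disagreeing neighboring pair."""
    u = 0.0
    for y in range(H):
        for x in range(W):
            if x + 1 < W and labels[y, x] != labels[y, x + 1]:
                u += 1.0
            if y + 1 < H and labels[y, x] != labels[y + 1, x]:
                u += 1.0
    return u

# Partition function Z: sum over all 2^(H*W) configurations -- feasible only
# because the field is tiny; this blow-up is exactly why Z is intractable.
configs = [np.array(c).reshape(H, W) for c in itertools.product([0, 1], repeat=H * W)]
Z = sum(np.exp(-energy(l) / T) for l in configs)

def gibbs_prob(labels):
    return np.exp(-energy(labels) / T) / Z

print(gibbs_prob(np.zeros((H, W), dtype=int)))          # smooth labeling: high probability
print(gibbs_prob(np.arange(H * W).reshape(H, W) % 2))   # checkerboard labeling: low probability
```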

  13. Auto-Model • With clique potentials up to two sites, the energy takes the form U(l) = Σ_{i∈S} V_1(l_i) + Σ_{i∈S} Σ_{i'∈N_i} V_2(l_i, l_{i'}) • When V_1(l_i) = l_i G_i(l_i) and V_2(l_i, l_{i'}) = β_{i,i'} l_i l_{i'}, where G_i(·) are arbitrary functions (or constants) and β_{i,i'} are constants reflecting the pairwise interaction between i and i', the energy is U(l) = Σ_{i∈S} l_i G_i(l_i) + Σ_{i∈S} Σ_{i'∈N_i} β_{i,i'} l_i l_{i'} • Such models are called auto-models (Besag 1974)
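A sketch of the auto-model energy on a 4-neighborhood grid with labels in {-1, +1}, under the simplifying assumptions that G_i is a constant alpha and that β_{i,i'} is a single constant beta; both values are made up for illustration.

```python
# Illustrative auto-model energy: U(l) = sum_i l_i*G_i(l_i) + sum_{i,i'} beta*l_i*l_i'
# with G_i = alpha (constant) and a shared beta on a 4-neighborhood grid.
import numpy as np

def auto_model_energy(labels, alpha=0.1, beta=-0.5):
    u = float(alpha * labels.sum())                        # unary term with G_i = alpha
    u += beta * (labels[:, :-1] * labels[:, 1:]).sum()     # horizontal neighbor pairs
    u += beta * (labels[:-1, :] * labels[1:, :]).sum()     # vertical neighbor pairs
    return u

labels = np.ones((4, 4), dtype=int)
print(auto_model_energy(labels))      # smooth field: low energy (beta < 0 rewards agreement)
labels[::2, ::2] = -1
print(auto_model_energy(labels))      # less smooth field: higher energy
```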

  14. Parameter Estimation • Given the functional form of the auto-model, how can its parameters be specified?

  15. Maximum Likelihood Estimation • Given a realization l of an MRF, the maximum likelihood (ML) estimate maximizes the conditional probability P(l | θ) (the likelihood of θ), that is: θ* = arg max_θ P(l | θ) • By Bayes' rule: P(θ | l) ∝ P(l | θ) P(θ) • The prior P(θ) is assumed to be flat when prior information is totally unavailable. In this case, the MAP estimate reduces to the ML estimate.

  16. Maximum Likelihood Estimation • The likelihood function is in the Gibbs form P(l | θ) = (1/Z(θ)) exp(-U(l | θ)), where Z(θ) = Σ_{l∈L} exp(-U(l | θ)) • However, the computation of Z(θ) is intractable even for moderately sized problems, because there is a combinatorial number of elements in the configuration space L.

  17. Maximum Pseudo-Likelihood • Assumption: the labels l_i, each conditioned on its neighborhood l_{N_i}, are treated as independent, giving the pseudo-likelihood PL(l | θ) = Π_{i∈S} P(l_i | l_{N_i}, θ) • Notice that the pseudo-likelihood does not involve the partition function Z. • {α, β} can be obtained by solving ∂ log PL(l | θ) / ∂θ = 0
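A sketch of the log pseudo-likelihood for the binary auto-model from the code above: each site's conditional needs only a two-term local normalizer, so the global partition function never appears. The alpha and beta values are illustrative, not estimates from the slides.

```python
# log PL(l) = sum_i log P(l_i | l_Ni) for a binary ({-1,+1}) auto-model (illustrative).
import numpy as np

def local_field(labels, y, x, alpha, beta):
    """Energy of each candidate label at site (y, x), holding the neighbors fixed."""
    nbr_sum = 0
    H, W = labels.shape
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        ny, nx = y + dy, x + dx
        if 0 <= ny < H and 0 <= nx < W:
            nbr_sum += labels[ny, nx]
    return {c: c * alpha + beta * c * nbr_sum for c in (-1, +1)}

def log_pseudo_likelihood(labels, alpha=0.1, beta=-0.5):
    lpl = 0.0
    H, W = labels.shape
    for y in range(H):
        for x in range(W):
            e = local_field(labels, y, x, alpha, beta)
            # P(l_i | l_Ni) = exp(-e[l_i]) / (exp(-e[-1]) + exp(-e[+1])): local normalizer only
            denom = np.exp(-e[-1]) + np.exp(-e[+1])
            lpl += -e[labels[y, x]] - np.log(denom)
    return lpl

labels = np.ones((4, 4), dtype=int)
print(log_pseudo_likelihood(labels))  # parameter estimation maximizes this over (alpha, beta)
```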

  18. Inference • Recall that in image labeling, we want to find the L that maximizes the posterior P(L | X); by Bayes' rule: P(L | X) ∝ P(X | L) P(L), where the prior probability P(L) is a Gibbs distribution • Let U(L | X) = U(X | L) + U(L); then P(L | X) ∝ exp(-U(L | X)): posterior energy = likelihood energy + prior energy

  19. MAP-MRF Labeling • Maximizing the posterior probability is equivalent to minimizing the posterior energy: L* = arg min_L (U(X | L) + U(L)) • Steps of MAP (diagram). Picture source: S. Xiang

  20. MRF for Image Labeling • Difficulties and disadvantages • Very strict independence assumptions • The interactions among labels are modeled by the prior term P(L) and are independent of the observed data, which prohibits one from modeling data-dependent interactions in the labels.

  21. Conditional Random Fields • Let G = (S, E) be a graph; then (X, L) is said to be a Conditional Random Field (CRF) if, when conditioned on X, the random variables l_i obey the Markov property with respect to the graph: P(l_i | X, l_{S-{i}}) = P(l_i | X, l_{N_i}), where S-{i} is the set of all sites in the graph except site i, and N_i is the set of neighbors of site i in G. Compare with the MRF: P(l_i | l_{S-{i}}) = P(l_i | l_{N_i})

  22. CRF • According to the Markov-Gibbs equivalence, P(L | X) is a Gibbs distribution over the cliques of G • If only up to pairwise clique potentials are nonzero, the posterior probability P(L | X) has the form P(L | X) = (1/Z) exp( -Σ_{i∈S} V_1(l_i | X) - Σ_{i∈S} Σ_{i'∈N_i} V_2(l_i, l_{i'} | X) ), where −V1 and −V2 are called the association and interaction potentials, respectively, in the CRF literature
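A sketch of evaluating this unnormalized log-posterior, writing A = -V1 and I = -V2. The potential functions passed in are toy placeholders (in a real CRF they would be learned); the key point is that both depend on the observed data X, and that Z(X) can be omitted when only comparing labelings.

```python
# Unnormalized CRF log-score: sum_i A(l_i, X) + sum_{i,i'} I(l_i, l_i', X)  (illustrative).
import numpy as np

def crf_log_score(labels, X, assoc, interact):
    H, W = labels.shape
    score = 0.0
    for y in range(H):
        for x in range(W):
            score += assoc(labels[y, x], X, (y, x))
            for dy, dx in [(0, 1), (1, 0)]:        # count each neighboring pair once
                ny, nx = y + dy, x + dx
                if ny < H and nx < W:
                    score += interact(labels[y, x], labels[ny, nx], X, (y, x), (ny, nx))
    return score  # log Z(X) is omitted, which is fine for comparing labelings of the same X

# toy potentials: agree with the observation, and prefer smooth labelings
assoc = lambda l, X, pos: 1.0 if l == X[pos] else -1.0
interact = lambda l1, l2, X, p1, p2: 0.2 if l1 == l2 else -0.2

X = np.array([[0, 0], [0, 1]])
print(crf_log_score(X.copy(), X, assoc, interact))                  # labeling equal to X
print(crf_log_score(np.zeros((2, 2), dtype=int), X, assoc, interact))  # all-zero labeling
```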

  23. CRF vs. MRF • MRF is a generative model (two-step) • Infer likelihood and prior • Then use Bayes' theorem to determine the posterior • CRF is a discriminative model (one-step) • Directly infer the posterior

  24. CRF vs. MRF • More differences between the CRF and the MRF • MRF: P(L | X) ∝ P(L) Π_{i∈S} p(x_i | l_i), so the likelihood factorizes over sites and the label interactions ignore the data • CRF: P(L | X) = (1/Z) exp( Σ_{i∈S} A(l_i, X) + Σ_{i∈S} Σ_{i'∈N_i} I(l_i, l_{i'}, X) ) • In the CRF, both the association and interaction potentials are functions of all the observation data as well as of the labels

  25. Discriminative Random Fields • The Discriminative Random Field (DRF) is a special type of CRF with two extensions. • First, a DRF is defined over 2D lattices (such as the image grid) • Second, the unary (association) and pairwise (interaction) potentials therein are designed using local discriminative classifiers Kumar, S. and M. Hebert: `Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification'. ICCV 2003

  26. DRF • Formulation of the DRF: P(L | X) = (1/Z) exp( Σ_{i∈S} A(l_i, X) + Σ_{i∈S} Σ_{i'∈N_i} I(l_i, l_{i'}, X) ), where A(l_i, X) and I(l_i, l_{i'}, X) are called the association potential and the interaction potential Picture source: S. Xiang

  27. Association Potential • A(l_i, X) is modeled using a local discriminative model that outputs the association of site i with class l_i as: A(l_i, X) = log P(l_i | f_i(X)), where f_i(·) is a linear function that maps a patch centered at site i to a feature vector. Picture source: S. Srihari

  28. Association Potential • For binary classification (l_i = -1 or 1), the posterior at site i is modeled using a logistic function: P(l_i = 1 | X) = σ(w^T f_i(X)) = 1 / (1 + exp(-w^T f_i(X))) • Since l_i = -1 or 1, the probability can be compactly expressed as P(l_i | X) = σ(l_i w^T f_i(X)) • Finally, the association potential is defined as A(l_i, X) = log σ(l_i w^T f_i(X)) Picture source: S. Srihari
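A minimal sketch of this logistic association potential for binary labels in {-1, +1}. The weight vector and per-site feature vector below are made-up stand-ins for the learned classifier parameters and extracted features.

```python
# DRF-style association potential: A(l_i, X) = log sigma(l_i * w^T f_i(X))  (illustrative).
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def association_potential(l_i, feature_vec, w):
    """l_i must be -1 or +1; feature_vec plays the role of f_i(X)."""
    return np.log(sigmoid(l_i * (w @ feature_vec)))

w = np.array([0.5, -1.0, 0.2])       # made-up classifier weights
f_i = np.array([1.0, 0.3, 2.0])      # made-up feature vector for site i
print(association_potential(+1, f_i, w), association_potential(-1, f_i, w))
```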

  29. Interaction Potential • The interaction potential can be seen as a measure of how the labels at neighboring sites i and i' should interact given the observed image X. • Given the features at two different sites, a pairwise discriminative model is defined as P(l_i = l_{i'} | X) = σ(v^T μ_{ii'}(X)), where ψ_i(X) is a function that maps a patch centered at site i to a feature vector, μ_{ii'}(X) is a new feature vector built from ψ_i(X) and ψ_{i'}(X), and v are the model parameters • P(l_i = l_{i'} | X) is a measure of how likely sites i and i' are to have the same label given the image X

  30. Interaction Potential • The interaction potential is modeled using a data-dependent term along with a constant smoothing term: I(l_i, l_{i'}, X) = β ( K l_i l_{i'} + (1 - K)(2σ(l_i l_{i'} v^T μ_{ii'}(X)) - 1) ) • The first term is a data-independent smoothing term, similar to the auto-model • The second term is a [-1, 1] mapping of the pairwise logistic function, which ensures that both terms have the same range • Ideally, the data-dependent term acts as a discontinuity-adaptive model that moderates smoothing when the data from the two sites is 'different'.
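A sketch of this interaction potential, combining the data-independent smoothing term with the [-1, 1]-mapped pairwise logistic term. The parameters v, beta, K and the pairwise feature vectors are illustrative values chosen only to show how the data-dependent term moderates smoothing.

```python
# DRF-style interaction potential:
#   I(l_i, l_i', X) = beta * ( K*l_i*l_i' + (1-K)*(2*sigma(l_i*l_i' * v^T mu_ii'(X)) - 1) )
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def interaction_potential(l_i, l_j, mu_ij, v, beta=1.0, K=0.5):
    smooth = l_i * l_j                                          # data-independent smoothing
    data_term = 2.0 * sigmoid(l_i * l_j * (v @ mu_ij)) - 1.0    # pairwise logistic mapped to [-1, 1]
    return beta * (K * smooth + (1.0 - K) * data_term)

v = np.array([1.5, -0.5])
similar_sites = np.array([2.0, 0.1])     # pairwise features suggesting "same label"
different_sites = np.array([-2.0, 0.1])  # pairwise features suggesting "different labels"
print(interaction_potential(+1, +1, similar_sites, v))    # strong reward for agreement
print(interaction_potential(+1, +1, different_sites, v))  # smoothing is moderated by the data
```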

  31. Discussion of I(l_i, l_{i'}, X) • If only the data-independent smoothing term is considered, the labeling gets oversmoothed: discontinuities between regions are smoothed away. • The second, data-dependent term is used to compensate for this effect of the smoothness assumption.

  32. Parameter Estimation • θ = {w, v, β, K} • Maximum likelihood estimation • In the conventional maximum-likelihood approach, the evaluation of Z is an NP-hard problem. • Approximate evaluation of the partition function Z by pseudo-likelihood: θ* = arg max_θ Π_{m=1}^{M} Π_{i∈S} P(l_i^m | l_{N_i}^m, X^m, θ), subject to 0 ≤ K ≤ 1, where m indexes over the training images and M is the total number of training images

  33. Inference • Objective function: L* = arg max_L P(L | X) • Iterated Conditional Modes (ICM) algorithm • Given an initial label configuration, ICM maximizes the local conditional probabilities iteratively, i.e., l_i ← arg max_{l_i} P(l_i | l_{N_i}, X) • ICM yields a local maximum of the posterior and has been shown to give reasonably good results
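A minimal ICM sketch: sweep over the sites, replacing each label with the one that maximizes a local conditional score while all other labels stay fixed. The local score function passed in is a toy placeholder, not the DRF's actual conditional.

```python
# ICM: coordinate-wise maximization of local conditional scores (illustrative).
import numpy as np

def icm(init_labels, X, label_set, local_score, n_iters=10):
    labels = init_labels.copy()
    H, W = labels.shape
    for _ in range(n_iters):
        changed = False
        for y in range(H):
            for x in range(W):
                best = max(label_set, key=lambda c: local_score(c, (y, x), labels, X))
                if best != labels[y, x]:
                    labels[y, x] = best
                    changed = True
        if not changed:            # converged to a local maximum of the posterior
            break
    return labels

# toy demo: smooth a noisy binary labeling with a score that prefers agreement with neighbors
def toy_score(c, pos, labels, X):
    y, x = pos
    H, W = labels.shape
    agree = sum(labels[ny, nx] == c
                for ny, nx in [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
                if 0 <= ny < H and 0 <= nx < W)
    return agree + (0.5 if c == X[y, x] else 0.0)   # weak tie to the observed data

X = np.zeros((5, 5), dtype=int); X[2, 2] = 1        # mostly-zero "observation" with one noisy site
print(icm(X.copy(), X, [0, 1], toy_score))          # the isolated 1 gets smoothed away
```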

  34. Experiment • Task: detecting man-made structures in natural scenes • Database • Corel (training: 108 images, test: 129 images) • Each image was divided into non-overlapping 16×16-pixel blocks • Compared methods • Logistic • MRF • DRF

  35. Experiment Results • Detection Rates (DR) and False Positives (FP) • The DRF reduces false positives relative to the MRF by more than 48%. • Superscript '−' indicates that no neighborhood data interaction was used. • K = 0 indicates the absence of the data-independent term in the interaction potential of the DRF.

  36. Experiment Results • For a similar detection rate, the DRF has fewer false positives • For similar false positives, the detection rate of the DRF is higher than that of the MRF

  37. Conclusion of DRF • Pros • Provides the benefits of discriminative models • Demonstrates good performance • Cons • Although the model outperforms traditional MRFs, it is not strong enough to capture long-range correlations among the labels, due to the rigid lattice-based structure that allows only pairwise interactions

  38. Problem • Local information can be ambiguous when there are large overlaps between different classes (sky or water?) • Solution: utilize global contextual information to improve the performance

  39. Multiscale Conditional Random Field (mCRF) • Considers features at different scales • Local features (site) • Regional label features (small patch) • Global label features (big patch or the whole image) • The conditional probability P(L | X) is formed by combining components at the different scales s: P(L | X) ∝ Π_s P_s(L | X) • He, X., R. Zemel, and M. Carreira-Perpinan: 2004, 'Multiscale conditional random fields for image labelling'. IEEE Int. Conf. CVPR.

  40. Local Features • The local feature of site i is represented by the outputs of several filters. • The aim is to associate the patch with one of a predefined set of labels.

  41. Local Classifier • Here a multilayer perceptron is used as the local classifier. • Independently at each site i, the local classifier produces a conditional distribution P(l_i | x_i, λ) over the label variable l_i given the filter outputs x_i within an image patch centered on site (pixel) i, where λ are the classifier parameters.
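A sketch of such a per-site classifier: a one-hidden-layer perceptron mapping the filter outputs at a site to a softmax distribution over labels. The layer sizes and random weights are illustrative only; in the mCRF these weights would be learned.

```python
# Per-site local classifier: filter outputs x_i -> P(l_i | x_i, lambda)  (illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_filters, n_hidden, n_labels = 12, 20, 7
W1, b1 = rng.normal(size=(n_hidden, n_filters)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_labels, n_hidden)), np.zeros(n_labels)

def local_classifier(x_i):
    h = np.tanh(W1 @ x_i + b1)               # hidden layer
    logits = W2 @ h + b2
    p = np.exp(logits - logits.max())        # numerically stable softmax
    return p / p.sum()                       # P(l_i | x_i): one probability per label

x_i = rng.normal(size=n_filters)             # filter responses at one site
print(local_classifier(x_i))
```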

  42. Regional Label Features • Encoding a particular constraint between the image and the labels within a region of the image • Sample pattern: ground pixels (brown) above water pixels (cyan)

  43. Global Label Features • Operate at a coarser resolution, specifying common value for a patch of sites in the label field. • Sample pattern: sky pixels (blue) at the top of the image, hippo pixels (red) in the middle, and water pixels (cyan) near the bottom.

  44. Feature Function • Global label features are trained as a Restricted Boltzmann Machine (RBM) • Two layers: label sites (L) and features (f) • Features and labels are fully inter-connected, with no intra-layer connections • The joint distribution of the global label feature model is P(L, f) ∝ exp( Σ_a f_a w_a^T L ), where w_a is the parameter vector connecting hidden global label feature f_a to the label sites L

  45. Feature Function • By marginalizing out the hidden variables (f), the global component of the model is P_G(L | X) ∝ Π_a (1 + exp(w_a^T L)) • Similarly, the regional component of the model can be represented as a product over the regional label features • By multiplicatively combining the component conditional distributions: P(L | X) ∝ P_Local(L | X) · P_Regional(L | X) · P_Global(L | X)
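A sketch of the marginalized global component and of the multiplicative combination, working in log space. The label field is encoded as a flattened one-hot vector; the shapes and random weights are illustrative assumptions, as are the placeholder local/regional score callables.

```python
# Global component after summing out binary hidden features: score_G(L) = prod_a (1 + exp(w_a^T L)).
import numpy as np

rng = np.random.default_rng(1)
n_sites, n_labels, n_global_features = 16, 4, 5
W_global = rng.normal(scale=0.1, size=(n_global_features, n_sites * n_labels))

def one_hot(label_field):
    L = np.zeros((n_sites, n_labels))
    L[np.arange(n_sites), label_field] = 1.0
    return L.ravel()

def global_log_score(label_field):
    # log prod_a (1 + exp(w_a^T L)), with the binary hidden units marginalized out
    return np.log1p(np.exp(W_global @ one_hot(label_field))).sum()

def combined_log_score(label_field, local_log_score, regional_log_score):
    # multiplicative combination of components = sum of log-scores (up to the normalizer Z)
    return (local_log_score(label_field)
            + regional_log_score(label_field)
            + global_log_score(label_field))

print(global_log_score(rng.integers(0, n_labels, size=n_sites)))
```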

  46. Parameter Estimation and Inference • Parameter Estimation • The conditional model is trained discriminatively based on the Conditional Maximum Likelihood (CML) criterion, which maximizes the log conditional likelihood: θ* = arg max_θ Σ_m log P(L^m | X^m, θ) • Inference • Maximum Posterior Marginals (MPM): l_i* = arg max_{l_i} P(l_i | X)

  47. Experiment Results • Database • Corel (100 images with 7 labels) • Sowerby (104 images with 8 labels) • Compared methods • Single classifier (MLP) • MRF • mCRF

  48. Labeling Results

  49. Conclusion of mCRF • Pros • Formulates the image labeling problem as a multiscale CRF model • Combines local and larger-scale contextual information in a unified framework • Cons • Including additional classifiers operating at different scales in the mCRF framework introduces a large number of model parameters • The model assumes conditional independence of the hidden variables given the label field

  50. More CRF models • Hierarchical Conditional Random Field (HCRF) • S. Kumar and M. Hebert. A hierarchical field framework for unified context-based classification. 2005 • Jordan Reynolds and Kevin Murphy. Figure-ground segmentation using a hierarchical conditional random field. 2007 • Tree Structured Conditional Random Fields (TCRF) • P. Awasthi, A. Gagrani, and B. Ravindran. Image Modeling using Tree Structured Conditional Random Fields. 2007
