
Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification



  1. Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification • Authors: Sanjiv Kumar and Martial Hebert • Slides prepared by Chihoon Lee

  2. Before we start • We need to: • Know the fundamentals of probability and statistics • Know elementary linear algebra • Be familiar with the concepts of graphical models • Distinguish discriminative models from generative models • Brief notes on each item will be posted soon

  3. Discriminative Random Fields • Problem • Classification of random variables by incorporating neighborhood interactions in the labels as well as in the observed data • Advantages • Relaxes the strong assumption of conditional independence of the observed data, generally adopted in the MRF framework for tractability • Derives its classification power from probabilistic discriminative models • All parameters of the DRF model are estimated simultaneously from the training data

  4. DRFs • Introduction • Undirected Graphical Models (MRFs) • DRFs • Representation • Local Function • Association Potential • Interaction Potential • Parameter Estimation • Inference • Experiments • Conclusion

  5. Introduction • MRFs are generally used in a probabilistic generative framework that models the joint probability of the observed data and the corresponding labels • X = {Xi}, i ∈ S, where Xi is the data from the i-th site and S is the set of sites; thus an image is represented as {x1, x2, …, xn}, where n = |S| • Y = {Yi}, i ∈ S, where Yi is the label at image site i, and Yi ∈ C, where C is the set of labels

  6. Introduction • In the MRF framework, the posterior over the labels given the observations is expressed as • P(Y|X) = P(X,Y)/P(X), with P(X,Y) = P(Y)P(X|Y), where the prior over labels P(Y) is modeled explicitly • The likelihood P(X|Y) is assumed to factorize, i.e. P(X|Y) = Πi∈S P(Xi|Yi) ⇒ this is too restrictive for classification problems

  7. Introduction • For classification • Estimate the posterior P(Y|X) • In the generative framework (MRFs) • P(Y|X) = P(X|Y)P(Y)/P(X) • so the observations X must be modeled, at least implicitly • In the discriminative framework • Model P(Y|X) directly from the data

  8. Quick Peek into DRFs • Based on the concept of Conditional Random Fields (CRFs), which model P(Y|X) directly • Allows arbitrary dependencies between observations to be captured • CRFs + local discriminative models ⇒ capture the class associations at individual sites as well as the interactions with neighboring sites on a 2-D lattice

  9. MRFs and DRFs • [Figure: graphical models of the two frameworks. In the Markov Random Field, each label Yi is connected to its own observation Xi; in the Discriminative Random Field, the labels Y1, …, YN are globally conditioned on the single observation X.]

  10. DRFs • X: observations • Y: labels, Yi ∈ {-1, 1} • P(Y|X) is modeled directly from the data, without modeling the prior P(Y) • The marginal P(X) is not explicitly modeled ⇒ no joint distribution over (X, Y) is required • Formal definition of CRFs follows

  11. DRFs • Definition of CRFs • Let G = (S, E) be a graph such that Y is indexed by the vertices of G. Then (X, Y) is said to be a conditional random field if, when conditioned on X, the random variables Yi obey the Markov property with respect to the graph: P(Yi | X, Y_S−{i}) = P(Yi | X, Y_Ni), where S−{i} is the set of all nodes in the graph except node i, Ni is the set of neighbors of node i in G, and Y_Ω represents the set of labels at the nodes in the set Ω

  12. DRFs • Thus, a CRF is a random field globally conditioned on the observation X • Assuming P(Y|X) > 0, and since the marginal P(X) is not explicitly modeled, the conditional distribution can be written as P(Y|X) = (1/Z) exp( Σi∈S Ai(Yi, X) + Σi∈S Σj∈Ni Iij(Yi, Yj, X) ), where Z is a normalization constant known as the partition function, and Ai and Iij are the unary and pairwise potentials
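To make this concrete, here is a minimal Python sketch (illustrative names, not the authors' code) of the unnormalized log-posterior on a 4-connected image grid; `assoc` and `interact` stand in for the association and interaction potentials defined on the next slides:

```python
# Unnormalized DRF log-posterior:
#   log P(Y|X) + log Z = sum_i A(y_i, X) + sum_i sum_{j in N_i} I(y_i, y_j, X)

def drf_log_score(labels, X, assoc, interact):
    """labels: 2-D list of +1/-1 site labels; X: observations for the image."""
    H, W = len(labels), len(labels[0])
    score = 0.0
    for i in range(H):
        for j in range(W):
            score += assoc(labels[i][j], (i, j), X)          # unary term A_i
            for di, dj in ((0, 1), (1, 0)):                   # count each edge once
                ni, nj = i + di, j + dj
                if ni < H and nj < W:
                    score += interact(labels[i][j], labels[ni][nj],
                                      (i, j), (ni, nj), X)    # pairwise term I_ij
    return score
```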

  13. DRFs • Association potential • A(Yi, X) is modeled using a local discriminative model that outputs the association of site i with class Yi • For each site i, fi: X → R^l is a function that maps the observations X to a feature vector fi(X) • Using the logistic function, the local class posterior can be modeled as P(Yi = 1 | X) = 1 / (1 + exp(−(w0 + w1^T fi(X)))) = σ(w0 + w1^T fi(X)) (Eq. 1)

  14. DRFs • where w = (w0, w1) are the model parameters • To extend the logistic model to induce a nonlinear decision boundary in the feature space, a transformed feature vector is defined at each site i as hi(X) = [1, φ1(fi(X)), …, φR(fi(X))]^T, where each φk(·) is an arbitrary nonlinear function

  15. DRFs • Eq. 1 can then be rewritten as P(Yi | X) = σ(Yi w^T hi(X)) • Finally, the association potential is defined as A(Yi, X) = log σ(Yi w^T hi(X)) • Difference from the MRF framework • In MRFs, the log-likelihood at site i may use only the data from that particular site, i.e. Xi
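A small Python sketch of this association potential, assuming hi(X) is the degree-two basis expansion used later in the experiments (function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_basis(f):
    """h_i(X): a leading 1 (absorbing the bias w0), the linear terms of
    f_i(X), and the pairwise products f_a * f_b for a <= b."""
    f = np.asarray(f, dtype=float)
    upper = np.outer(f, f)[np.triu_indices(len(f))]
    return np.concatenate(([1.0], f, upper))

def association_potential(y_i, f_i, w):
    """A(y_i, X) = log sigma(y_i * w^T h_i(X)), with y_i in {-1, +1}."""
    return np.log(sigmoid(y_i * np.dot(w, quadratic_basis(f_i))))
```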

  16. DRFs • Interaction potential • In MRFs, the interaction potential is given as I = β Yi Yj, which penalizes every dissimilar pair of labels with a cost of β • In DRFs, the interaction potential is a function of all the observations X • P(Yi = Yj | ψi(X), ψj(X)) = P(tij | ψi(X), ψj(X)), where tij = 1 if Yi = Yj and −1 otherwise • ψk(X) is a function that maps X to a feature vector; in general ψk(X) ≠ fk(X)

  17. DRFs • Using a pairwise feature vector denoted uij(X), the pairwise discriminative term is defined as P(tij | ψi(X), ψj(X)) = σ(tij v^T uij(X)), where v are the model parameters • The interaction potential in DRFs is modeled as a convex combination of two terms: I(Yi, Yj, X) = β{ K Yi Yj + (1 − K)(2σ(tij v^T uij(X)) − 1) }, where 0 ≤ K ≤ 1 • First term: a data-independent smoothing term • Second term: a data-dependent term that acts as a discontinuity-adaptive model, moderating the smoothing when the data at the two sites differ
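The same convex combination in sketch form, where `u_ij` stands in for the pairwise feature vector ψi(X), ψj(X) give rise to, and all names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def interaction_potential(y_i, y_j, u_ij, v, beta, K):
    """I(y_i, y_j, X) = beta * ( K*y_i*y_j
                                 + (1-K)*(2*sigma(t_ij * v^T u_ij) - 1) )."""
    t_ij = 1.0 if y_i == y_j else -1.0
    data_independent = y_i * y_j                                   # Ising-like smoothing
    data_dependent = 2.0 * sigmoid(t_ij * np.dot(v, u_ij)) - 1.0   # discontinuity-adaptive
    return beta * (K * data_independent + (1.0 - K) * data_dependent)
```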

  18. DRFs • β is the interaction coefficient that controls the degree of smoothing: larger values of β produce smoother solutions • Next we need to estimate the parameters of the models defined so far

  19. DRFs • Estimation of the parameters θ = {w, v, β, K} • Exact maximum likelihood is intractable, so the standard pseudo-likelihood approximation is used: θ̂ = argmax_θ Πm=1..M Πi∈S P(Yi^m | Y_Ni^m, X^m, θ), where M is the total number of training images
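Pseudo-likelihood is tractable because each site's conditional is normalized over just the two labels {-1, +1} rather than over all label configurations. A minimal sketch, reusing the placeholder potentials from above:

```python
import numpy as np

def log_pseudo_likelihood(labels, X, assoc, interact, neighbors):
    """Sum over sites of log P(y_i | y_{N_i}, X) for one training image.
    labels: dict mapping site -> {-1, +1}; neighbors(i) yields N_i."""
    total = 0.0
    for i, y_i in labels.items():
        def site_score(y):
            s = assoc(y, i, X)
            for j in neighbors(i):
                s += interact(y, labels[j], i, j, X)
            return s
        # Per-site normalizer: a sum over only the two possible labels.
        total += site_score(y_i) - np.logaddexp(site_score(-1), site_score(+1))
    return total  # maximize over theta = {w, v, beta, K}
```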

  20. DRFs • Initialization of w (and v) is learned using standard maximum-likelihood logistic regression, assuming all the labels Yi^m to be independent given the observations X^m

  21. DRFs • Inference • Given new test data X, the goal is to find the optimal label configuration Y • The maximum a posteriori (MAP) solution is a widely used estimate that is optimal with respect to the zero-one cost function C(Y, Y*) = 1 − δ(Y − Y*), where Y* is the true label configuration and δ(Y − Y*) is 1 if Y = Y* and 0 otherwise

  22. DRFs • Alternatives to MAP • Maximum posterior marginal (MPM), where the cost function is C(Y, Y*) = Σi∈S (1 − δ(Yi − Yi*)) • Iterated conditional modes (ICM) • Given an initial configuration, ICM iteratively maximizes the local conditional probabilities, i.e. Yi ← argmax_yi P(yi | Y_Ni, X) • ICM converges only to a local maximum of the posterior
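A minimal sketch of the ICM update, again with placeholder potentials; a natural initialization is the per-site logistic-regression labeling from slide 20:

```python
def icm(labels, X, assoc, interact, neighbors, max_iters=20):
    """Greedily set each label to the argmax of its local conditional
    until no label changes (a local maximum of the posterior)."""
    labels = dict(labels)  # site -> {-1, +1}
    for _ in range(max_iters):
        changed = False
        for i in list(labels):
            def site_score(y):
                return assoc(y, i, X) + sum(
                    interact(y, labels[j], i, j, X) for j in neighbors(i))
            best = max((-1, +1), key=site_score)
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels
```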

  23. Experiments • The proposed DRF is applied to the detection of man-made structures in natural scenes • Training set: 108 images (256 × 384 pixels) • Test set: 129 images • The training set contains 36,269 blocks from the non-structured class and 3,004 blocks from the structured class

  24. Experiments • Feature description • Histogram-based features • Orientation-based features • Single-site feature si(Xi) • 3 moment and 2 orientation-based features at site i • Does not consider correlations with the neighborhood • Multiscale feature fi(X) • Explicitly describes dependencies in the observations X

  25. Experiments • fi(X): feature space, 14 dimensions • hi(X): transformed feature space, 14 + 14(14 + 1)/2 = 119 dimensions (excluding the constant term) • Equivalent to mapping the data with a polynomial kernel of degree two • Results are compared with an MRF
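The quoted dimension is easy to verify: a degree-two expansion of a d-dimensional vector adds d(d + 1)/2 quadratic terms to the d linear ones:

```python
# Dimension of the degree-two expansion of a 14-dim feature vector,
# excluding the constant term.
d = 14
print(d + d * (d + 1) // 2)  # 14 linear + 105 quadratic terms -> 119
```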

  26. Experiments • βm is the interaction parameter of the MRF • The class-conditional densities were modeled as mixtures of Gaussians • Performance evaluation • The DRF outperforms both the MRF and the logistic classifier

  27. Conclusion • Introduced discriminative random fields, which model the conditional distribution of the class labels without modeling the class density • Explicitly modeled the data dependencies between neighbors • Initialization of the parameters is a hard problem because the pseudo-likelihood has local optima • Future work: extension to the multi-class case • Possible application to tumor segmentation

  28. Reference • Sanjiv Kumar and Martial Hebert. Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification. • Any comments will be appreciated: chihoon@cs
