
Convex Point Estimation using Undirected Bayesian Transfer Hierarchies

Gal Elidan, Ben Packer, Geremy Heitz, Daphne Koller. Computer Science Dept., Stanford University. UAI 2008. Presented by Haojun Chen, August 1st, 2008.


Presentation Transcript


  1. Convex Point Estimation using Undirected Bayesian Transfer Hierarchies
     Gal Elidan, Ben Packer, Geremy Heitz, Daphne Koller
     Computer Science Dept., Stanford University
     UAI 2008
     Presented by Haojun Chen, August 1st, 2008

  2. Outline
     • Background and motivation
     • Undirected transfer hierarchies
     • Experiments
     • Degree of transfer coefficients
     • Experiments
     • Summary

  3. Background (1/2)
     • Transfer learning: data from “similar” tasks or distributions is used to compensate for sparse training data in the primary class or task
     • Example: use rhinos to help learn elephants’ shape
     Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

  4. Background (2/2)
     • Hierarchical Bayes (HB) framework: a principled approach for transfer learning
     • Ingredients: a set of related learning tasks/classes, the observed data, and the task/class parameters
     • The joint distribution over the observed data and all class parameters factorizes along the hierarchy: each class’s data depends on that class’s parameters, and each class’s parameters depend on those of its parent
     • [Figure: example of a hierarchical Bayes parameterization]
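For reference, a hierarchical Bayes joint of the kind this slide describes is commonly written as below. The symbols are assumptions made for illustration, not necessarily the slide's own notation: \mathcal{C} for the set of classes, \mathcal{D}^c for the data of class c, \theta^c for its parameters, and par(c) for its parent in the hierarchy. For the root, the second factor is read as a plain prior on the root parameters, and classes without data contribute no likelihood factor.

```latex
P(\mathcal{D}, \theta) \;=\; \prod_{c \in \mathcal{C}}
    P(\mathcal{D}^c \mid \theta^c)\,
    P(\theta^c \mid \theta^{\mathrm{par}(c)})
```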

  5. Motivation
     • In practice, a MAP point estimate is desirable, because full Bayesian computations can be difficult and computationally demanding
     • Efficient point estimation may not be achievable in many standard hierarchical Bayes models, because many common conjugate priors, such as the Dirichlet or normal-inverse-Wishart, are not convex with respect to the parameters
     • This paper proposes an undirected hierarchical Bayes (HB) reformulation that allows efficient point estimation

  6. Undirected HB Reformulation
     • Fdata: data-dependent objective
     • Divergence: divergence function over child and parent parameters
     • Weight on the divergence term → 0: encourages parameters to explain the data
     • Weight on the divergence term → ∞: encourages parameters to be similar to their parents
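A minimal sketch of an objective with this shape, assuming it is minimized and has the form "minus Fdata plus a weighted divergence to the parent" summed over the hierarchy. The names `beta`, `joint_objective`, the toy unit-variance Gaussian used as Fdata, and the squared L2 divergence are all illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Optional


def squared_l2(child: List[float], parent: List[float]) -> float:
    """Assumed divergence: squared L2 distance between child and parent parameters."""
    return sum((c - p) ** 2 for c, p in zip(child, parent))


def toy_f_data(data: List[float], theta: List[float]) -> float:
    """Assumed Fdata: unit-variance Gaussian log-likelihood (up to a constant).

    theta[0] plays the role of the mean; this is concave in theta.
    """
    return -0.5 * sum((x - theta[0]) ** 2 for x in data)


def joint_objective(
    params: Dict[str, List[float]],
    data: Dict[str, List[float]],
    parent: Dict[str, Optional[str]],
    beta: float,
) -> float:
    """Sum over nodes of -Fdata plus beta * divergence(child, parent).

    beta is a hypothetical name for the trade-off weight: beta -> 0 lets each
    node fit its own data, large beta pulls each node toward its parent.
    """
    total = 0.0
    for node, theta in params.items():
        if node in data:  # internal nodes may have no data of their own
            total -= toy_f_data(data[node], theta)
        par = parent.get(node)
        if par is not None:  # the root has no parent term
            total += beta * squared_l2(theta, params[par])
    return total


# Hypothetical three-node hierarchy: a root with two leaf classes.
params = {"root": [0.0], "elephant": [1.0], "rhino": [-1.0]}
data = {"elephant": [1.0, 1.0], "rhino": [-1.0]}
parent = {"root": None, "elephant": "root", "rhino": "root"}
```

With these toy values each leaf fits its data exactly, so the objective is 0 when `beta` is 0 and grows with `beta` because both leaves sit one unit away from the root.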

  7. Purpose of Reformulation
     • Easy to specify
       • Fdata can be a likelihood, a classification objective, or another objective
       • Divergence can be the L1 norm, L2 norm, ε-insensitive loss, KL divergence, etc.
       • No conjugacy or proper-prior restrictions
     • Easy to optimize
       • Convex over Θ if Fdata is concave and Divergence is convex

  8. Experiment: Text Categorization (Newsgroup20 Dataset)
     • Bag-of-words model
     • Fdata: multinomial log likelihood (regularized), defined over the frequency of each word i
     • Divergence: L2 norm
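As a rough illustration of this setup, the sketch below scores one class node with a multinomial log likelihood over word frequencies plus an L2-style pull toward its parent. The function names, the `weight` argument, and the use of a squared L2 distance are assumptions, and the regularizer mentioned on the slide is left out because its form is not given here.

```python
import math
from typing import Sequence


def multinomial_log_likelihood(freq: Sequence[float], theta: Sequence[float]) -> float:
    """Sum over words of freq[i] * log(theta[i]).

    theta is assumed to be a probability vector; words with zero count are skipped.
    """
    return sum(f * math.log(t) for f, t in zip(freq, theta) if f > 0)


def squared_l2(child: Sequence[float], parent: Sequence[float]) -> float:
    """Assumed divergence: squared L2 distance between word distributions."""
    return sum((c - p) ** 2 for c, p in zip(child, parent))


def node_objective(
    freq: Sequence[float],
    theta: Sequence[float],
    parent_theta: Sequence[float],
    weight: float,
) -> float:
    """Negative log likelihood plus weight * divergence to the parent (to be minimized)."""
    return -multinomial_log_likelihood(freq, theta) + weight * squared_l2(theta, parent_theta)


# Hypothetical three-word vocabulary with word counts 2, 0, 1.
freq = [2, 0, 1]
theta = [0.5, 0.25, 0.25]
value = node_objective(freq, theta, parent_theta=theta, weight=1.0)
```

Here the child equals its parent, so the divergence term vanishes and `value` is just the negative log likelihood.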

  9. Text Categorization Result
     • Baselines:
       • Maximum likelihood at each node (no hierarchy)
       • Cross-validated regularization (no hierarchy)
       • Shrinkage (McCallum et al. ’98, with hierarchy)
     • [Figure: Newsgroup Topic Classification. Classification rate vs. total number of training instances (75 to 375) for Max Likelihood (no regularization), Regularized Max Likelihood, Shrinkage, and Undirected HB]

  10. Experiment: Shape Modeling (Mammals Dataset; Fink, ’05)
     • Task: density estimation, evaluated by test likelihood
     • Instances are represented by 60 x-y coordinates of landmarks on the outline
     • Model components: mean landmark location, covariance over landmarks, and regularization
     • Divergence: L2 norm over mean and variance

  11. Shape Modeling Result
     • [Figure: Mammal Pairs. Delta log-loss per instance vs. total number of training instances (6, 10, 20, 30); labels: Undirected HB, Regularized Max Likelihood; pairs: Bison-Rhino, Elephant-Bison, Elephant-Rhino, Giraffe-Bison, Giraffe-Elephant, Giraffe-Rhino, Llama-Bison, Llama-Elephant, Llama-Giraffe, Llama-Rhino]

  12. Problem in Transfer
     • Not all parameters deserve equal sharing

  13. Degrees of Transfer (DOT)
     • The divergence is split into subcomponents with weights λ, so different transfer strengths are allowed for different subcomponents and child-parent pairs
     • λ → 0: forces parameters to agree
     • λ → ∞: allows parameters to be flexible
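One way to picture a divergence split into weighted subcomponents is sketched below, assuming each subcomponent is a squared difference scaled by 1/λ, which matches the stated limits (small λ penalizes disagreement heavily, large λ barely penalizes it). The function name `dot_divergence` and the squared-difference form are illustrative assumptions.

```python
from typing import Sequence


def dot_divergence(
    child: Sequence[float],
    parent: Sequence[float],
    lam: Sequence[float],
) -> float:
    """Per-component divergence with degree-of-transfer weights 1/lam[i].

    lam[i] -> 0 makes any disagreement in component i very costly;
    lam[i] -> infinity lets component i move freely away from the parent.
    """
    if any(l <= 0 for l in lam):
        raise ValueError("every lam[i] must be positive")
    return sum((c - p) ** 2 / l for c, p, l in zip(child, parent, lam))


# Hypothetical example: the first component is tied tightly to the parent,
# the second is allowed to drift.
penalty = dot_divergence(child=[1.0, 2.0], parent=[0.0, 0.0], lam=[0.5, 4.0])
```

In this example the first component contributes 1/0.5 = 2 and the second contributes 4/4 = 1, so the looser component costs less despite its larger deviation.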

  14. Estimation of DOT Parameters
     • Hyper-prior approach
     • Bayesian idea: put a prior on λ and add λ as a parameter to the optimization along with Θ
     • Concretely: inverse-Gamma prior (λ is forced to be positive)
     • [Figure: prior on degree of transfer]
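The hyper-prior idea can be sketched for a single DOT coefficient as follows, assuming the term being minimized is the weighted divergence `div / lam` plus the negative log density of an inverse-Gamma prior on `lam`. The hyperparameter names `shape` and `scale`, and the closed-form update in `best_lambda` (obtained by setting the derivative with respect to `lam` to zero while the other parameters are held fixed), are part of this sketch rather than something stated on the slide.

```python
import math


def neg_log_inverse_gamma(lam: float, shape: float, scale: float) -> float:
    """Negative log density of an inverse-Gamma(shape, scale) prior at lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return (
        -shape * math.log(scale)
        + math.lgamma(shape)
        + (shape + 1.0) * math.log(lam)
        + scale / lam
    )


def dot_term(div: float, lam: float, shape: float, scale: float) -> float:
    """Assumed per-coefficient objective: div / lam plus the negative log prior."""
    return div / lam + neg_log_inverse_gamma(lam, shape, scale)


def best_lambda(div: float, shape: float, scale: float) -> float:
    """Minimizer of dot_term over lam for a fixed divergence value.

    Solves -(div + scale) / lam**2 + (shape + 1) / lam = 0.
    """
    return (div + scale) / (shape + 1.0)


# Hypothetical values: divergence 2.0 under an inverse-Gamma(2, 1) prior.
lam_star = best_lambda(div=2.0, shape=2.0, scale=1.0)
```

A larger divergence between child and parent yields a larger `lam_star`, that is, weaker transfer for that component.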

  15. DOT Shape Modeling Result
     • [Figure: Mammal Pairs. Delta log-loss per instance vs. total number of training instances (6, 10, 20, 30); labels: Hyperprior, Regularized Max Likelihood; pairs: Bison-Rhino, Elephant-Bison, Elephant-Rhino, Giraffe-Bison, Giraffe-Elephant, Giraffe-Rhino, Llama-Bison, Llama-Elephant, Llama-Giraffe, Llama-Rhino]

  16. Distribution of DOT Coefficients
     • [Figure: distribution of DOT coefficients using the hyperprior approach. Histogram over 1/λ (0 to 50), annotated from weaker transfer to stronger transfer; label: θroot]

  17. Summary
     • An undirected reformulation of the hierarchical Bayes framework is proposed for efficient convex point estimation
     • Different degrees of transfer are introduced for different parameters, so that some parts of the distribution can be transferred to a greater extent than others
