This document outlines the EE512 course on Graphical Models taught by Jeff A. Bilmes at the University of Washington in Spring 2006. It includes lecture dates, reading assignments, and important milestones regarding final projects. Key topics covered include Gaussian Graphical Models, inference methods, and structure learning. Students are reminded of office hours and discussion sections, and encouraged to finalize project topics as homework is not assigned this quarter. This resource serves as a roadmap for students navigating the course.
University of Washington, Department of Electrical Engineering
EE512 Spring 2006: Graphical Models
Jeff A. Bilmes <bilmes@ee.washington.edu>
Lecture 17 Slides, May 30th, 2006
Announcements
• READING: M. Jordan, Chapters 13, 14, 15 (on Gaussians and Kalman filtering)
• Reminder: TA discussion and office hours:
  • Office hours: Thursdays 3:30-4:30, Sieg Ground Floor Tutorial Center
  • Discussion sections: Fridays 9:30-10:30, Sieg Ground Floor Tutorial Center lecture room
• No more homework this quarter; concentrate on final projects!
• Makeup class tomorrow (Wednesday), 5-7pm, room TBA (watch email).
Class Road Map
• L1: Tues, 3/28: overview, GMs, intro to BNs
• L2: Thur, 3/30: semantics of BNs + UGMs
• L3: Tues, 4/4: elimination, probs, chordal I
• L4: Thur, 4/6: chordal graphs, separation, decomposition, elimination
• L5: Tue, 4/11: chordality/elimination, MCS, triangulation, CI props
• L6: Thur, 4/13: MST, CI axioms, Markov props
• L7: Tues, 4/18: Möbius, HC theorem, (F)=(G)
• L8: Thur, 4/20: phylogenetic trees, HMMs
• L9: Tue, 4/25: HMMs, inference on trees
• L10: Thur, 4/27: inference on trees, start polytrees
• L11: Tues, 5/2: polytrees, start JT inference
• L12: Thur, 5/4: inference in JTs
• Tues, 5/9: away
• Thur, 5/11: away
• L13: Tue, 5/16: JT, GDL, Shenoy-Shafer
• L14: Thur, 5/18: GDL, search, Gaussians I
• L15: Mon, 5/22: laptop crash
• L16: Tues, 5/23: search, Gaussians I
• L17: Thur, 5/25: Gaussians
• Mon, 5/29: holiday
• L18: Tue, 5/30
• L19: Thur, 6/1: final presentations
Final Project Milestone Due Dates
• L8: Thur, 4/20: team lists, short abstracts I
• L10: Thur, 4/27: short abstracts II
• L12: Thur, 5/4: abstract II + progress
• Thur, 5/11: 1-page progress report
• L14: Thur, 5/18: 1-page progress report
• L16: Thur, 5/25: 1-page progress report
• L17: Tue, 5/30: today
• L19: Thur, 6/1: final presentations
• L20: Tue, 6/6: 4-page papers due (like a conference paper); only PDF versions accepted
• Team lists, abstracts, and progress reports must be turned in, in class and on paper (dead-tree versions only).
• Final reports must be turned in electronically in PDF (no other formats accepted).
• No need to repeat what was on previous progress reports/abstracts; I have those available to refer to.
• Progress reports must report who did what so far!
Summary of Last Time
• Gaussian graphical models
Outline of Today's Lecture
• Other forms of inference
• Structure learning in graphical models
Books and Sources for Today
• Jordan, chapters 13-15
• Other references contained in the presentation …
Graphical Models
• We start with some probability distribution P.
• P could be specified as a given, or more likely we have training data consisting of some number of samples. The goal is to learn P, or some approximation to it (training), and then use P in some way (inference for making decisions, such as the most probable assignment, the max-product semiring, etc.).
• The graph G=(V,E) represents "structure" in P.
• The graph can provide an efficient representation of, and efficient computational inference for, P.
• There can be multiple graphs that represent a given P (e.g., the complete graph represents all P).
• Goal: find a computationally cheap exact or approximate graph cover for P.
• Once we have that, we just compute probabilities using the junction tree algorithm, a search algorithm, etc.
Graphical Models & Tree-width
• Tree-width is the complexity parameter for G=(V,E).
• Def: k-tree: start with a clique on k nodes; each new node (the nth, for n > k) is connected to k previously added, mutually (fully) connected nodes.
• Example: 4-trees with 4, 5, and 6 nodes (note: all separators are of size 4).
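A small sketch of the construction just defined (function names and the random choice of attachment clique are mine, not the course's): build a k-tree edge set by starting from a k-clique and repeatedly attaching a new node to an existing, fully connected set of k nodes.

```python
import itertools
import random

def build_k_tree(k, n, seed=0):
    """Return the edge set of a k-tree on nodes 0..n-1 (requires n >= k)."""
    rng = random.Random(seed)
    edges = set()
    # initial clique on the first k nodes
    for u, v in itertools.combinations(range(k), 2):
        edges.add((u, v))
    # k-cliques available for later nodes to attach to (the size-k separators)
    cliques = [frozenset(range(k))]
    for new in range(k, n):
        attach = rng.choice(cliques)              # pick an existing k-clique
        for u in attach:
            edges.add((min(u, new), max(u, new)))
        # 'new' now forms a fresh k-clique with each (k-1)-subset of 'attach'
        for sub in itertools.combinations(sorted(attach), k - 1):
            cliques.append(frozenset(sub) | {new})
    return edges

# 4-trees with 4, 5, and 6 nodes, matching the slide's example sizes
for n_nodes in (4, 5, 6):
    print(n_nodes, "nodes:", len(build_k_tree(4, n_nodes)), "edges")
```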
Graphical Models & Tree-width
• Def: partial k-tree: any subgraph of a k-tree.
• Def: the tree-width of a graph G is the smallest k such that G is a partial k-tree.
• Thm: the tree-width decision problem is NP-complete (we mentioned this before; proven by Arnborg, Corneil & Proskurowski).
• Thm: exact probabilistic inference (computing probabilities, etc.) is exponential in the tree-width.
• Time-space tradeoffs can help here, but what if all of the points in the achievable region are intolerably expensive?
• The big question: what if exact inference is too expensive?
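As a point of reference (not stated on the slide): for junction-tree style exact inference on a graph of tree-width $w$ with $r$ states per variable, the largest cliques of an optimal triangulation contain $w+1$ variables, so time and space scale on the order of
$$ O\big(|V| \cdot r^{\,w+1}\big), $$
which is why tree-width is the complexity parameter of interest.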
When exact inference is too expensive
• Two general approaches: either an exact solution to an approximate problem, or an approximate solution to the exact problem.
• Exact solution to an approximate problem:
  • Structure learning: find a low tree-width (or otherwise "cheap") graphical model that is still "high-quality" in some way, and then perform exact inference on the approximate model.
  • This can be easy or hard depending on the tree-width, on the measure of "high quality", and on the learning paradigm.
• Approximate solution to the exact problem:
  • Approximate inference tries to approximate what must be computed: loopy belief propagation, sampling/pruning, variational/mean-field methods, and hybrids of the above.
Finding k-trees
• How do we score a k-tree? Maximum likelihood, or a conditional score?
• May we assume that the truth itself is a k-tree?
• Sometimes simplifications can be made if we assume the truth is a member of a known model class, such as the class of k-trees for some fixed constant k independent of n=|V|, the number of nodes.
• How do we find the best 1-tree?
Finding 1-trees
• Given P, the goal is to find the best 1-tree approximation of P in a maximum likelihood sense.
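The equations on the following "Finding 1-trees" slides did not survive extraction; presumably they walk through the classic Chow-Liu (1968) argument. Its key identity: if $p_T$ is the projection of $P$ onto a tree $T$ (each edge carrying the corresponding pairwise marginal of $P$), then
$$ \mathbb{E}_P\big[\log p_T(X)\big] \;=\; \sum_{(i,j)\in T} I(X_i;X_j) \;-\; \sum_{i\in V} H(X_i). $$
The entropy term does not depend on $T$, so the maximum-likelihood 1-tree (equivalently, the tree minimizing $D(P\,\|\,p_T)$) is a maximum-weight spanning tree with edge weights $I(X_i;X_j)$; a code sketch of this construction follows.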
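A minimal, hedged sketch of the Chow-Liu construction for discrete data, using the empirical distribution in place of P; all function and variable names here are mine, not from the course materials.

```python
import math
from collections import Counter
from itertools import combinations

def empirical_mi(col_a, col_b):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    # sum_{a,b} p(a,b) * log( p(a,b) / (p(a) p(b)) ), with counts c, pa, pb
    return sum((c / n) * math.log(c * n / (pa[a] * pb[b]))
               for (a, b), c in pab.items())

def chow_liu_tree(data):
    """data: list of samples, each a tuple of discrete values.
    Returns the edge list of a maximum-likelihood 1-tree (Chow-Liu tree)."""
    d = len(data[0])
    cols = list(zip(*data))                     # column-major view of the data
    weights = {(i, j): empirical_mi(cols[i], cols[j])
               for i, j in combinations(range(d), 2)}
    # Kruskal's algorithm for a maximum-weight spanning tree
    parent = list(range(d))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

samples = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]  # X0 == X1, X2 independent
print(chow_liu_tree(samples))                           # the edge (0, 1) must appear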
Plethora of negative results
• Chickering 1996; Chickering, Meek & Heckerman 2003: learning Bayesian networks in the ML sense is NP-hard ("is there a BN with a fixed upper bound on in-degree that achieves a given ML score?").
• Dasgupta 1999: learning polytrees in the ML sense is NP-hard ("is there a polytree with fixed upper-bounded in-degree achieving a given ML score?"), and worse, there is a constant c such that it is NP-complete to decide whether there is a polytree with score ≤ c·OPT score.
• Meek 2001: learning even a path (a sub-class of trees) in the ML sense is NP-hard.
Plethora of negative results
• Srebro & Karger 2001: learning k-trees in the ML sense is hard.
• So, generative-model structure learning is likely to be a difficult problem (unless k=1, or P=NP).
• We next spend a bit of time on the Srebro & Karger result.
Finding optimal ML k-trees is NP-complete
Some good news …
• PAC framework: the key difference is that we assume the graph is in the concept class (we learn the class of k-trees). This means that if we have sampled data, we assume the samples come from a "truth" which is itself a k-tree.
• Hoeffgen '93: can robustly (with a number of samples polynomial in n, 1/ε, and 1/δ) PAC-learn bounded tree-width graphical models, and can robustly and efficiently (algorithm polynomial in the same quantities) PAC-learn 1-trees.
• Narasimhan & Bilmes 2004: can robustly and efficiently PAC-learn bounded tree-width graphical models.
More good news …
• Abbeel, Koller & Ng 2005: can robustly and efficiently PAC-learn bounded-degree factor graphs.
• Note: this does not come with a complexity guarantee for inference. E.g., n×n grids have bounded degree but unbounded tree-width, while a star has unbounded degree but bounded tree-width. Tree-width is what matters for computation in general.
How to PAC-learn such graphs …
• Mutual information is symmetric submodular.
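For reference, a set function $f:2^V \to \mathbb{R}$ is submodular if $f(A)+f(B) \ge f(A\cup B)+f(A\cap B)$ for all $A,B\subseteq V$, and symmetric if $f(A)=f(V\setminus A)$. The function meant here is
$$ f(A) \;=\; I(X_A;\,X_{V\setminus A}), $$
the mutual information between the variables indexed by $A$ and the rest, which satisfies both properties.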
How to PAC-learn such graphs …
• Submodularity and optimization (Narasimhan & Bilmes, 2004)
Another positive result
• Since mutual information is symmetric submodular, we can find optimal partitions.
• This has implications for clustering (Narasimhan, Jojic & Bilmes '05) and also for structure learning (we can find an optimal one-step graph decomposition by finding the optimal k-separator).
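The partition equation on this slide was lost in extraction. Stated as an assumption about its likely form: because $f(A)=I(X_A;X_{V\setminus A})$ is symmetric submodular, the minimum-mutual-information bipartition
$$ A^{\star} \;\in\; \arg\min_{\emptyset \neq A \subsetneq V} I(X_A;\,X_{V\setminus A}) $$
can be found exactly with $O(|V|^3)$ evaluations of the mutual information using Queyranne's algorithm for symmetric submodular minimization; this is the kind of subroutine the clustering and decomposition claims rely on.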
Finding ML decompositions …
• Optimal to one level
Discriminative structure
• The goal might be classification using a generative model.
• Distinction between parameters & structure.
• Two possible goals:
  • 1) find one global structure that classifies well
  • 2) find class-specific structure (one per class)
• In either case, finding a good discriminative structure may render discriminative parameter learning less necessary.
Optimal discriminative structure procedure …
• Choose k (for now, let's just assume k=1).
• Find the tree that best satisfies:
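The objective on this slide was lost in extraction. As an assumption about its form: the discriminative goal is usually stated as maximizing the expected log class posterior under the tree-structured model,
$$ T^{\star} \;\in\; \arg\max_{T} \; \mathbb{E}_{(X,C)\sim P}\big[\log p_T(C \mid X)\big], $$
which, unlike the generative ML score, does not decompose over tree edges (as noted on the next slide), motivating an edge-local surrogate such as EAR.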
Properties
• Options:
  • We can fix the structure and train parameters using either maximum likelihood (generative) or maximum conditional likelihood (discriminative).
  • We can learn a discriminative structure, and then train either generatively or discriminatively.
  • In all cases, assume appropriate regularization.
• Bad news: the KL divergence is not decomposable w.r.t. the tree in the discriminative case.
• Goal: identify a local discriminative measure on the edges of a graph (analogous to mutual information in the generative case).
EAR measure
• EAR (explaining-away residual) measure (Bilmes, '98).
• The goal is to maximize EAR (its usual form is given below).
• Intuition: reward pairs that are dependent class-conditionally but otherwise (marginally) independent.
• EAR is an approximation to the expected log conditional posterior; it is exact for independent "auxiliary" variables.
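The EAR formula itself did not survive extraction; its usual form (Bilmes '98) is
$$ \mathrm{EAR}(X_i;X_j) \;=\; I(X_i;X_j \mid C) \;-\; I(X_i;X_j), $$
which is large precisely when a pair of variables is dependent given the class but (nearly) independent marginally. Presumably the tree sought two slides back is a maximum-weight spanning tree under these EAR edge weights.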
Conditional mutual information?
• Conditional mutual information is not guaranteed to discriminate well.
• Building an MST using I(X1;X2|C) as edge weights will not necessarily produce a tree with good classification properties; EAR fixes this in certain cases.
• Example: 3 features (X1, X2, X3) and a class C.
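A self-contained sketch (mine, not from the slides) of how the two edge weightings could be computed from labeled discrete data, so conditional MI and EAR can be compared before building a spanning tree:

```python
import math
from collections import Counter
from itertools import combinations

def mi(xs, ys):
    """Empirical mutual information (nats) between two discrete sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (cx[a] * cy[b]))
               for (a, b), c in cxy.items())

def edge_weights(features, labels):
    """features: list of samples (tuples of discrete values); labels: class per sample.
    Returns {(i, j): (conditional_mi, ear)} for every feature pair."""
    n, d = len(features), len(features[0])
    by_class = Counter(labels)
    weights = {}
    for i, j in combinations(range(d), 2):
        xi = [s[i] for s in features]
        xj = [s[j] for s in features]
        # I(Xi; Xj | C) = sum_c p(c) * I(Xi; Xj | C = c)
        cond = sum((nc / n) * mi([x for x, l in zip(xi, labels) if l == c],
                                 [x for x, l in zip(xj, labels) if l == c])
                   for c, nc in by_class.items())
        weights[(i, j)] = (cond, cond - mi(xi, xj))   # (conditional MI, EAR)
    return weights

# Toy data: X0 and X1 are always identical, while X0 and X2 agree more often
# within each class than they do marginally.
feats  = [(0, 0, 0), (1, 1, 1), (0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)]
labels = [0, 0, 1, 1, 0, 1]
for e, (cmi, ear) in edge_weights(feats, labels).items():
    print(e, round(cmi, 3), round(ear, 3))
```

On this toy data the edge (X0, X1) gets the highest conditional MI but a near-zero (slightly negative) EAR, while (X0, X2) gets a smaller conditional MI but a positive EAR, so the two weightings rank the edges differently.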
Generative training/structure
General Structure Learning