This seminar, presented by Danke Xie at UCSD, covers the modular and hierarchical learning systems of Michael I. Jordan and Robert A. Jacobs, detailing the Mixture of Experts architecture and its learning algorithm. The session covers decision trees, classification problems, and probabilistic interpretations of expert networks. Using a gradient-based learning approach and the EM algorithm, it shows how to maximize the likelihood of the data when inputs are softly assigned to experts in high-dimensional spaces. The aim is to give participants insight into effective problem-solving through modular decomposition.
Modular and hierarchical learning systems • Michael I. Jordan and Robert A. Jacobs • Presented by Danke Xie, Cognitive Science, UCSD • CSE 291s, Lawrence Saul • 4/26/2007
Outline • Decision Tree • Mixture of Experts Architecture • The Mixture of Experts Model • Learning algorithm • Hierarchical Mixture of Experts architecture • Demo
Introduction • Why modular and hierarchical systems? • Divide a complex problem into less complex subproblems • Ex: supervised learning, where separate functions f(x) and g(x) model y in different regions of the input x
Decision Tree • Classification problem: map input x to a label y in {0, 1} • [Figure: example decision tree with root test X5 > 3?, whose branches lead to the tests X2 < 4? and X6 > 7?; each leaf outputs 0 or 1]
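A minimal sketch of how such a tree makes hard decisions, written as nested rules in Python. The threshold tests come from the figure above, but which leaf returns 0 or 1 is illustrative, since the original layout is not fully recoverable:

```python
# Hard decision tree from the example figure, as nested rules.
def decision_tree(x):
    """x: feature vector; indices shifted to 0-based (X5 -> x[4], etc.)."""
    if x[4] > 3:                      # X5 > 3 ?
        return 1 if x[1] < 4 else 0   # X2 < 4 ?  (leaf labels illustrative)
    else:
        return 1 if x[5] > 7 else 0   # X6 > 7 ?  (leaf labels illustrative)
```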
Decision Tree • What's missing? • Living in a 10,000-dimensional space? • Learning is greedy rather than optimizing a likelihood • No soft decisions / soft assignment of tasks to experts • [Figure: example with 4 classes in a high-dimensional space]
Mixture of Experts (ME) architecture • Gating network: generates the mixing weights g_i(x) = exp(v_i^T x) / Σ_j exp(v_j^T x), interpreted probabilistically as P(i | x, v) • Expert network i: produces an output μ_i(x), interpreted probabilistically as the mean of P(y | x, θ_i) • Overall model: P(y | x) = Σ_i g_i(x) P(y | x, θ_i)
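A minimal sketch of the ME forward pass under the linear-expert, softmax-gating reading above; the array names (V for gating weights, Theta for expert weights) are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def me_predict(x, V, Theta):
    """V: (n_experts, d) gating weights; Theta: (n_experts, d) linear expert weights."""
    g = softmax(V @ x)            # gating weights g_i = P(i | x)
    mu = Theta @ x                # expert means mu_i = theta_i^T x
    return g @ mu, g, mu          # blended prediction plus per-expert pieces
```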
Generating data • Data set {(x(t), y(t))}, t = 1, ..., N • Given x, randomly choose an expert label i with probability g_i(x, v0), where v0 is the parameter of the data-generating model • Generate y according to P(y | x, θi0) • Learn to estimate v and the θi from the data
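A hedged sketch of this generative process, assuming linear-Gaussian experts; the input distribution, noise level, and sample count are illustrative choices:

```python
import numpy as np

def generate_dataset(V0, Theta0, n_samples=500, noise_std=0.1, rng=None):
    """V0: true gating weights; Theta0: true expert weights (both (k, d))."""
    rng = np.random.default_rng(rng)
    d = V0.shape[1]
    X, Y = [], []
    for _ in range(n_samples):
        x = rng.normal(size=d)
        g = np.exp(V0 @ x - (V0 @ x).max())
        g /= g.sum()                                  # g_i(x, v0)
        i = rng.choice(len(g), p=g)                   # choose expert label i
        y = Theta0[i] @ x + noise_std * rng.normal()  # y ~ P(y | x, theta_i0)
        X.append(x); Y.append(y)
    return np.array(X), np.array(Y)
```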
A gradient-based learning algorithm • Maximize the log-likelihood l = Σ_t log Σ_i g_i(x(t), v) P(y(t) | x(t), θi) • Optimize with respect to the θi and v, where the posterior responsibility of expert i is h_i(t) = g_i(t) P(y(t) | x(t), θi) / Σ_j g_j(t) P(y(t) | x(t), θj)
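A sketch of how the log-likelihood and the responsibilities h_i can be computed, assuming Gaussian expert densities with a fixed noise scale sigma:

```python
import numpy as np

def log_likelihood_and_posteriors(X, Y, V, Theta, sigma=0.1):
    """Return the log-likelihood and the (n, k) matrix of responsibilities h_i."""
    ll = 0.0
    H = []
    for x, y in zip(X, Y):
        logits = V @ x
        g = np.exp(logits - logits.max()); g /= g.sum()   # g_i(x, v)
        p = np.exp(-0.5 * ((y - Theta @ x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        joint = g * p                                     # g_i * P(y | x, theta_i)
        ll += np.log(joint.sum())
        H.append(joint / joint.sum())                     # h_i = posterior over experts
    return ll, np.array(H)
```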
Analogy with the Mixture of Gaussians • The learning algorithm can also be derived using the EM algorithm • EM finds maximum-likelihood estimates of parameters when the likelihood cannot be evaluated without knowing how data points are assigned to clusters / experts • The probabilities of these assignments can be seen as latent variables; this view applies equally to the Mixture of Gaussians and to the (Hierarchical) Mixture of Experts
EM algorithm • Mixture of Gaussians (unsupervised)
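A minimal EM sketch for a one-dimensional Gaussian mixture; the fixed shared variance and the simple initialization are simplifying assumptions made for brevity:

```python
import numpy as np

def em_gaussian_mixture(y, k=2, n_iter=50, sigma=1.0, rng=None):
    """y: 1-D array of observations; returns component means and mixing proportions."""
    rng = np.random.default_rng(rng)
    mu = rng.choice(y, size=k, replace=False)     # component means
    pi = np.full(k, 1.0 / k)                      # mixing proportions
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate means and mixing proportions
        mu = (r * y[:, None]).sum(axis=0) / r.sum(axis=0)
        pi = r.mean(axis=0)
    return mu, pi
```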
EM algorithm • Mixture of Experts (supervised)
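A hedged EM sketch for the Mixture of Experts with linear-Gaussian experts: the E-step computes the responsibilities h_i, the expert M-step is a weighted least-squares solve, and the gating M-step is approximated here by a few gradient steps (a simplification; the exact M-step is an IRLS fit):

```python
import numpy as np

def em_mixture_of_experts(X, Y, k=2, n_iter=20, sigma=0.1, lr=0.1, rng=None):
    rng = np.random.default_rng(rng)
    n, d = X.shape
    Theta = rng.normal(scale=0.1, size=(k, d))    # expert weights theta_i
    V = rng.normal(scale=0.1, size=(k, d))        # gating weights v_i
    for _ in range(n_iter):
        # E-step: responsibilities h_i(t), computed in log space for stability
        logits = X @ V.T
        G = np.exp(logits - logits.max(axis=1, keepdims=True))
        G /= G.sum(axis=1, keepdims=True)
        log_joint = np.log(G + 1e-12) - 0.5 * ((Y[:, None] - X @ Theta.T) / sigma) ** 2
        log_joint -= log_joint.max(axis=1, keepdims=True)
        H = np.exp(log_joint)
        H /= H.sum(axis=1, keepdims=True)
        # M-step (experts): weighted least squares per expert
        for i in range(k):
            W = H[:, i]
            A = X.T @ (W[:, None] * X) + 1e-6 * np.eye(d)
            Theta[i] = np.linalg.solve(A, X.T @ (W * Y))
        # M-step (gating, approximate): gradient steps moving g_i toward h_i
        for _ in range(10):
            logits = X @ V.T
            G = np.exp(logits - logits.max(axis=1, keepdims=True))
            G /= G.sum(axis=1, keepdims=True)
            V += lr * (H - G).T @ X / n
    return Theta, V
```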
A gradient-based learning algorithm • Maximize the log-likelihood • We derive learning rules for the special case in which • the expert networks and gating networks are linear • a simple probabilistic density is used for the expert networks
A gradient-based learning algorithm • Take the derivative of the log-likelihood l with respect to θi
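For linear experts with a Gaussian output density of variance σ² (the Gaussian choice is an assumption made here, following the "simple probabilistic density" above), the derivative takes the standard form:

```latex
\frac{\partial l}{\partial \theta_i}
  = \sum_t h_i^{(t)} \frac{\partial}{\partial \theta_i}
      \log P\!\left(y^{(t)} \mid x^{(t)}, \theta_i\right)
  = \frac{1}{\sigma^2} \sum_t h_i^{(t)}
      \left(y^{(t)} - \theta_i^{\top} x^{(t)}\right) x^{(t)}
```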
Learning rule for ME • Experts are linear models
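A hedged sketch of the resulting LMS-like online updates for the linear experts and the softmax gating network; rho is an assumed learning rate and h is the posterior responsibility computed as above:

```python
import numpy as np

def me_online_update(x, y, V, Theta, rho=0.01, sigma=0.1):
    """One online update for a single (x, y) pair; V, Theta are modified in place."""
    logits = V @ x
    g = np.exp(logits - logits.max()); g /= g.sum()   # gating weights g_i
    mu = Theta @ x                                    # expert predictions
    p = np.exp(-0.5 * ((y - mu) / sigma) ** 2)
    h = g * p; h /= h.sum()                           # posterior responsibilities h_i
    Theta += rho * (h * (y - mu))[:, None] * x        # expert update: LMS weighted by h_i
    V += rho * (h - g)[:, None] * x                   # gating update: move g_i toward h_i
    return V, Theta
```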
Learning rule for HME • LMS-like learning algorithm
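The HME case nests the same machinery: a top-level gating network chooses among branches, each branch gates its own experts, and responsibilities are computed per path through the tree. A hedged sketch of a two-level forward pass, with illustrative names and shapes:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hme_predict(x, V_top, V_branch, Theta):
    """V_top: (m, d) top gates; V_branch: (m, k, d) nested gates; Theta: (m, k, d) leaf experts."""
    g_top = softmax(V_top @ x)                 # top-level gates g_i
    y_hat = 0.0
    for i in range(len(g_top)):
        g_low = softmax(V_branch[i] @ x)       # nested gates g_{j|i}
        mu = Theta[i] @ x                      # leaf expert means mu_ij
        y_hat += g_top[i] * (g_low @ mu)       # sum_i g_i sum_j g_{j|i} mu_ij
    return y_hat
```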