This presentation by John Paisley explores multi-task learning, building upon single-task solutions for classification and regression functions using high-dimensional data. It presents innovative methods to estimate feature importance and correlations through the Gradient Outer Product (GOP) matrix. This extension addresses simultaneous feature learning and classification/regression, optimizing expected error while regularizing in Reproducing Kernel Hilbert Spaces (RKHS). The presented experiments validate the effectiveness of the proposed multi-task learning framework.
Estimating variable structure and dependence in multi-task learning via gradients
By: Justin Guinney, Qiang Wu and Sayan Mukherjee
Presented by: John Paisley
Outline
• General problem
• Review of single-task solution
• Extension to multi-task
• Experiments
General problem
• We have a small number of high-dimensional data points, x, each with a corresponding response variable, y (fully supervised).
• We want to simultaneously build a classification or regression function and learn which features are important, as well as the correlations between features (to know whether two features are important in the same way).
• Xuejun presented their single-task solution; this paper extends it to the multi-task setting.
Single-Task Solution (classification)
• By Taylor expansion, estimate the classification function as
$$f(u) \approx f(x) + \nabla f(x) \cdot (u - x) \quad \text{for } x \approx u,$$
learning $f$ and the gradient $g \approx \nabla f$ jointly.
• Seek to minimize the expected error, estimated empirically by
$$\mathcal{E}(f, g) = \frac{1}{n^2} \sum_{i,j=1}^{n} w_{i,j}\, \phi\big( y_j \left( f(x_i) + g(x_i) \cdot (x_j - x_i) \right) \big),$$
where $w_{i,j} = \exp\!\left(-\|x_i - x_j\|^2 / 2s^2\right)$ is a weight function that localizes the Taylor expansion to nearby points, and $\phi$ is a convex loss function.
• To solve this, regularize in an RKHS:
$$(\hat f, \hat g) = \arg\min_{f \in \mathcal{H}_K,\; g \in \mathcal{H}_K^d} \; \mathcal{E}(f, g) + \lambda_1 \|f\|_K^2 + \lambda_2 \|g\|_K^2.$$
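A minimal NumPy sketch of this empirical error, assuming the Gaussian weight above and taking the logistic loss for $\phi$ (the function name and these concrete choices are illustrative, not from the slides):

```python
import numpy as np

def single_task_loss(X, y, f_vals, G, s):
    """Empirical classification error from the slide above (a sketch).

    X      : (n, d) array of inputs
    y      : (n,) array of labels in {-1, +1}
    f_vals : (n,) candidate function values f(x_i)
    G      : (n, d) candidate gradients g(x_i)
    s      : bandwidth of the Gaussian weight w_ij
    """
    diff = X[None, :, :] - X[:, None, :]               # diff[i, j] = x_j - x_i
    w = np.exp(-np.sum(diff**2, axis=2) / (2 * s**2))  # Gaussian weights w_ij
    # first-order Taylor prediction of f at x_j, expanded around x_i
    taylor = f_vals[:, None] + np.einsum('ijd,id->ij', diff, G)
    phi = lambda t: np.log1p(np.exp(-t))               # logistic loss as the convex phi
    return np.mean(w * phi(y[None, :] * taylor))       # (1/n^2) weighted sum
```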
Single-Task (regression)
• Use the response variable $y_i$ in place of $f(x_i)$ for each input and only learn the gradient:
$$\mathcal{E}(g) = \frac{1}{n^2} \sum_{i,j=1}^{n} w_{i,j}\, \big( y_i - y_j + g(x_i) \cdot (x_j - x_i) \big)^2, \qquad \hat g = \arg\min_{g \in \mathcal{H}_K^d} \; \mathcal{E}(g) + \lambda \|g\|_K^2.$$
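The same sketch adapts directly to the regression objective (again with the Gaussian weight as an assumed choice):

```python
import numpy as np

def regression_loss(X, y, G, s):
    """Empirical regression error from the slide above (a sketch):
    only the gradient G is learned; f(x_i) is replaced by the observed y_i."""
    diff = X[None, :, :] - X[:, None, :]               # diff[i, j] = x_j - x_i
    w = np.exp(-np.sum(diff**2, axis=2) / (2 * s**2))  # Gaussian weights w_ij
    resid = y[:, None] - y[None, :] + np.einsum('ijd,id->ij', diff, G)
    return np.mean(w * resid**2)                       # (1/n^2) weighted squared error
```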
Single-Task (solution and value of interest)
• By the representer theorem, this has a solution of the form
$$\hat g(x) = \sum_{i=1}^{n} c_i \, K(x, x_i), \qquad c_i \in \mathbb{R}^d.$$
• The gradient outer product (GOP), $\Gamma = \mathbb{E}_x\big[\nabla f(x)\, \nabla f(x)^\top\big]$, is the matrix holding all the feature information. It is approximated as
$$\hat\Gamma = \frac{1}{n} \sum_{i=1}^{n} \hat g(x_i)\, \hat g(x_i)^\top.$$
[Slide compares the GOP approximations given in this paper, in Xuejun's paper, and as computed in Matlab.]
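To make the two formulas concrete, a small sketch that evaluates $\hat g$ in its representer form, assuming a Gaussian kernel, and assembles the GOP estimate from the fitted gradients:

```python
import numpy as np

def eval_gradient(x, X, C, s):
    """Representer-theorem form g(x) = sum_i c_i K(x, x_i).
    X : (n, d) training inputs; C : (n, d) coefficient vectors c_i;
    the Gaussian kernel with bandwidth s is an assumed choice."""
    k = np.exp(-np.sum((X - x)**2, axis=1) / (2 * s**2))  # K(x, x_i), shape (n,)
    return C.T @ k                                        # (d,) gradient estimate

def gop_estimate(X, C, s):
    """GOP approximation (1/n) sum_i g(x_i) g(x_i)^T."""
    G = np.stack([eval_gradient(x, X, C, s) for x in X])  # (n, d) fitted gradients
    return G.T @ G / len(X)                               # (d, d) GOP matrix
```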
GOP
• This matrix is central to the paper because it encodes the importance of each feature: the diagonal can be used to rank features by importance, and the off-diagonal entries tell how features are correlated (so if two features are important in the same way, only one need be selected).
• My confusion:
  • I take this to mean that [equation missing from slide], which would resolve the previous page.
  • However, constructing a discrete Gaussian kernel in Matlab, this isn't true (and it makes no sense to me why it should be true).
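As a sketch of how the GOP is read off in practice: rank features by the diagonal and normalize the off-diagonal entries into a correlation-like matrix (the normalization is my illustration, not from the slides):

```python
import numpy as np

def rank_and_correlate(gop):
    """gop : (d, d) GOP estimate.
    Returns feature indices ordered by importance, plus a correlation-style
    view of the off-diagonal structure."""
    importance = np.diag(gop)                      # diagonal ranks each feature
    order = np.argsort(importance)[::-1]           # most important first
    scale = np.sqrt(np.clip(importance, 1e-12, None))
    corr = gop / np.outer(scale, scale)            # off-diagonals: how features covary
    return order, corr
```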
Extension to multi-task
• A very logical extension: they assume a base function shared across tasks plus a task-specific correction, i.e. $g_t = g_0 + \tilde g_t$ for tasks $t = 1, \dots, T$.
• Classification, with RKHS regularization:
$$\min \; \sum_{t=1}^{T} \frac{1}{n_t^2} \sum_{i,j} w_{i,j}\, \phi\big( y_j^t \left( f_t(x_i^t) + g_t(x_i^t) \cdot (x_j^t - x_i^t) \right) \big) + \lambda_1 \|g_0\|_K^2 + \lambda_2 \sum_{t=1}^{T} \|\tilde g_t\|_K^2$$
• Regression, with RKHS regularization:
$$\min \; \sum_{t=1}^{T} \frac{1}{n_t^2} \sum_{i,j} w_{i,j}\, \big( y_i^t - y_j^t + g_t(x_i^t) \cdot (x_j^t - x_i^t) \big)^2 + \lambda_1 \|g_0\|_K^2 + \lambda_2 \sum_{t=1}^{T} \|\tilde g_t\|_K^2$$
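A sketch of the multi-task regression objective under the decomposition above; the squared-coefficient penalties below merely stand in for the RKHS norms, so this is illustrative rather than the paper's exact optimization:

```python
import numpy as np

def task_loss(X, y, G, s):
    # per-task regression error, same form as the single-task sketch
    diff = X[None, :, :] - X[:, None, :]
    w = np.exp(-np.sum(diff**2, axis=2) / (2 * s**2))
    resid = y[:, None] - y[None, :] + np.einsum('ijd,id->ij', diff, G)
    return np.mean(w * resid**2)

def multi_task_objective(tasks, G0_list, Gt_list, s, lam1, lam2):
    """tasks   : list of (X_t, y_t) pairs, one per task
    G0_list : evaluations of the shared base gradient g0 at each task's inputs
    Gt_list : the task-specific corrections g~_t at the same points
    The sum-of-squares penalties below approximate the RKHS norms."""
    fit = sum(task_loss(X, y, G0 + Gt, s)                 # g_t = g0 + g~_t
              for (X, y), G0, Gt in zip(tasks, G0_list, Gt_list))
    reg_base = lam1 * sum(np.sum(G0**2) for G0 in G0_list)
    reg_task = lam2 * sum(np.sum(Gt**2) for Gt in Gt_list)
    return fit + reg_base + reg_task
```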