This talk delves into loss-based learning methods that leverage weak supervision, focusing on advanced techniques such as the latent structured SVM. It covers the mathematical foundations of the latent SSVM, ranking methods, and applications to estimating brain activation delays from M/EEG data and to probabilistic segmentation of MRI scans. The discussion is academically rich, referencing significant works from NIPS, AISTATS, CVPR, and ICML. Attendees will gain insight into optimizing scoring functions, predicting outputs, and understanding empirical risk minimization in weakly supervised settings.
Loss-based Learning with Weak Supervision M. Pawan Kumar
About the Talk • Methods that use the latent structured SVM • A little math-y • Work in its initial stages
Outline • Latent SSVM • Ranking • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009
Weakly Supervised Data • Input x • Output y ∈ {-1,+1} • Hidden h (example shown with y = +1)
Weakly Supervised Classification • Feature Φ(x,h) • Joint feature vector Ψ(x,y,h), built by stacking:
Ψ(x,+1,h) = [Φ(x,h); 0]
Ψ(x,-1,h) = [0; Φ(x,h)]
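In code, this stacking is a few lines; a minimal NumPy sketch (the function name and the dense representation are illustrative, not from the talk):

```python
import numpy as np

def joint_feature(phi, y):
    """Joint feature map Psi(x, y, h) built from Phi(x, h): the
    feature vector occupies the top half for y = +1 and the bottom
    half for y = -1, so a single w can score both labels."""
    d = len(phi)
    psi = np.zeros(2 * d)
    if y == +1:
        psi[:d] = phi
    else:
        psi[d:] = phi
    return psi
```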
Weakly Supervised Classification • Score function f : Ψ(x,y,h) → (-∞, +∞) • Optimize the score over all possible y and h
Latent SSVM • Scoring function: wᵀΨ(x,y,h) • Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x,y,h)
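A minimal sketch of this prediction rule, assuming the label set and the hidden-variable set are small enough to enumerate exhaustively; `psi` stands for the joint feature map Ψ, w is a NumPy vector, and all names are illustrative:

```python
def predict(w, x, labels, hidden_vals, psi):
    """Latent SSVM prediction: maximize w^T Psi(x, y, h) jointly
    over the output y and the hidden variable h by enumeration."""
    return max(
        ((y, h) for y in labels for h in hidden_vals),
        key=lambda yh: w @ psi(x, yh[0], yh[1]),
    )  # returns (y(w), h(w))
```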
Learning Latent SSVM • Training data {(x_i, y_i), i = 1, 2, …, n} • w* = argmin_w Σ_i Δ(y_i, y_i(w)) • Minimize the empirical risk specified by the loss function Δ • Highly non-convex in w • Cannot regularize w to prevent overfitting
Learning Latent SSVM • Training data {(x_i, y_i), i = 1, 2, …, n} • Upper-bound the loss:
Δ(y_i, y_i(w))
= wᵀΨ(x_i, y_i(w), h_i(w)) + Δ(y_i, y_i(w)) - wᵀΨ(x_i, y_i(w), h_i(w))
≤ wᵀΨ(x_i, y_i(w), h_i(w)) + Δ(y_i, y_i(w)) - max_{h_i} wᵀΨ(x_i, y_i, h_i)   (since (y_i(w), h_i(w)) maximizes the score)
≤ max_{y,h} {wᵀΨ(x_i, y, h) + Δ(y_i, y)} - max_{h_i} wᵀΨ(x_i, y_i, h_i)
Learning Latent SSVM • Training data {(x_i, y_i), i = 1, 2, …, n}
min_w ||w||² + C Σ_i ξ_i
s.t. wᵀΨ(x_i, y, h) + Δ(y_i, y) - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i, for all y, h
• Difference-of-convex program in w • Local minimum or saddle point solution (CCCP)
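The constraint can be read per sample as an upper bound on its loss; a hypothetical helper that evaluates this bound, again by enumeration (all names are illustrative):

```python
def latent_slack(w, x, y_true, labels, hidden_vals, psi, loss):
    """Latent SSVM upper bound for one sample:
    max_{y,h} [w^T Psi(x,y,h) + Delta(y_true, y)]
      - max_h w^T Psi(x, y_true, h)."""
    augmented = max(
        w @ psi(x, y, h) + loss(y_true, y)
        for y in labels for h in hidden_vals
    )
    ground_truth = max(w @ psi(x, y_true, h) for h in hidden_vals)
    return augmented - ground_truth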
CCCP • Start with an initial estimate of w • Impute hidden variables (loss independent): h_i* = argmax_h wᵀΨ(x_i, y_i, h) • Update w (loss dependent): min_w ||w||² + C Σ_i ξ_i, s.t. wᵀΨ(x_i, y, h) + Δ(y_i, y) - wᵀΨ(x_i, y_i, h_i*) ≤ ξ_i for all y, h • Repeat until convergence
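A minimal sketch of the CCCP loop, assuming a placeholder `fit_ssvm` that solves the convex, fully supervised SSVM problem once the hidden variables are fixed (both callables are assumptions, not from the talk):

```python
import numpy as np

def cccp(data, hidden_vals, psi, fit_ssvm, w0, max_iters=50, tol=1e-6):
    """CCCP for latent SSVM: alternate between imputing hidden
    variables (loss independent) and updating w (loss dependent)."""
    w = w0
    for _ in range(max_iters):
        # Impute: h_i* = argmax_h w^T Psi(x_i, y_i, h)
        imputed = [
            max(hidden_vals, key=lambda h: w @ psi(x, y, h))
            for (x, y) in data
        ]
        # Update: solve the convex SSVM with h_i fixed to h_i*
        w_new = fit_ssvm(data, imputed)
        if np.linalg.norm(w_new - w) < tol:
            break  # converged to a local minimum or saddle point
        w = w_new
    return w
```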
Recap • Scoring function: wᵀΨ(x,y,h) • Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x,y,h) • Learning: min_w ||w||² + C Σ_i ξ_i, s.t. wᵀΨ(x_i, y, h) + Δ(y_i, y) - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i for all y, h
Outline • Latent SSVM • Ranking • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI Joint Work with Aseem Behl and C. V. Jawahar
Ranking • Example: a ranking of six images in which every positive is ranked above every negative: Average Precision = 1
Ranking • Different rankings of the same six images: Average Precision = 1 (Accuracy = 1), Average Precision = 0.92, Average Precision = 0.81 (Accuracy = 0.67)
Ranking • During testing, AP is frequently used • During training, a surrogate loss is used • This is contradictory to loss-based learning • Optimize AP directly instead
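For reference, AP can be computed directly from a ranked list of labels; the example orderings below are illustrative reconstructions chosen to reproduce the AP values quoted earlier:

```python
def average_precision(ranked_labels):
    """AP of a ranking: mean of the precision values at each
    position where a positive occurs (rank 1 first)."""
    hits, total = 0, 0.0
    for rank, is_pos in enumerate(ranked_labels, start=1):
        if is_pos:
            hits += 1
            total += hits / rank
    return total / max(hits, 1)

print(average_precision([1, 1, 1, 0, 0, 0]))  # 1.0
print(average_precision([1, 1, 0, 1, 0, 0]))  # 0.9166... ~ 0.92
print(average_precision([1, 0, 1, 1, 0, 0]))  # 0.8055... ~ 0.81
```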
Outline • Latent SSVM • Ranking • Supervised Learning • Weakly Supervised Learning • Latent AP-SVM • Experiments • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI Yue, Finley, Radlinski and Joachims, 2007
Supervised Learning - Input • Positive images P and negative images N • Training images X • Bounding boxes H = {HP, HN}
Supervised Learning - Output • Ranking matrix Y:
Y_ik = +1 if i is ranked better than k
Y_ik = -1 if k is ranked better than i
Y_ik = 0 if i and k are ranked equally
• Optimal ranking Y*
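A small sketch that builds Y from per-sample rank values (smaller value = better rank); the function name is illustrative:

```python
import numpy as np

def ranking_matrix(ranks):
    """Y[i, k] = +1 if i is ranked better (smaller rank) than k,
    -1 if k is ranked better than i, and 0 on ties."""
    r = np.asarray(ranks, dtype=float)
    return np.sign(r[None, :] - r[:, None]).astype(int)

# Example: sample 0 is ranked best; samples 1 and 2 are tied below it.
print(ranking_matrix([1, 2, 2]))
```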
SSVM Formulation • Joint feature vector:
Ψ(X, Y, {HP, HN}) = (1 / (|P||N|)) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k))
• Scoring function: wᵀΨ(X, Y, {HP, HN})
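A direct, unoptimized transcription of this joint feature vector, assuming the features Φ(x_i, h_i) are precomputed as NumPy rows:

```python
import numpy as np

def ranking_joint_feature(Y, pos_feats, neg_feats):
    """Psi(X, Y, {HP, HN}): average of Y_ik (Phi_i - Phi_k) over all
    positive-negative pairs, scaled by 1 / (|P||N|)."""
    P, N = len(pos_feats), len(neg_feats)
    psi = np.zeros_like(pos_feats[0], dtype=float)
    for i in range(P):
        for k in range(N):
            psi += Y[i, k] * (pos_feats[i] - neg_feats[k])
    return psi / (P * N)
```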
Prediction using SSVM • Y(w) = argmax_Y wᵀΨ(X, Y, {HP, HN}) • Equivalent to sorting by the sample score wᵀΦ(x_i, h_i) • Same as a standard binary SVM
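Since the argmax over rankings reduces to sorting, prediction is essentially one line; a sketch assuming a precomputed (n, d) feature matrix:

```python
import numpy as np

def rank_by_score(w, features):
    """SSVM ranking prediction: sort samples by w^T Phi(x_i, h_i),
    best score first; returns sample indices in ranked order."""
    return np.argsort(-(features @ w))
```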
Learning SSVM • min_w Δ(Y*, Y(w)) • Loss = 1 - AP of the prediction
Learning SSVM • Upper-bound the loss:
Δ(Y*, Y(w))
= wᵀΨ(X, Y(w), {HP, HN}) + Δ(Y*, Y(w)) - wᵀΨ(X, Y(w), {HP, HN})
≤ wᵀΨ(X, Y(w), {HP, HN}) + Δ(Y*, Y(w)) - wᵀΨ(X, Y*, {HP, HN})   (since Y(w) maximizes the score)
Learning SSVM
min_w ||w||² + C ξ
s.t. max_Y {wᵀΨ(X, Y, {HP, HN}) + Δ(Y*, Y)} - wᵀΨ(X, Y*, {HP, HN}) ≤ ξ
• The max over Y is the Loss Augmented Inference problem
Loss Augmented Inference
• Rank the positives according to their sample scores
• Rank the negatives according to their sample scores
• Slide the best negative to a higher rank, continuing until the score stops increasing
• Slide the next negative to a higher rank in the same way
• Terminate after considering the last negative
• This greedy procedure performs optimal loss augmented inference
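A minimal, unoptimized sketch of this greedy procedure, assuming the sample scores wᵀΦ are precomputed; it recomputes the full objective Δ + wᵀΨ after every slide, whereas the optimal algorithm of Yue et al. uses incremental updates:

```python
def loss_augmented_inference(pos_scores, neg_scores):
    """Greedy AP loss augmented inference: start from the score-sorted
    ranking (positives above negatives), then slide each negative,
    best first, to higher ranks while Delta + w^T Psi increases.
    Returns the ranking as a list of (is_positive, score) pairs."""
    P, N = len(pos_scores), len(neg_scores)
    ranking = [(1, s) for s in sorted(pos_scores, reverse=True)] + \
              [(0, s) for s in sorted(neg_scores, reverse=True)]

    def objective(r):
        hits, ap, psi = 0, 0.0, 0.0
        for rank, (is_pos, _) in enumerate(r, start=1):
            if is_pos:
                hits += 1
                ap += hits / rank
        ap /= P
        # w^T Psi = sum over mixed pairs of Y_ik (s_pos - s_neg); for
        # each positive-negative pair this equals (score of the
        # better-ranked sample minus score of the worse-ranked one).
        for a, (la, sa) in enumerate(r):
            for lb, sb in r[a + 1:]:
                if la != lb:
                    psi += sa - sb
        return (1.0 - ap) + psi / (P * N)

    for n_done in range(N):
        # position of the next unprocessed negative (best score first)
        j = [i for i, (l, _) in enumerate(ranking) if l == 0][n_done]
        # slide it upwards past positives while the objective increases
        while j > 0 and ranking[j - 1][0] == 1:
            cand = ranking[:]
            cand[j - 1], cand[j] = cand[j], cand[j - 1]
            if objective(cand) <= objective(ranking):
                break
            ranking, j = cand, j - 1
    return ranking
```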
Recap • Scoring function: wᵀΨ(X, Y, {HP, HN}) • Prediction: Y(w) = argmax_Y wᵀΨ(X, Y, {HP, HN}) • Learning: using optimal loss augmented inference
Outline • Latent SSVM • Ranking • Supervised Learning • Weakly Supervised Learning • Latent AP-SVM • Experiments • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI
Weakly Supervised Learning - Input Training images X
Weakly Supervised Learning - Latent • Training images X • Bounding boxes HP are latent • All bounding boxes in negative images are negative
Intuitive Prediction Procedure Select the best bounding boxes in all images
Intuitive Prediction Procedure • Rank them according to their sample scores
Weakly Supervised Learning - Output • Ranking matrix Y:
Y_ik = +1 if i is ranked better than k
Y_ik = -1 if k is ranked better than i
Y_ik = 0 if i and k are ranked equally
• Optimal ranking Y*
Latent SSVM Formulation • Joint feature vector:
Ψ(X, Y, {HP, HN}) = (1 / (|P||N|)) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k))
• Scoring function: wᵀΨ(X, Y, {HP, HN})
Prediction using Latent SSVM • max_{Y,H} wᵀΨ(X, Y, {HP, HN})
Prediction using Latent SSVM • max_{Y,H} wᵀ Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k)) • Chooses the best bounding box for positives • Chooses the worst bounding box for negatives, since negative features enter with a minus sign • Not what we wanted
Learning Latent SSVM • min_w Δ(Y*, Y(w)) • Loss = 1 - AP of the prediction
Learning Latent SSVM • Upper-bound the loss:
Δ(Y*, Y(w))
= wᵀΨ(X, Y(w), {HP(w), HN(w)}) + Δ(Y*, Y(w)) - wᵀΨ(X, Y(w), {HP(w), HN(w)})
≤ wᵀΨ(X, Y(w), {HP(w), HN(w)}) + Δ(Y*, Y(w)) - max_H wᵀΨ(X, Y*, {HP, HN})
Learning Latent SSVM
min_w ||w||² + C ξ
s.t. max_{Y,H} {wᵀΨ(X, Y, {HP, HN}) + Δ(Y*, Y)} - max_H wᵀΨ(X, Y*, {HP, HN}) ≤ ξ
• The max over Y and H is the Loss Augmented Inference problem • Cannot be solved optimally
Recap • Unintuitive prediction • Unintuitive objective function • Non-optimal loss augmented inference • Can we do better?
Outline • Latent SSVM • Ranking • Supervised Learning • Weakly Supervised Learning • Latent AP-SVM • Experiments • Brain Activation Delays in M/EEG • Probabilistic Segmentation of MRI
Latent AP-SVM Formulation • Joint feature vector:
Ψ(X, Y, {HP, HN}) = (1 / (|P||N|)) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k))
• Scoring function: wᵀΨ(X, Y, {HP, HN})
Prediction using Latent AP-SVM • Choose the best bounding box for all samples: h_i(w) = argmax_h wᵀΦ(x_i, h) • Optimize over the ranking: Y(w) = argmax_Y wᵀΨ(X, Y, {HP(w), HN(w)}) • Equivalent to sorting by sample scores
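A sketch of this two-step prediction, assuming each sample's candidate box features are precomputed as an (m_i, d) NumPy array; the names are illustrative:

```python
import numpy as np

def latent_ap_svm_predict(w, candidate_feats):
    """Latent AP-SVM prediction: choose the best bounding box for every
    sample (h_i(w) = argmax_h w^T Phi(x_i, h)), then obtain Y(w) by
    sorting samples on those box scores, best first."""
    scores = np.array([np.max(feats @ w) for feats in candidate_feats])
    boxes = [int(np.argmax(feats @ w)) for feats in candidate_feats]
    ranking = np.argsort(-scores)  # sample indices, best-ranked first
    return ranking, boxes
```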