Explore the theoretical analysis and solution of the hierarchical variational Bayes approach to linear inverse problems, comparing it to the James-Stein estimator, with an application in magnetoencephalography (MEG) and implications for relevance determination and future work.
Analytic Solution of Hierarchical Variational Bayes Approach in Linear Inverse Problem
Shinichi Nakajima (Nikon Corporation), Sumio Watanabe (Tokyo Institute of Technology)
Contents
• Introduction
  • Linear inverse problem
  • Hierarchical variational Bayes [Sato et al. 04]
  • James-Stein estimator
  • Purpose
• Theoretical analysis
  • Setting
  • Solution
• Discussion
• Conclusions
Linear inverse problem
Example: Magnetoencephalography (MEG). Model: $y = X a + \varepsilon$, where $y$ is the observable (the magnetic field detected by $N$ detectors), $X$ is the lead field matrix (a constant matrix), $a$ is the parameter to be estimated (the electric current at $M$ sites), and $\varepsilon$ is the observation noise. Since typically $M \gg N$, the problem is ill-posed!
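To make the ill-posedness concrete, here is a minimal NumPy sketch of the model above; the sizes ($N = 20$, $M = 100$) and all variable names are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 20, 100                      # hypothetical sizes: M sites >> N detectors
X = rng.standard_normal((N, M))     # stand-in for the lead field matrix
a_true = np.zeros(M)
a_true[:5] = rng.standard_normal(5)            # only a few sites carry current
y = X @ a_true + 0.1 * rng.standard_normal(N)  # noisy magnetic-field observation

# M > N: the system y = X a is underdetermined, so infinitely many
# currents a reproduce y exactly -- this is the ill-posedness.
```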
Methods for the ill-posed problem
Model: $y = X a + \varepsilon$. Prior: $a \sim \mathcal{N}(0, B^{-2})$.
1. Minimum-norm maximum likelihood.
2. Maximum a posteriori (MAP), where the prior covariance $B^{-2}$ is constant.
3. Hierarchical Bayes: $B^{-2}$ is also a parameter to be estimated!
Methods 1 and 2 behave similarly; method 3 is very different from 1 and 2.
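Continuing the sketch above, methods 1 and 2 in code; the noise variance and prior precision are assumed values for illustration only.

```python
# Method 1: minimum-norm maximum likelihood via the Moore-Penrose inverse.
a_mn = np.linalg.pinv(X) @ y

# Method 2: MAP with a fixed isotropic Gaussian prior a ~ N(0, I / b2);
# sigma2 (noise variance) and b2 (prior precision) are assumed values.
sigma2, b2 = 0.01, 1.0
a_map = np.linalg.solve(X.T @ X + sigma2 * b2 * np.eye(M), X.T @ y)
# As b2 -> 0, a_map approaches a_mn: methods 1 and 2 behave similarly.
```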
Hierarchical Bayes, a.k.a. Automatic Relevance Determination (ARD) [MacKay 94, Neal 96]
Model: $y = X a + \varepsilon$. Prior: $a \sim \mathcal{N}(0, B^{-2})$. Estimate $B^{-2}$ from the observation, introducing a hyperprior $p(B^{-2})$. If $a$ and $B^{-2}$ are estimated by Bayesian methods, many small elements become zero (relevance determination). See [9] if interested. Why? Singularities and hierarchy.
Hierarchical variational Bayes
But exact Bayes estimation requires huge computational costs, so apply variational Bayes (VB) [Sato et al. 04]. Variational method: minimize the free energy $F(q) = \int q(a, B^{-2}) \log \frac{q(a, B^{-2})}{p(y, a, B^{-2})} \, da \, dB^{-2}$ over a trial posterior $q$. Without restriction, the optimum is the Bayes posterior; VB imposes the restriction that the trial posterior factorizes, $q(a, B^{-2}) = q_a(a)\, q_B(B^{-2})$.
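A schematic fixed-point loop in the ARD spirit, continuing the earlier sketch. This is NOT Sato et al.'s exact update rule (which is not reproduced here), only a generic illustration of the alternating structure that the VB restriction induces.

```python
# Generic ARD-style alternating loop (NOT the paper's exact updates):
# alternate the Gaussian posterior of a (given per-element precisions
# beta) with a moment-matching update of beta.
beta = np.ones(M)
for _ in range(200):
    S = np.linalg.inv(X.T @ X / sigma2 + np.diag(beta))  # posterior covariance
    mu = S @ (X.T @ y) / sigma2                          # posterior mean
    beta = 1.0 / (mu ** 2 + np.diag(S))                  # precision update
# Irrelevant elements acquire huge beta (tiny prior variance), driving
# their posterior means to ~0: automatic relevance determination.
```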
James-Stein (JS) estimator [James & Stein 61]
$K$-dimensional mean estimation (regular model): samples $y_1, \dots, y_n \sim \mathcal{N}(\mu, I_K)$; ML estimator (arithmetic mean): $\hat\mu^{\mathrm{ML}} = \bar{y}$. Domination of estimator $a$ over estimator $b$: the risk of $a$ is no larger than that of $b$ for any true $\mu$, and strictly smaller for a certain true $\mu$. ML is efficient (never dominated by any unbiased estimator), but is inadmissible (dominated by a biased estimator) when $K \ge 3$ [Stein 56]. The JS estimator shrinks the sample mean toward the origin: $\hat\mu^{\mathrm{JS}} = \left(1 - \frac{K-2}{n \|\bar{y}\|^2}\right) \bar{y}$.
[Figure: ML vs. JS estimates for $K = 3$, showing the shrinkage factor pulling the estimate toward the origin relative to the true mean.]
A certain relation between empirical Bayes (EB) and JS was discussed in [Efron & Morris 73].
Purpose
[Sato et al. 04] derived a simple iterative algorithm based on HVB for the MEG application and experimentally showed good performance. We theoretically analyze HVB, derive its solution, and discuss a relation between HVB and the positive-part JS estimator, focusing on a simplified version of Sato's approach. Positive-part JS: $\hat\mu^{\mathrm{JS+}} = \max\!\left(0,\, 1 - \frac{K-2}{n \|\bar{y}\|^2}\right) \bar{y}$, where $\frac{K-2}{n \|\bar{y}\|^2}$ is the degree of shrinkage.
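A self-contained numerical sketch of the positive-part JS estimator above; the true mean and sample count below are arbitrary test values.

```python
import numpy as np

def positive_part_js(samples):
    """Positive-part James-Stein estimate of a K-dimensional mean (K >= 3)."""
    n, K = samples.shape
    ybar = samples.mean(axis=0)                    # ML estimator (arithmetic mean)
    shrinkage = (K - 2) / (n * np.sum(ybar ** 2))  # degree of shrinkage
    return max(0.0, 1.0 - shrinkage) * ybar        # shrink toward the origin

rng = np.random.default_rng(1)
mu_true = np.array([0.3, -0.2, 0.1])               # K = 3, as in the figure
samples = rng.normal(mu_true, 1.0, size=(50, 3))   # unit-variance observations
print(samples.mean(axis=0), positive_part_js(samples))
```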
Setting
Consider time-series data. ARD model: $y(u) = X a(u) + \varepsilon(u)$ for $u = 1, \dots, U$, with the ARD prior on $a(u)$ as above. Use a hyperparameter that is constant during the window $U$ [Sato et al. 04].
[Figure: the parameter $a'$ varies with time $u$, while the hyperparameter $b$ is held constant over each window of length $U$.]
Summary of setting
Observable: $y(u)$, $u = 1, \dots, U$. Parameter: $a$. Hyperparameter (constant during $U$): $B^{-2}$. $n$: number of samples. Constant matrix: $X$. Model: $y = X a + \varepsilon$. Priors: the $m$-th element $a_m$ is $d$-dimensional normal, $a_m \sim \mathcal{N}(0,\, b_m^{-2} I_d)$, where $I_d$ is the $d \times d$ identity matrix.
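Putting the pieces together, a compact restatement of the hierarchical model; $b_m$ is used here as the per-element hyperparameter symbol, an assumed notation consistent with $B^{-2}$ above.

```latex
\begin{align*}
  \text{Model:}      \quad & y(u) = X a(u) + \varepsilon(u), \qquad
                             \varepsilon(u) \sim \mathcal{N}(0, \sigma^2 I_N),
                             \quad u = 1, \dots, U, \\
  \text{Prior:}      \quad & a_m(u) \sim \mathcal{N}(0,\; b_m^{-2} I_d), \qquad
                             m = 1, \dots, M, \\
  \text{Hyperprior:} \quad & b_m^{-2} \sim p(b_m^{-2}),
                             \quad \text{constant during the window } U.
\end{align*}
```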
Variational condition
Restriction: the trial posterior factorizes, $q(a, B^{-2}) = q_a(a)\, q_B(B^{-2})$. Applying the variational method (stationarity of the free energy under this restriction) yields the variational condition that determines $q_a$ and $q_B$, each in terms of expectations under the other.
Theorem 1
Theorem 1: the VB estimator of the $m$-th element is given in an implicit form; it is not explicit! The HVB solution is similar to the positive-part JS estimator, with a degree of shrinkage proportional to $U$.
Proposition
Simply use the positive-part JS estimator, applied to each element of the least-squares solution. This only requires calculation of the Moore-Penrose inverse. (HVB needs an iterative calculation.)
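A sketch of the Proposition's recipe under the earlier setup: one pseudoinverse, then blockwise positive-part shrinkage. The block dimension `d` and the shrinkage constant `c` are placeholders ($d = 3$ for MEG dipoles); the paper's exact shrinkage constant is not reproduced here.

```python
def js_inverse(X, y, d=1, c=1.0):
    """Non-iterative estimate: Moore-Penrose inverse + blockwise shrinkage."""
    a_ls = np.linalg.pinv(X) @ y               # single pseudoinverse, no iteration
    blocks = a_ls.reshape(-1, d)               # one d-dimensional block per element
    factors = np.maximum(0.0, 1.0 - c / np.sum(blocks ** 2, axis=1))
    return (factors[:, None] * blocks).ravel() # positive-part shrinkage per block

a_js = js_inverse(X, y)  # reuses X, y from the first sketch
```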
Difference between VB and JS
• When the $x_m$'s (columns of $X$) are orthogonal: asymptotically equivalent.
• When all the $x_m$'s are parallel or orthogonal: JS suppresses overfitting more than HVB (enhances relevance determination).
• Otherwise: future work.
Conclusions & future work
• Conclusions
  • HVB provides results similar to JS estimation in the linear inverse problem.
  • The time duration $U$ affects learning (a large $U$ enhances relevance determination).
• Future work
  • Difference from JS in the general case.
  • Bounds on the generalization error.