This presentation explores Maximum a Posteriori (MAP) estimation in Hidden Markov Models (HMMs), covering both the discrete and the semi-continuous case. It uses Dirichlet and normal-Wishart priors to derive MAP estimates, emphasizing the role of prior information in parameter estimation. It also covers how to choose initial estimates from the modes and means of the prior densities, and describes segmental MAP estimates. An appendix introduces the matrix calculus used in the derivations.
Maximum a Posteriori Presented by 陳燦輝
Maximum a Posteriori • Introduction • MAP for Discrete HMM • Prior: Dirichlet • MAP for Semi-Continuous HMM • Prior: Dirichlet + normal-Wishart • Segmental MAP Estimates • Conclusion • Appendix-Matrix Calculus
Introduction • HMM parameter estimators have so far been derived purely from the training observation sequences, without any prior information. • In many cases, prior information about the parameters is available, e.g., from previous experience.
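As a reminder of what "including prior information" means formally, the MAP estimate can be written as follows. The notation is an assumption made here for illustration (x for the observations, θ for the parameters, f for the likelihood, g for the prior density), since the slides' own symbols are not available in this transcript:

```latex
\theta_{\mathrm{MAP}}
  = \arg\max_{\theta}\, g(\theta \mid \mathbf{x})
  = \arg\max_{\theta}\, f(\mathbf{x} \mid \theta)\, g(\theta)
```

When g(θ) is constant (a non-informative prior), this reduces to the maximum likelihood estimate.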
Discrete HMM Definition :
Discrete HMM Q-function :
Discrete HMM Q-function : Similarly
Discrete HMM R-function :
Discrete HMM Initial probability
Discrete HMM Transition probability
Discrete HMM Observation probability
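With a Dirichlet prior, the MAP re-estimate of a probability vector (initial, transition, or observation probabilities) usually takes the form of smoothed relative counts. The sketch below is a minimal illustration under stated assumptions, not the slides' own derivation: the names `map_reestimate`, `expected_counts`, and `dirichlet_params` are hypothetical, and the prior is assumed to be proportional to the product of p_i^(nu_i - 1) with every nu_i > 1.

```python
import numpy as np

def map_reestimate(expected_counts, dirichlet_params):
    """MAP update for one probability vector under a Dirichlet prior.

    expected_counts : expected counts c_i from the E-step (non-negative)
    dirichlet_params: Dirichlet hyperparameters nu_i (assumed > 1, so the
                      estimate stays strictly inside the simplex)
    """
    c = np.asarray(expected_counts, dtype=float)
    nu = np.asarray(dirichlet_params, dtype=float)
    # Prior "pseudo-counts" (nu_i - 1) are added to the data counts.
    numer = (nu - 1.0) + c
    return numer / numer.sum()

counts = np.array([3.0, 1.0, 0.0])   # toy expected counts (made up)
prior = np.array([2.0, 2.0, 2.0])    # toy hyperparameters (made up)

map_est = map_reestimate(counts, prior)
ml_est = counts / counts.sum()       # ML estimate for comparison
```

In this toy case the ML estimate assigns zero probability to the unseen third symbol, whereas the MAP estimate keeps it positive because of the prior pseudo-counts.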
Discrete HMM • How should the initial estimate of the parameters be chosen? • One reasonable choice of the initial estimate is the mode of the prior density.
Discrete HMM • What is the mode? • Applying a Lagrange multiplier to enforce the sum-to-one constraint, the modes above are easily derived. • Example :
Discrete HMM • Another reasonable choice of the initial estimate is the mean of the prior density. • Both the mode and the mean summarize the information available about the parameters before any data are observed.
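The two choices of initial estimate can be compared numerically. The following is a minimal sketch, assuming a Dirichlet prior with hyperparameters nu_i; the function names are hypothetical, and the mode formula (the one obtained with a Lagrange multiplier) is only valid when every nu_i > 1.

```python
import numpy as np

def dirichlet_mode(nu):
    """Mode of a Dirichlet density: (nu_i - 1) / (sum(nu) - K)."""
    nu = np.asarray(nu, dtype=float)
    if np.any(nu <= 1.0):
        raise ValueError("mode formula assumes every nu_i > 1")
    return (nu - 1.0) / (nu.sum() - nu.size)

def dirichlet_mean(nu):
    """Mean of a Dirichlet density: nu_i / sum(nu)."""
    nu = np.asarray(nu, dtype=float)
    return nu / nu.sum()

nu = [4.0, 2.0, 2.0]          # toy hyperparameters (made up)
mode = dirichlet_mode(nu)     # [0.6, 0.2, 0.2]
mean = dirichlet_mean(nu)     # [0.5, 0.25, 0.25]
```

The mode is more peaked than the mean here; the two coincide only when all hyperparameters are equal or in the limit of large hyperparameters.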
SCHMM independent
SCHMM • [Figure: Model 1, Model 2, Model M]
SCHMM Q-function :
SCHMM Initial probability • Differentiate with respect to the initial probability and set the derivative to zero.
SCHMM Transition probability • Differentiate with respect to the transition probability and set the derivative to zero.
SCHMM Mixture weight • Differentiate with respect to the mixture weight and set the derivative to zero.
SCHMM • Differentiate with respect to each remaining parameter in turn and set the derivative to zero.
SCHMM • Full Covariance matrix case :
SCHMM • Full Covariance matrix case : (1) (2) (3)
SCHMM • Full Covariance matrix case :
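For the full-covariance case, the MAP update of one Gaussian under a normal-Wishart prior can be sketched as below. This is an illustration under assumptions, not a transcription of the slides: the hyperparameter names `tau`, `mu`, `alpha`, `u` are hypothetical, the prior is assumed to have the form |r|^((alpha - p)/2) exp(-tau/2 (m - mu)' r (m - mu)) exp(-tr(u r)/2) on the mean m and precision r, and `c` stands for the occupation weights produced by the E-step.

```python
import numpy as np

def map_gaussian_full(x, c, tau, mu, alpha, u):
    """MAP mean and covariance of one Gaussian, normal-Wishart prior.

    x     : (T, p) observations
    c     : (T,) occupation weights from the E-step
    tau   : prior weight on the mean (> 0)
    mu    : (p,) prior mean
    alpha : Wishart degrees-of-freedom parameter (assumed > p - 1)
    u     : (p, p) symmetric positive-definite prior matrix
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    mu = np.asarray(mu, dtype=float)
    u = np.asarray(u, dtype=float)
    p = x.shape[1]
    n = c.sum()

    # Mean: weighted average of the prior mean and the data.
    m_new = (tau * mu + c @ x) / (tau + n)

    # Covariance: prior matrix + weighted scatter + prior-mean shift term.
    d = x - m_new
    scatter = (c[:, None] * d).T @ d
    dm = (mu - m_new)[:, None]
    cov_new = (u + scatter + tau * (dm @ dm.T)) / ((alpha - p) + n)
    return m_new, cov_new

# Toy data (made up): four points at the corners of a square.
x = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
c = np.ones(4)
m_new, cov_new = map_gaussian_full(x, c, tau=1.0, mu=[1.0, 1.0],
                                   alpha=4.0, u=np.eye(2))
```

With no data (all weights zero) the update falls back to the prior mode under the assumed parameterization, which is one reason the mode is a natural initial estimate.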
SCHMM Full Covariance • The initial estimate can be chosen as the mode of the prior PDF. • Alternatively, it can be chosen as the mean of the prior PDF.
SCHMM • Diagonal Covariance matrix case : • Then and
SCHMM • Diagonal Covariance matrix case :
SCHMM Diagonal Covariance • Diagonal Covariance matrix case :
SCHMM • Diagonal Covariance matrix case :
SCHMM • Diagonal Covariance matrix case : (1) (2) (3)
SCHMM Diagonal Covariance • Diagonal Covariance matrix case :
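In the diagonal-covariance case each dimension can be treated independently. The sketch below assumes, as an illustration only, a normal-gamma prior per dimension of the form r^(alpha - 1/2) exp(-tau r (m - mu)^2 / 2) exp(-beta r) on the mean m and precision r; the names `tau`, `mu`, `alpha`, `beta` are hypothetical and may not match the slides' notation.

```python
import numpy as np

def map_gaussian_diag(x, c, tau, mu, alpha, beta):
    """MAP mean and per-dimension variance, normal-gamma prior.

    x     : (T, p) observations
    c     : (T,) occupation weights from the E-step
    tau   : prior weight on the mean (> 0), scalar or (p,)
    mu    : (p,) prior mean
    alpha : gamma shape parameter (assumed > 1/2), scalar or (p,)
    beta  : gamma rate parameter (> 0), scalar or (p,)
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n = c.sum()

    # Mean: same interpolation between prior mean and data as before.
    m_new = (tau * mu + c @ x) / (tau + n)

    # Variance: each dimension is updated separately.
    sq_dev = c @ (x - m_new) ** 2
    var_new = (2.0 * beta + sq_dev + tau * (mu - m_new) ** 2) \
        / (2.0 * alpha - 1.0 + n)
    return m_new, var_new

# Toy data (made up): four points at the corners of a square.
x = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
c = np.ones(4)
m_new, var_new = map_gaussian_diag(x, c, tau=1.0, mu=[1.0, 1.0],
                                   alpha=1.5, beta=0.5)
```

Under this parameterization the update is the one-dimensional special case of a normal-Wishart update applied to each coordinate.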
SCHMM Diagonal Covariance • The initial estimate can be chosen as the mode of the prior PDF. • Alternatively, it can be chosen as the mean of the prior PDF.
Segmental MAP Estimates SCHMM
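Segmental MAP estimation is commonly formulated as a joint maximization over the parameters and the state sequence, instead of summing over all state sequences. A sketch of that formulation follows; the notation is assumed here for illustration (x for the observations, s for the state sequence, λ for the parameters, g for the prior density, m for the iteration index):

```latex
\tilde{\lambda} = \arg\max_{\lambda}\, \max_{\mathbf{s}}\,
    f(\mathbf{x}, \mathbf{s} \mid \lambda)\, g(\lambda)

% Alternating maximization, starting from an initial \lambda^{(0)}:
\mathbf{s}^{(m)} = \arg\max_{\mathbf{s}}\,
    f(\mathbf{x}, \mathbf{s} \mid \lambda^{(m)})
\qquad
\lambda^{(m+1)} = \arg\max_{\lambda}\,
    f(\mathbf{x}, \mathbf{s}^{(m)} \mid \lambda)\, g(\lambda)
```

The first step is a Viterbi decoding; the second is a MAP re-estimation in which the soft occupation weights are replaced by the hard assignments given by the decoded state sequence.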
Conclusion • The role of the prior density in parameter estimation was discussed. • Some applications: • Model adaptation, HMM training, IR(?)
Appendix-Matrix Calculus(1) • Notation:
Appendix-Matrix Calculus(2) • Property 1: • Proof • Property 1 (extension): • Proof
Appendix-Matrix Calculus(3) • Property 2: • Proof
Appendix-Matrix Calculus(4) • Property 3: • Proof
Appendix-Matrix Calculus(5) • Property 4: • Proof
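The statements of the properties are not available in this transcript. As a hedged illustration only, the sketch below numerically checks three standard matrix-derivative identities of the kind typically needed when differentiating a Gaussian log-likelihood together with a Wishart-type prior; whether they coincide with Properties 1 to 4 is an assumption. The helper `numeric_grad` is a hypothetical name, and the matrix is treated as having independent entries (no symmetry constraint).

```python
import numpy as np

def numeric_grad(f, a, eps=1e-6):
    """Central finite-difference gradient of scalar f w.r.t. matrix a."""
    grad = np.zeros_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            step = np.zeros_like(a)
            step[i, j] = eps
            grad[i, j] = (f(a + step) - f(a - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
b = rng.normal(size=(3, 3))
a = b @ b.T + 3 * np.eye(3)      # a symmetric positive-definite matrix
v = rng.normal(size=3)

# d log|A| / dA = (A^{-1})^T
grad_logdet = numeric_grad(lambda m: np.linalg.slogdet(m)[1], a)
# d tr(B A) / dA = B^T
grad_trace = numeric_grad(lambda m: np.trace(b @ m), a)
# d (v^T A v) / dA = v v^T
grad_quad = numeric_grad(lambda m: v @ m @ v, a)
```

Comparing each numerical gradient with its closed form (for example with `np.allclose`) is a quick way to catch sign or transpose mistakes in a hand derivation.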