Learning Probabilistic Graphical Models
Parameter Estimation: Maximum Likelihood Estimation
Biased Coin Example
P is a Bernoulli distribution: P(X=1) = θ, P(X=0) = 1 − θ
• Tosses are independent of each other
• Tosses are sampled from the same distribution (identically distributed)
That is, tosses are sampled IID from P
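As a concrete illustration (a minimal sketch; the bias 0.7, seed, and sample size are illustrative assumptions, not from the slides), IID Bernoulli tosses can be simulated as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.7   # true coin bias -- an illustrative assumption
M = 100       # number of tosses

# IID sampling: each toss is independent and drawn from the same Bernoulli P
tosses = rng.binomial(n=1, p=theta, size=M)   # array of 0s and 1s
print(tosses[:10], "fraction of heads:", tosses.mean())
```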
IID as a PGM
[Plate model: θ is a parent of each toss X[1], ..., X[M]; the plate over m denotes the M data instances]
Maximum Likelihood Estimation
• Goal: find θ ∈ [0,1] that predicts D well
• Prediction quality = likelihood of D given θ: L(D : θ) = Π_m P(x[m] : θ)
[Plot: L(D : θ) as a function of θ over [0, 1]]
Maximum Likelihood Estimator
• Observations: MH heads and MT tails
• Find θ maximizing the likelihood L(D : θ) = θ^MH (1 − θ)^MT
• Equivalent to maximizing the log-likelihood ℓ(D : θ) = MH log θ + MT log(1 − θ)
• Differentiating the log-likelihood and solving for θ: θ̂ = MH / (MH + MT)
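As a sanity check (a sketch with made-up data, not part of the original slides), the closed-form estimate MH / (MH + MT) should coincide with the maximizer of the log-likelihood found by grid search:

```python
import numpy as np

tosses = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])   # illustrative data: 7 heads, 3 tails
MH = tosses.sum()
MT = len(tosses) - MH

# Closed-form MLE from the derivation above
theta_hat = MH / (MH + MT)

# Grid search over theta as a numeric check
grid = np.linspace(0.001, 0.999, 999)
loglik = MH * np.log(grid) + MT * np.log(1 - grid)
print(theta_hat, grid[np.argmax(loglik)])   # both 0.7
```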
Sufficient Statistics
• For computing θ̂ in the coin toss example, we only needed MH and MT, since L(D : θ) = θ^MH (1 − θ)^MT
• MH and MT are sufficient statistics
Sufficient Statistics
[Diagram: many datasets map to the same statistics]
• A function s(D) is a sufficient statistic if it is a function from instances to vectors in ℝ^k such that for any two datasets D and D′ and any θ we have
  Σ_m s(x[m]) = Σ_m s(x′[m])  ⇒  L(D : θ) = L(D′ : θ)
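The definition can be illustrated for the Bernoulli case (a minimal sketch, not from the slides): two datasets with the same counts, hence the same sufficient statistics, have the same likelihood at every θ.

```python
import numpy as np

def bernoulli_likelihood(data, theta):
    # The likelihood depends on the data only through the counts (MH, MT)
    MH = data.sum()
    MT = len(data) - MH
    return theta**MH * (1 - theta)**MT

D1 = np.array([1, 1, 0, 1, 0])   # same counts as D2 ...
D2 = np.array([0, 1, 1, 0, 1])   # ... but a different toss order
for theta in (0.2, 0.5, 0.8):
    assert np.isclose(bernoulli_likelihood(D1, theta), bernoulli_likelihood(D2, theta))
```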
Sufficient Statistic for Multinomial
• For a dataset D over a variable X with k values, the sufficient statistics are the counts <M1, ..., Mk>, where Mi is the number of times that X[m] = xi in D
• The per-instance sufficient statistic s(x) is a tuple of dimension k
• s(xi) = (0, ..., 0, 1, 0, ..., 0), with the 1 in position i
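A minimal sketch (with illustrative data) showing that the per-instance one-hot statistics sum to the count vector <M1, ..., Mk>:

```python
import numpy as np

k = 3
data = np.array([0, 2, 1, 0, 2, 2])   # observed values of X, coded 0..k-1

def s(x, k):
    # One-hot sufficient statistic: 1 in position x, 0 elsewhere
    vec = np.zeros(k)
    vec[x] = 1.0
    return vec

counts = sum(s(x, k) for x in data)
print(counts)   # [2. 1. 3.] = <M1, M2, M3>
```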
Sufficient Statistic for Gaussian
• Gaussian distribution: p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))
• Rewrite as p(x) ∝ exp(−x² / (2σ²) + xμ / σ² − μ² / (2σ²)), i.e., an exponential of a linear function of x² and x
• Sufficient statistics for the Gaussian: s(x) = <1, x, x²>
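A sketch of why <1, x, x²> suffices (sample size and parameters are illustrative assumptions): accumulating only Σ1, Σx, Σx² over the dataset is enough to recover the Gaussian parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)   # illustrative Gaussian data

# Accumulate the sufficient statistics <1, x, x^2> over the dataset
S = np.array([len(x), x.sum(), (x**2).sum()])

mu_hat = S[1] / S[0]                  # sample mean
var_hat = S[2] / S[0] - mu_hat**2     # E[x^2] - E[x]^2
print(mu_hat, np.sqrt(var_hat))       # close to 2.0 and 1.5
```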
Maximum Likelihood Estimation
• MLE Principle: Choose θ to maximize L(D : θ)
• Multinomial MLE: θ̂_i = M_i / Σ_j M_j
• Gaussian MLE: μ̂ = (1/M) Σ_m x[m],  σ̂² = (1/M) Σ_m (x[m] − μ̂)²
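Tying the closed forms to the sufficient statistics above (a sketch with illustrative numbers):

```python
import numpy as np

# Multinomial MLE: normalize the counts <M1, ..., Mk>
counts = np.array([2.0, 1.0, 3.0])
theta_hat = counts / counts.sum()     # [1/3, 1/6, 1/2]

# Gaussian MLE computed directly from the data
x = np.array([1.2, 0.7, 2.3, 1.9, 1.4])   # illustrative data
M = len(x)
mu_hat = x.sum() / M
var_hat = ((x - mu_hat)**2).sum() / M
print(theta_hat, mu_hat, np.sqrt(var_hat))
```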
Summary
• Maximum likelihood estimation is a simple principle for parameter selection given D
• The likelihood function is uniquely determined by sufficient statistics that summarize D
• MLE has a closed-form solution for many parametric distributions