Simple Bayesian Supervised Models
Simple Bayesian Supervised Models Saskia Klein & Steffen Bollmann
Content • Recap from last week • Bayesian Linear Regression • What is linear regression? • Application of Bayesian theory to linear regression • Example • Comparison to conventional linear regression • Bayesian Logistic Regression • Naive Bayes classifier • Source: • Bishop (ch. 3, 4); Barber (ch. 10)
Maximum a posteriori estimation • The Bayesian approach to estimating the parameters $\theta$ of a distribution given a set of observations $X$ is to maximize the posterior distribution: $p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}$, i.e. posterior = likelihood × prior / evidence. • This makes it possible to account for prior information.
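A minimal sketch of maximum a posteriori estimation, assuming a coin-flip (Bernoulli) likelihood and a Beta(2, 2) prior; neither choice comes from the slides, and the grid search is only an illustrative way of maximizing likelihood × prior. The evidence does not depend on the parameter, so it can be ignored when maximizing.

```python
import math

def log_likelihood(theta, heads, tails):
    """Bernoulli log-likelihood of the observed coin flips."""
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def log_prior(theta, a=2.0, b=2.0):
    """Unnormalised log density of a Beta(a, b) prior (illustrative assumption)."""
    return (a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta)

def map_estimate(heads, tails, grid_size=999):
    """Maximise log-likelihood + log-prior over an evenly spaced grid on (0, 1)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda th: log_likelihood(th, heads, tails) + log_prior(th))

heads, tails = 3, 0
theta_ml = heads / (heads + tails)      # maximum likelihood estimate
theta_map = map_estimate(heads, tails)  # pulled towards 0.5 by the prior
```

With only three observations, the maximum likelihood estimate claims the coin always lands heads, while the MAP estimate stays strictly below 1 because of the prior.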
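Conjugate prior • In general, for a given probability distribution $p(\mathbf{x} \mid \boldsymbol{\eta})$, we can seek a prior $p(\boldsymbol{\eta})$ that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior. • For any member of the exponential family, $p(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x})\, g(\boldsymbol{\eta}) \exp\{\boldsymbol{\eta}^T \mathbf{u}(\mathbf{x})\}$, there exists a conjugate prior that can be written in the form $p(\boldsymbol{\eta} \mid \boldsymbol{\chi}, \nu) = f(\boldsymbol{\chi}, \nu)\, g(\boldsymbol{\eta})^{\nu} \exp\{\nu\, \boldsymbol{\eta}^T \boldsymbol{\chi}\}$ • Important conjugate pairs (likelihood – prior) include: • Binomial – Beta • Multinomial – Dirichlet • Gaussian – Gaussian (for the mean) • Gaussian – Gamma (for the precision) • Exponential – Gamma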
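To make conjugacy concrete, here is a small sketch for the Binomial – Beta pair: the posterior is again a Beta distribution, so updating only means adding counts to the prior parameters. The function names and the numbers are illustrative assumptions.

```python
def beta_binomial_update(a, b, successes, failures):
    """Posterior Beta parameters after binomial data under a Beta(a, b) prior."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

a0, b0 = 2.0, 2.0  # prior pseudo-counts (assumed for illustration)

# Update with all data at once: 7 successes, 3 failures.
a1, b1 = beta_binomial_update(a0, b0, successes=7, failures=3)

# Update sequentially with the same data split into two batches.
a_seq, b_seq = beta_binomial_update(a0, b0, successes=4, failures=1)
a_seq, b_seq = beta_binomial_update(a_seq, b_seq, successes=3, failures=2)

posterior_mean = beta_mean(a1, b1)
```

Because the posterior has the same form as the prior, the batch update and the sequential update end at the same distribution.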
Linear Regression • goal: predict the value of a target variable $t$ given the value of a $D$-dimensional vector $\mathbf{x}$ of input variables • linear regression models: linear functions of the adjustable parameters $\mathbf{w}$
Linear Regression • Training • $\{\mathbf{x}_n\}$ … training data set comprising $N$ observations, where $n = 1, \dots, N$ • $\{t_n\}$ … corresponding target values • compute the weights $\mathbf{w}$ • Prediction • goal: predict the value of $t$ for a new value of $\mathbf{x}$ • = model the predictive distribution $p(t \mid \mathbf{x})$ • and make predictions of $t$ in such a way as to minimize the expected value of a loss function
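Examples of linear regression models • simplest linear regression model: $y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \dots + w_D x_D$ • linear function of the weights/parameters $\mathbf{w}$ and the data $\mathbf{x}$ • linear regression models using basis functions $\phi_j(\mathbf{x})$: $y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$, with $\phi_0(\mathbf{x}) = 1$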
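The basis-function form can be sketched as follows. The model stays linear in the weights even though it is non-linear in the input. The polynomial and Gaussian bases, the scalar input and all numbers are assumptions chosen for illustration.

```python
import math

def polynomial_basis(x, degree):
    """phi_j(x) = x**j for j = 0..degree; phi_0 = 1 acts as the bias term."""
    return [x ** j for j in range(degree + 1)]

def gaussian_basis(x, centres, width):
    """A constant bias followed by one Gaussian bump per centre."""
    return [1.0] + [math.exp(-((x - mu) ** 2) / (2.0 * width ** 2)) for mu in centres]

def predict(w, phi):
    """y(x, w) = w^T phi(x): a linear function of the weights."""
    return sum(w_j * phi_j for w_j, phi_j in zip(w, phi))

w = [1.0, -2.0, 0.5]                           # hypothetical weights
y_poly = predict(w, polynomial_basis(2.0, 2))  # 1 - 2*2 + 0.5*4
y_gauss = predict(w, gaussian_basis(0.0, centres=[0.0, 10.0], width=1.0))
```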
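Bayesian Linear Regression • model: $t = y(\mathbf{x}, \mathbf{w}) + \epsilon$ • $t$ … target variable • $y$ … model • $\mathbf{x}$ … data • $\mathbf{w}$ … weights/parameters • $\epsilon$ … additive Gaussian noise: $\epsilon \sim \mathcal{N}(0, \beta^{-1})$, with zero mean and precision (inverse variance) $\beta$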
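Bayesian Linear Regression - Likelihood • likelihood function: $p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1})$, where $\boldsymbol{\phi}(\mathbf{x}_n)$ is the vector of basis functions and $\beta$ the noise precision • observation of $N$ training pairs of inputs $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ and target values $\mathbf{t} = (t_1, \dots, t_N)^T$ (independently drawn from the distribution)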
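As a sketch, the log of this likelihood can be evaluated directly. The example assumes a scalar input with basis $\boldsymbol{\phi}(x) = (1, x)^T$, i.e. a straight line $y = w_0 + w_1 x$; the data values are made up for illustration.

```python
import math

def log_likelihood(w, beta, inputs, targets):
    """ln p(t | X, w, beta) for y = w[0] + w[1]*x with Gaussian noise of precision beta."""
    n = len(targets)
    sum_sq_err = sum((t - (w[0] + w[1] * x)) ** 2 for x, t in zip(inputs, targets))
    return (0.5 * n * math.log(beta)
            - 0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * beta * sum_sq_err)

inputs = [0.0, 1.0, 2.0]
targets = [1.0, 3.1, 4.9]  # roughly t = 1 + 2x plus noise (made-up data)
beta = 25.0                # assumed noise precision

ll_good = log_likelihood([1.0, 2.0], beta, inputs, targets)
ll_bad = log_likelihood([0.0, 0.0], beta, inputs, targets)
```

Weights close to the line that generated the data give a higher log-likelihood; maximizing it over the weights is the least squares solution.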
Bayesian Linear Regression - Prior • prior probability distribution over the model parameters $\mathbf{w}$ • conjugate prior: Gaussian distribution $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$ • mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$
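Bayesian Linear Regression – Posterior Distribution • due to the conjugate prior, the posterior will also be Gaussian (derivation: Bishop p.112): $p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$ • $\mathbf{m}_N = \mathbf{S}_N (\mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^T \mathbf{t})$ • $\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi}$ • $\mathbf{m}_0$, $\mathbf{S}_0$ … prior mean and covariance; $\beta$ … noise precision; $\boldsymbol{\Phi}$ … design matrix whose $n$-th row is $\boldsymbol{\phi}(\mathbf{x}_n)^T$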
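A minimal sketch of the Gaussian posterior over the weights, under assumptions that are not on the slide: a straight-line model with basis $\boldsymbol{\phi}(x) = (1, x)^T$ and a zero-mean isotropic prior $\mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$. With this prior the posterior precision is $\alpha\mathbf{I} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}$ and the posterior mean is $\beta\,\mathbf{S}_N\boldsymbol{\Phi}^T\mathbf{t}$. The 2×2 inverse is written out by hand to keep the example dependency-free.

```python
def posterior(inputs, targets, alpha, beta):
    """Return (m_N, S_N) for y = w0 + w1*x with prior N(w | 0, alpha^-1 I)."""
    n = len(inputs)
    sum_x = sum(inputs)
    sum_xx = sum(x * x for x in inputs)
    sum_t = sum(targets)
    sum_xt = sum(x * t for x, t in zip(inputs, targets))

    # Posterior precision: alpha*I + beta * Phi^T Phi, with phi(x) = (1, x).
    a = alpha + beta * n
    b = beta * sum_x
    d = alpha + beta * sum_xx

    # Invert the symmetric 2x2 precision matrix [[a, b], [b, d]].
    det = a * d - b * b
    s_n = [[d / det, -b / det], [-b / det, a / det]]

    # Posterior mean: beta * S_N Phi^T t, where Phi^T t = (sum_t, sum_xt).
    m_n = [beta * (s_n[0][0] * sum_t + s_n[0][1] * sum_xt),
           beta * (s_n[1][0] * sum_t + s_n[1][1] * sum_xt)]
    return m_n, s_n

inputs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]  # noise-free t = 1 + 2x (made-up data)

m_weak, s_weak = posterior(inputs, targets, alpha=1e-9, beta=25.0)    # nearly flat prior
m_strong, s_strong = posterior(inputs, targets, alpha=1e6, beta=25.0) # very tight prior
```

A nearly flat prior recovers the least squares line, whereas a very tight prior around zero shrinks the posterior mean towards zero.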
Example Linear Regression • matlab
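Predictive Distribution • making predictions of $t$ for new values of $\mathbf{x}$ • predictive distribution: $p(t \mid \mathbf{x}, \mathbf{t}) = \mathcal{N}(t \mid \mathbf{m}_N^T \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x}))$, where $\mathbf{m}_N$ and $\mathbf{S}_N$ are the posterior mean and covariance of the weights • variance of the distribution: $\sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^T \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x})$ • first term represents the noise in the data • second term reflects the uncertainty associated with the parameters $\mathbf{w}$ • optimal prediction, for a new value of $\mathbf{x}$, would be the conditional mean of the target variable: $\mathbb{E}[t \mid \mathbf{x}] = \int t\, p(t \mid \mathbf{x})\, \mathrm{d}t$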
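The predictive mean and variance can be sketched as below, again assuming a straight-line basis $\boldsymbol{\phi}(x) = (1, x)^T$. The posterior mean `m_n`, posterior covariance `s_n` and noise precision `beta` are hypothetical values chosen for illustration, not results from the slides.

```python
def predictive(x, m_n, s_n, beta):
    """Mean and variance of p(t | x, data) for the basis phi(x) = (1, x)."""
    phi = [1.0, x]
    mean = m_n[0] * phi[0] + m_n[1] * phi[1]
    # Parameter uncertainty: phi^T S_N phi.
    s_phi = [s_n[0][0] * phi[0] + s_n[0][1] * phi[1],
             s_n[1][0] * phi[0] + s_n[1][1] * phi[1]]
    param_var = phi[0] * s_phi[0] + phi[1] * s_phi[1]
    # Total variance: data noise (1/beta) plus parameter uncertainty.
    return mean, 1.0 / beta + param_var

m_n = [1.0, 2.0]                      # hypothetical posterior mean
s_n = [[0.04, -0.01], [-0.01, 0.02]]  # hypothetical posterior covariance
beta = 25.0                           # hypothetical noise precision

mean, var = predictive(0.5, m_n, s_n, beta)
```

The variance never drops below the noise level `1 / beta`, and it grows for inputs where the weights are less well determined.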
Common Problem in Linear Regression: Overfitting/model complexity • Least Squares approach (maximizing the likelihood): • point estimate of the weights • Regularization: the regularization term and the value of its coefficient need to be chosen • Cross-Validation: requires large datasets and high computational power • Bayesian approach: • distribution over the weights • requires a good prior • model comparison: computationally demanding, validation data not required
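One way to see how the regularized and the Bayesian views connect, sketched under the assumption (not stated on the slide) of a zero-mean isotropic Gaussian prior $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$ and noise precision $\beta$: the log of the posterior is then

```latex
\ln p(\mathbf{w} \mid \mathbf{t})
  = -\frac{\beta}{2} \sum_{n=1}^{N} \left\{ t_n - \mathbf{w}^{T} \boldsymbol{\phi}(\mathbf{x}_n) \right\}^{2}
    - \frac{\alpha}{2}\, \mathbf{w}^{T} \mathbf{w} + \text{const}
```

Maximizing this expression with respect to $\mathbf{w}$ is equivalent to minimizing the sum-of-squares error with a quadratic regularization term whose coefficient is $\lambda = \alpha / \beta$, so the choice of prior plays the role of the choice of regularizer.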
From Regression to Classification • for regression problems: • target variable $\mathbf{t}$ was the vector of real numbers whose values we wish to predict • in case of classification: • target values represent class labels • two-class problem: a single binary target, e.g. $t \in \{0, 1\}$ • $K > 2$: 1-of-$K$ coding, e.g. class 2 is represented by a target vector with a 1 in the second position and 0 elsewhere
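A short sketch of how class labels can be turned into 1-of-K target vectors; the function name and the choice of K = 5 are illustrative assumptions.

```python
def one_of_k(label, num_classes):
    """Encode a class label in 1..num_classes as a 1-of-K target vector."""
    if not 1 <= label <= num_classes:
        raise ValueError("label must lie between 1 and num_classes")
    return [1 if k == label else 0 for k in range(1, num_classes + 1)]

t_class2 = one_of_k(2, num_classes=5)  # [0, 1, 0, 0, 0]
```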
Classification • goal: take an input vector $\mathbf{x}$ and assign it to one of $K$ discrete classes $C_k$ • [Figure: decision boundary]
Bayesian Logistic Regression • model the class-conditional densities $p(\mathbf{x} \mid C_k)$ and the prior probabilities $p(C_k)$ and apply Bayes' theorem: $p(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\, p(C_k)}{p(\mathbf{x})}$
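The step from Bayes' theorem to the logistic function can be sketched for two classes. The example assumes one-dimensional Gaussian class-conditional densities with made-up means, variances and priors; the posterior of class 1 then equals the logistic sigmoid of the log odds.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def class_posteriors(x, means, variances, priors):
    """p(C_k | x) from class-conditional densities and class priors via Bayes' theorem."""
    joint = [gaussian_pdf(x, m, v) * p for m, v, p in zip(means, variances, priors)]
    evidence = sum(joint)  # p(x)
    return [j / evidence for j in joint]

def sigmoid(a):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-a))

means, variances, priors = [0.0, 2.0], [1.0, 1.0], [0.7, 0.3]  # assumed values
x = 0.3
post = class_posteriors(x, means, variances, priors)

# Log odds a = ln [ p(x|C1) p(C1) / (p(x|C2) p(C2)) ]
log_odds = math.log(gaussian_pdf(x, means[0], variances[0]) * priors[0]
                    / (gaussian_pdf(x, means[1], variances[1]) * priors[1]))
```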
Bayesian Logistic Regression • exact Bayesian inference for logistic regression is intractable • Laplace approximation • aims to find a Gaussian approximation to a probability density defined over a set of continuous variables • posterior distribution is approximated around its mode $\mathbf{w}_{MAP}$
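A minimal sketch of the Laplace approximation for logistic regression with a single weight $w$ and a Gaussian prior $\mathcal{N}(w \mid 0, s_0^2)$; the one-dimensional setting, the data and the use of Newton's method to find the mode are assumptions made to keep the example short. The approximating Gaussian is centred on the posterior mode, and its variance is the inverse of the second derivative of the negative log posterior at that mode.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def neg_log_posterior_derivs(w, xs, ts, prior_var):
    """First and second derivative of the negative log posterior at w."""
    ys = [sigmoid(w * x) for x in xs]
    grad = w / prior_var + sum((y - t) * x for y, t, x in zip(ys, ts, xs))
    hess = 1.0 / prior_var + sum(y * (1.0 - y) * x * x for y, x in zip(ys, xs))
    return grad, hess

def laplace_approximation(xs, ts, prior_var=1.0, iterations=50):
    """Return (mode, variance) of the Gaussian approximation to p(w | data)."""
    w = 0.0
    for _ in range(iterations):  # Newton steps towards the posterior mode
        grad, hess = neg_log_posterior_derivs(w, xs, ts, prior_var)
        w -= grad / hess
    _, hess = neg_log_posterior_derivs(w, xs, ts, prior_var)
    return w, 1.0 / hess         # variance = inverse curvature at the mode

xs = [-2.0, -1.0, 1.0, 2.0]  # made-up inputs
ts = [0, 0, 1, 1]            # made-up binary targets
w_map, approx_var = laplace_approximation(xs, ts)
```

The resulting variance is smaller than the prior variance, because the data add curvature to the log posterior.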
Example • Barber: DemosExercises\demoBayesLogRegression.m
Naive Bayes classifier • Why naive? • strong independence assumptions • assumes that the presence/absence of a feature of a class is unrelated to the presence/absence of any other feature, given the class variable • ignores relations between features and assumes that all features contribute independently to a class [http://en.wikipedia.org/wiki/Naive_Bayes_classifier]
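The independence assumption can be sketched with a tiny Bernoulli naive Bayes classifier. Binary features, add-one smoothing, the toy data and the class names are all assumptions for illustration; under the naive assumption the class-conditional probability of a feature vector is simply the product of the per-feature probabilities.

```python
import math

def train_naive_bayes(samples, labels, smoothing=1.0):
    """Estimate class priors and per-feature probabilities p(x_j = 1 | class)."""
    model = {}
    n_features = len(samples[0])
    for c in sorted(set(labels)):
        rows = [s for s, label in zip(samples, labels) if label == c]
        prior = len(rows) / len(samples)
        probs = [(sum(r[j] for r in rows) + smoothing) / (len(rows) + 2.0 * smoothing)
                 for j in range(n_features)]
        model[c] = (prior, probs)
    return model

def predict(model, sample):
    """Pick the class with the largest log p(class) + sum_j log p(x_j | class)."""
    def log_joint(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if x else 1.0 - p) for x, p in zip(sample, probs))
    return max(model, key=log_joint)

# Toy data: two binary features per sample (hypothetical).
samples = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_naive_bayes(samples, labels)
pred_a = predict(model, [1, 0])
pred_b = predict(model, [0, 1])
```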
Thank you for your attention