
Bayes Rule



Presentation Transcript


  1. Bayes Rule Rev. Thomas Bayes (1702-1761) • How is this rule derived? • Using Bayes rule for probabilistic inference: • P(Cause | Evidence): diagnostic probability • P(Evidence | Cause): causal probability
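The rule itself (the slide's equation image is not part of this transcript) follows directly from the product rule of probability:

```latex
% Product rule, written both ways:
%   P(A, B) = P(A | B) P(B) = P(B | A) P(A)
% Dividing by P(B) gives Bayes rule; with A = Cause and B = Evidence:
\[
P(\text{Cause} \mid \text{Evidence})
  = \frac{P(\text{Evidence} \mid \text{Cause})\,P(\text{Cause})}{P(\text{Evidence})}
\]
```

Read left to right, it turns the causal probability P(Evidence | Cause) and the prior P(Cause) into the diagnostic probability P(Cause | Evidence).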

  2. Bayesian decision theory • Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e • Partially observable, stochastic, episodic environment • Examples: X = {spam, not spam}, e = email message; X = {zebra, giraffe, hippo}, e = image features • The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise • What is the agent's optimal estimate of the value of X? • Maximum a posteriori (MAP) decision: the value of X that minimizes the expected loss is the one that has the greatest posterior probability P(X = x | e)
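To see why the MAP decision is optimal under this 0-1 loss, write out the expected loss of guessing X = x (this step is implicit on the slide):

```latex
% Expected 0-1 loss of guessing X = x given evidence e:
\[
\mathbb{E}[L \mid \text{guess } x]
  = \sum_{x' \neq x} P(x' \mid e)
  = 1 - P(x \mid e)
\]
% Minimizing the expected loss is therefore the same as maximizing the posterior:
\[
x^{*} = \arg\min_{x}\,\bigl(1 - P(x \mid e)\bigr) = \arg\max_{x} P(x \mid e)
\]
```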

  3. MAP decision • X = x: value of query variable • E = e: evidence • MAP decision: x* = argmax_x P(x | e) = argmax_x P(e | x) P(x) / P(e) ∝ P(e | x) P(x), where P(x | e) is the posterior, P(e | x) the likelihood, and P(x) the prior • Maximum likelihood (ML) decision: x* = argmax_x P(e | x), i.e. the prior is ignored
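A minimal sketch of the difference between the two decisions, with made-up likelihoods and priors for the zebra/giraffe/hippo example (none of these numbers come from the slides):

```python
# Hypothetical numbers for e = "the image contains a large striped animal".
likelihood = {"zebra": 0.80, "giraffe": 0.10, "hippo": 0.06}   # P(e | x), assumed
prior      = {"zebra": 0.05, "giraffe": 0.15, "hippo": 0.80}   # P(x),     assumed

# ML decision: ignore the prior and maximize the likelihood P(e | x).
ml_choice = max(likelihood, key=likelihood.get)

# MAP decision: maximize P(e | x) P(x), which is proportional to the posterior P(x | e).
map_choice = max(prior, key=lambda x: likelihood[x] * prior[x])

print(ml_choice)   # zebra  (0.80 is the largest likelihood)
print(map_choice)  # hippo  (0.06 * 0.80 = 0.048 beats 0.80 * 0.05 = 0.040)
```

With these numbers the strong prior on hippo overturns the likelihood-only choice, which is exactly the distinction between the ML and MAP decisions.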

  4. Example: Spam Filter • We have X = {spam, ¬spam}, E = email message. • What should be our decision criterion? • Compute P(spam | message) and P(¬spam | message), and assign the message to the class that gives the higher posterior probability

  5. Example: Spam Filter • We have X = {spam, ¬spam}, E = email message. • What should be our decision criterion? • Compute P(spam | message) and P(¬spam | message), and assign the message to the class that gives the higher posterior probability: P(spam | message) ∝ P(message | spam) P(spam) and P(¬spam | message) ∝ P(message | ¬spam) P(¬spam)
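As a minimal sketch of this criterion, assuming the two class-conditional likelihoods have already been computed somehow (the next slides show how), the comparison itself is just:

```python
# Hypothetical, already-computed quantities with hypothetical values.
p_msg_given_spam, p_spam = 1e-40, 0.6      # P(message | spam),  P(spam)
p_msg_given_ham,  p_ham  = 1e-42, 0.4      # P(message | ¬spam), P(¬spam)

# Compare P(message | class) P(class); the shared denominator P(message)
# cancels, so it never needs to be evaluated.
decision = "spam" if p_msg_given_spam * p_spam > p_msg_given_ham * p_ham else "¬spam"
print(decision)  # spam
```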

  6. Example: Spam Filter • We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam) • How do we represent the message? • Bag of words model: • The order of the words is not important • Each word is conditionally independent of the others given the message class • If the message consists of words (w1, …, wn), how do we compute P(w1, …, wn | spam)? • Naïve Bayes assumption: each word is conditionally independent of the others given the message class
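The factorization implied by the naïve Bayes assumption (the slide's equation image is not in the transcript) is:

```latex
\[
P(w_1, \dots, w_n \mid \text{spam}) = \prod_{i=1}^{n} P(w_i \mid \text{spam}),
\qquad
P(w_1, \dots, w_n \mid \neg\text{spam}) = \prod_{i=1}^{n} P(w_i \mid \neg\text{spam})
\]
```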

  7. Example: Spam Filter • Our filter will classify the message as spam if P(spam) ∏i P(wi | spam) > P(¬spam) ∏i P(wi | ¬spam) • In practice, likelihoods are pretty small numbers, so we need to take logs to avoid underflow: classify as spam if log P(spam) + Σi log P(wi | spam) > log P(¬spam) + Σi log P(wi | ¬spam) • Model parameters: • Priors P(spam), P(¬spam) • Likelihoods P(wi | spam), P(wi | ¬spam) • These parameters need to be learned from a training set (a representative sample of email messages marked with their classes)
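A minimal sketch of the log-space decision rule, assuming the priors and per-word likelihoods have already been estimated (the dictionaries below are purely illustrative; unseen words are simply skipped here and are properly handled by the smoothing on the later slides):

```python
import math

def classify(words, prior, likelihood):
    """Return the class maximizing log P(class) + sum_i log P(w_i | class).

    prior:      dict mapping class -> P(class)
    likelihood: dict mapping class -> {word: P(word | class)}
    Summing logs avoids the underflow caused by multiplying many tiny likelihoods.
    """
    scores = {
        c: math.log(prior[c]) + sum(math.log(likelihood[c][w])
                                    for w in words if w in likelihood[c])
        for c in prior
    }
    return max(scores, key=scores.get)

# Toy parameters, just to show the call:
prior = {"spam": 0.6, "¬spam": 0.4}
likelihood = {
    "spam":  {"free": 0.05,  "meeting": 0.001},
    "¬spam": {"free": 0.005, "meeting": 0.02},
}
print(classify(["free", "free", "meeting"], prior, likelihood))  # spam
```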

  8. Parameter estimation • Model parameters: • Priors P(spam), P(¬spam) • Likelihoods P(wi | spam), P(wi | ¬spam) • Estimation by empirical word frequencies in the training set: P(wi | spam) = (# of occurrences of wi in spam messages) / (total # of words in spam messages) • This happens to be the parameter estimate that maximizes the likelihood of the training data (d: index of a training document, i: index of a word)
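A sketch of this maximum-likelihood estimate, assuming the training set is a list of (words, label) pairs; all names here are illustrative:

```python
from collections import Counter

def estimate_parameters(training_set):
    """Estimate P(class) and P(word | class) by empirical frequencies.

    training_set: iterable of (words, label) pairs, e.g. (["buy", "now"], "spam").
    P(word | class) = (# occurrences of word in that class's messages)
                      / (total # of words in that class's messages)
    """
    message_counts = Counter()          # number of training messages per class
    word_counts = {}                    # per class: Counter of word occurrences
    for words, label in training_set:
        message_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)

    n_messages = sum(message_counts.values())
    prior = {c: message_counts[c] / n_messages for c in message_counts}

    likelihood = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())    # total # of words seen in class c
        likelihood[c] = {w: n / total for w, n in counts.items()}
    return prior, likelihood
```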

  9. Parameter estimation • Model parameters: • Priors P(spam), P(¬spam) • Likelihoods P(wi | spam), P(wi | ¬spam) • Estimation by empirical word frequencies in the training set: P(wi | spam) = (# of occurrences of wi in spam messages) / (total # of words in spam messages) • Parameter smoothing: dealing with words that were never seen or were seen too few times • Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did
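With a vocabulary of size V, the Laplacian-smoothed estimate becomes (a standard formulation; the exact expression is not preserved in this transcript):

```latex
\[
P(w_i \mid \text{spam}) =
  \frac{(\text{\# of occurrences of } w_i \text{ in spam messages}) + 1}
       {(\text{total \# of words in spam messages}) + V}
\]
```

Every vocabulary word then has a strictly positive likelihood, so a single unseen word can no longer drive the whole product (or log-sum) for a class to zero.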

  10. Bayesian decision making: Summary • Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E • Inference problem: given some evidence E = e, what is P(X | e)? • Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1,e1), …, (xn,en)}

  11. Bag-of-word models for images Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

  12. Bag-of-word models for images • Extract image features

  13. Bag-of-word models for images • Extract image features

  14. Bag-of-word models for images • Extract image features • Learn “visual vocabulary”

  15. Bag-of-word models for images • Extract image features • Learn “visual vocabulary” • Map image features to visual words
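A minimal sketch of this pipeline under the assumption that local descriptors (say, 128-dimensional SIFT-like vectors) have already been extracted from the training images; scikit-learn's KMeans stands in here for whatever clustering the cited papers actually use:

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_vocabulary(descriptors, n_words=500):
    """Cluster descriptors pooled over many images; the cluster centers are the visual words."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(descriptors)

def bag_of_words(image_descriptors, vocabulary):
    """Map one image's descriptors to their nearest visual words; return a normalized histogram."""
    words = vocabulary.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / hist.sum()

# Hypothetical usage with random arrays standing in for real image features:
all_descriptors = np.random.rand(10000, 128)            # descriptors from the training set
vocabulary = learn_vocabulary(all_descriptors, n_words=50)
print(bag_of_words(np.random.rand(300, 128), vocabulary).shape)  # (50,)
```

The resulting histogram plays the same role for images that the word-count representation plays for email messages in the spam-filter example.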
