
Bayesian Decision Theory



  1. Bayesian Decision Theory Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow

  2. Statistical Pattern Recognition • The design of a recognition system requires careful attention to the following issues: • definition of pattern classes • sensing environment • pattern representation • feature extraction and selection • cluster analysis • classifier design and learning • selection of training and test samples • performance evaluation

  3. Statistical Pattern Recognition… • In statistical pattern recognition, a pattern is represented by a set of d features, or attributes, viewed as a d-dimensional feature vector. • Well-known concepts from statistical decision theory are utilized to establish decision boundaries between pattern classes. • The recognition system operates in two modes: training (learning) and classification (testing).

  4. Model for statistical pattern recognition

  5. The role of the preprocessing module is to segment the pattern of interest from the background, remove noise, normalize the pattern, and perform any other operation that contributes to a compact representation of the pattern. • In the training mode, the feature extraction/selection module finds the appropriate features for representing the input patterns, and the classifier is trained to partition the feature space. The feedback path allows a designer to optimize the preprocessing and feature extraction/selection strategies. • In the classification mode, the trained classifier assigns the input pattern to one of the pattern classes under consideration, based on the measured features.

  6. Decision theory • Decision theory is the study of making decisions that have a significant impact • Decision-making is distinguished into: • Decision-making under certainty • Decision-making under non-certainty, which is further divided into: • Decision-making under risk • Decision-making under uncertainty

  7. Probability theory • Most decisions have to be taken in the presence of uncertainty • Probability theory quantifies uncertainty regarding the occurrence of events or states of the world • Basic elements of probability theory: • Random variables describe aspects of the world whose state is initially unknown • Each random variable has a domain of values that it can take on (discrete, boolean, continuous) • An atomic event is a complete specification of the state of the world, i.e. an assignment of values to variables of which the world is composed

  8. Probability Theory… • Probability space • The sample space S = {e1, e2, …, en}, which is a set of atomic events • Probability measure P, which assigns a real number between 0 and 1 to the members of the sample space • Axioms • All probabilities are between 0 and 1 • The probabilities of the atomic events of a probability space must sum to 1 • The certain event S (the sample space itself) has probability 1, and the impossible event, which never occurs, has probability 0

  9. Prior • A priori probabilities, or priors, reflect our prior knowledge of how likely an event is to occur. • In the absence of any other information, a random variable is assigned a degree of belief called an unconditional or prior probability

  10. Class Conditional probability • When we have information concerning previously unknown random variables, we use posterior or conditional probabilities: P(a|b), the probability of event a given that we know b • Alternatively this can be written as the product rule: P(a ∧ b) = P(a|b)P(b)

  11. Bayes’ rule • The product rule can be written in two ways: • P(a ∧ b) = P(a|b)P(b) • P(a ∧ b) = P(b|a)P(a) • By equating the right-hand sides and dividing by P(b): • P(a|b) = P(b|a)P(a) / P(b) • This is known as Bayes’ rule
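A minimal numeric check of Bayes’ rule, as a sketch in Python (the probability values below are illustrative assumptions, not values from the slides):

# Bayes' rule: P(a|b) = P(b|a) P(a) / P(b), with P(b) obtained
# by total probability. All numbers below are made up for illustration.
p_a = 0.01              # prior P(a)
p_b_given_a = 0.9       # likelihood P(b|a)
p_b_given_not_a = 0.05  # P(b|~a)

# Total probability: P(b) = P(b|a) P(a) + P(b|~a) P(~a)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(a|b) = {p_a_given_b:.4f}")  # P(a|b) = 0.1538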

  12. Bayesian Decision Theory • Bayesian Decision Theory is a fundamental statistical approach that quantifies the tradeoffs between various decisions using the probabilities and costs that accompany such decisions. • Example: a patient has trouble breathing. Decision: asthma versus lung cancer. • Decide lung cancer when the person has asthma. Cost: moderately high (e.g., order unnecessary tests, scare the patient). • Decide asthma when the person has lung cancer. Cost: very high (e.g., lose the opportunity to treat the cancer at an early stage; death).

  13. Decision Rules • Progression of decision rules: • – (1) Decide based on prior probabilities • – (2) Decide based on posterior probabilities • – (3) Decide based on risk

  14. Fish Sorting Example Revisited

  15. Decision based on prior probabilities

  16. Question • Consider a two-class problem, {c1, c2}, where the prior probabilities of the two classes are given by • P(c1) = 0.7 and P(c2) = 0.3 • Design a classification rule for a pattern based only on prior probabilities • Calculate the error probability P(error)

  17. Solution • Since P(c1) > P(c2), the rule based only on priors assigns every pattern to c1. • This rule errs exactly when the true class is c2, so P(error) = P(c2) = 0.3.
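A minimal sketch of this prior-only rule in Python, using the priors from the question above:

# Decide by priors alone: always pick the class with the larger prior.
priors = {"c1": 0.7, "c2": 0.3}

decision = max(priors, key=priors.get)  # -> "c1" for every pattern
p_error = min(priors.values())          # with two classes, P(error) is the smaller prior
print(decision, p_error)                # c1 0.3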

  18. Decision based on class conditional probabilities

  19. Posterior Probabilities

  20. Bayes Formula • Suppose the priors P(ωj) and conditional densities p(x|ωj) are known. The posterior is then • P(ωj|x) = p(x|ωj) P(ωj) / p(x), i.e., posterior = (likelihood × prior) / evidence • where the evidence is p(x) = Σj p(x|ωj) P(ωj)
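A small sketch of Bayes’ formula for two classes; the Gaussian class-conditional densities and their parameters are assumptions chosen for illustration, not taken from the slides:

import numpy as np
from scipy.stats import norm

priors = np.array([0.5, 0.5])  # P(w1), P(w2) -- assumed equiprobable

def likelihoods(x):
    # Assumed class-conditional densities: p(x|w1) = N(0,1), p(x|w2) = N(2,1)
    return np.array([norm.pdf(x, 0, 1), norm.pdf(x, 2, 1)])

def posteriors(x):
    joint = likelihoods(x) * priors  # p(x|wj) P(wj)
    evidence = joint.sum()           # p(x) = sum_j p(x|wj) P(wj)
    return joint / evidence          # P(wj|x)

print(posteriors(1.0))  # [0.5 0.5] -- at the midpoint the posteriors are equal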

  21. Making a Decision
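A sketch of the decision step, under the same kind of assumed Gaussian densities as above: the Bayes rule picks the class with the largest posterior, and since the evidence p(x) is common to both classes it cancels in the comparison.

import numpy as np
from scipy.stats import norm

priors = np.array([0.5, 0.5])  # assumed P(w1), P(w2)

def decide(x):
    # Compare p(x|wj) P(wj); the evidence cancels in the argmax.
    joint = np.array([norm.pdf(x, 0, 1), norm.pdf(x, 2, 1)]) * priors
    return int(np.argmax(joint)) + 1  # 1-based class label

print(decide(0.5), decide(1.5))  # 1 2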

  22. Probability of Error • For two classes, the probability of error at a given x is P(error|x) = min[P(ω1|x), P(ω2|x)] • The average probability of error is P(error) = ∫ P(error|x) p(x) dx • The Bayes decision rule minimizes this error because, for every x, it decides the class with the larger posterior, which makes P(error|x) as small as possible

  23. Example of the two regions R1 and R2 formed by the Bayesian classifier for the case of two equiprobable classes. • The dotted line at x0 is a threshold partitioning the feature space into two regions, R1 and R2. According to the Bayes decision rule, for all values of x in R1 the classifier decides ω1, and for all values in R2 it decides ω2. However, it is obvious from the figure that decision errors are unavoidable.

  24. The total probability Pe of committing a decision error is • Pe = ∫R2 p(x|ω1)P(ω1) dx + ∫R1 p(x|ω2)P(ω2) dx • which is equal to the total shaded area under the curves in the figure
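This shaded area can be sketched numerically. The two equiprobable Gaussian classes below are assumptions matching the style of the figure, not values from the slides; for these densities the Bayes threshold x0 sits midway between the means.

from scipy.stats import norm
from scipy.integrate import quad

P1 = P2 = 0.5                     # assumed equiprobable classes
p1 = lambda x: norm.pdf(x, 0, 1)  # assumed p(x|w1)
p2 = lambda x: norm.pdf(x, 2, 1)  # assumed p(x|w2)
x0 = 1.0                          # Bayes threshold for these densities

# Pe = integral over R1 of P2 p2(x) dx + integral over R2 of P1 p1(x) dx
pe = quad(lambda x: P2 * p2(x), -10, x0)[0] + quad(lambda x: P1 * p1(x), x0, 12)[0]
print(f"Pe = {pe:.4f}")           # Pe = 0.1587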

  25. Minimizing the Classification Error Probability • Show that the Bayesian classifier is optimal with respect to minimizing the classification error probability. • Sketch of the argument: for every x, the Bayesian classifier selects the class with the largest posterior and therefore minimizes P(error|x) pointwise; since P(error) = ∫ P(error|x) p(x) dx, no other rule can achieve a smaller total error.

  26. Generalized Bayesian Decision Theory

  27. Bayesian Decision Theory…

  28. Bayesian Decision Theory…

  29. Conditional Risk • The conditional risk of taking action αi for a given x is the expected loss R(αi|x) = Σj λ(αi|ωj) P(ωj|x)

  30. Minimum-Risk Classification • For every x the decision function α(x) assumes one of the a values α1, ..., αa. • The overall risk R is the expected loss associated with a given decision rule: R = ∫ R(α(x)|x) p(x) dx. • The Bayes decision rule minimizes R by selecting, for each x, the action αi with the smallest conditional risk R(αi|x).

  31. Two-category classification • α1: deciding ω1; α2: deciding ω2 • λij = λ(αi|ωj): the loss incurred for deciding ωi when the true state of nature is ωj • Conditional risk: • R(α1|x) = λ11 P(ω1|x) + λ12 P(ω2|x) • R(α2|x) = λ21 P(ω1|x) + λ22 P(ω2|x)
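A minimal sketch of these two conditional risks in Python; the loss values and the posteriors below are illustrative assumptions:

import numpy as np

# Loss matrix lambda_ij (assumed values): rows = actions a1, a2,
# columns = true states w1, w2. Errors on w2 are costlier here.
lam = np.array([[0.0, 2.0],
                [1.0, 0.0]])

def decide(post):
    # post = [P(w1|x), P(w2|x)]; R(ai|x) = sum_j lambda_ij P(wj|x)
    risks = lam @ post
    return int(np.argmin(risks)) + 1, risks

action, risks = decide(np.array([0.6, 0.4]))
print(action, risks)  # 2 [0.8 0.6]

Note that the rule picks α2 here even though P(ω1|x) = 0.6 is the larger posterior: the assumed loss λ12 = 2 makes errors on ω2 expensive enough to override the posterior comparison.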

  32. Our rule is the following: • If R(α1|x) < R(α2|x), action α1 (“decide ω1”) is taken • By employing Bayes’ formula, this results in the equivalent rule: decide ω1 if • (λ21 − λ11) p(x|ω1) P(ω1) > (λ12 − λ22) p(x|ω2) P(ω2) • and decide ω2 otherwise

  33. Likelihood ratio • Decide ω1 (take action α1) if • p(x|ω1) / p(x|ω2) > (λ12 − λ22) P(ω2) / [(λ21 − λ11) P(ω1)] • Otherwise take action α2 (decide ω2)
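A sketch of the likelihood-ratio test; the densities, losses, and priors below are all assumed for illustration:

from scipy.stats import norm

# Assumed losses and priors
lam11, lam12, lam21, lam22 = 0.0, 2.0, 1.0, 0.0
P1, P2 = 0.5, 0.5

# Threshold on the likelihood ratio
theta = (lam12 - lam22) * P2 / ((lam21 - lam11) * P1)  # = 2.0

def classify(x):
    ratio = norm.pdf(x, 0, 1) / norm.pdf(x, 2, 1)  # assumed p(x|w1)/p(x|w2)
    return "w1" if ratio > theta else "w2"

print(classify(0.5), classify(1.5))  # w1 w2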

  34. Example • Suppose selection of ω1 and ω2 is equiprobable: P(ω1) = P(ω2) = 1/2 • Assume that the loss matrix is of the form • λ11 = λ22 = 0 (no loss for correct decisions), with off-diagonal losses λ12 and λ21 • If misclassification of patterns that come from ω2 is considered to have serious consequences, then we must choose λ12 > λ21

  35. Thus, patterns are assigned to the ω2 class if • p(x|ω2) > p(x|ω1) · (λ21 / λ12) • That is, p(x|ω1) is multiplied by a factor less than 1
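A numeric sketch of this example; the Gaussian densities are assumed for illustration, and λ12 = 2, λ21 = 1 are hypothetical loss values satisfying λ12 > λ21:

from scipy.stats import norm

lam12, lam21 = 2.0, 1.0
factor = lam21 / lam12            # 0.5 < 1

p1 = lambda x: norm.pdf(x, 0, 1)  # assumed p(x|w1)
p2 = lambda x: norm.pdf(x, 2, 1)  # assumed p(x|w2)

for x in (0.5, 0.7, 1.0):
    label = "w2" if p2(x) > p1(x) * factor else "w1"
    print(x, label)               # 0.5 w1, 0.7 w2, 1.0 w2

With equal losses the boundary for these densities would sit at x = 1 (the midpoint of the means); scaling p(x|ω1) by λ21/λ12 = 0.5 moves the boundary toward the ω1 mean (to about x ≈ 0.65), enlarging the region assigned to ω2.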

  36. Example
