
# Bayesian Decision Theory (Classification)


##### Presentation Transcript

1. Contents • Introduction • Generalized Bayesian Decision Rule • Discriminant Functions • The Normal Distribution • Discriminant Functions for the Normal Populations • Minimax Criterion • Neyman-Pearson Criterion

2. Bayesian Decision Theory (Classification) Introduction

3. What is Bayesian Decision Theory? • A mathematical foundation for decision making. • It uses a probabilistic approach to decision making (e.g., classification) so as to minimize the risk (cost).

4. Preliminaries and Notations • ωi: a state of nature • P(ωi): prior probability • x: feature vector • p(x|ωi): class-conditional density • P(ωi|x): posterior probability

5. Bayes Rule P(ωi|x) = p(x|ωi)P(ωi) / p(x), where the evidence p(x) = Σj p(x|ωj)P(ωj) normalizes the posteriors so they sum to one.
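Bayes rule on the slide can be sketched directly in code. This is a minimal illustration (function and variable names are my own, not from the slides): given the likelihoods p(x|ωi) and priors P(ωi) for a fixed x, it returns the posteriors.

```python
def posteriors(likelihoods, priors):
    """Bayes rule: P(w_i|x) = p(x|w_i) P(w_i) / p(x),
    where the evidence p(x) = sum_j p(x|w_j) P(w_j)."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]
```

Because of the shared normalizer, the returned posteriors always sum to 1, which is why p(x) can be ignored when only comparing classes.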

6. Decision The evidence p(x) is the same for every class, so it is unimportant in making a decision: only the products p(x|ωi)P(ωi) need to be compared.

7. Decision Decide ωi if P(ωi|x) > P(ωj|x) ∀ j ≠ i. Equivalently, decide ωi if p(x|ωi)P(ωi) > p(x|ωj)P(ωj) ∀ j ≠ i. • Special cases: • P(ω1) = P(ω2) = … = P(ωc): decide by the likelihoods p(x|ωi) alone. • p(x|ω1) = p(x|ω2) = … = p(x|ωc): decide by the priors P(ωi) alone.

8. Two Categories Decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2. Equivalently, decide ω1 if p(x|ω1)P(ω1) > p(x|ω2)P(ω2); otherwise decide ω2. • Special cases: • 1. P(ω1) = P(ω2): decide ω1 if p(x|ω1) > p(x|ω2); otherwise decide ω2. • 2. p(x|ω1) = p(x|ω2): decide ω1 if P(ω1) > P(ω2); otherwise decide ω2.
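The decision rule above compares p(x|ωi)P(ωi) across classes. A minimal sketch (names are illustrative) that returns the index of the winning class:

```python
def decide(likelihoods, priors):
    """Bayes decision: pick the class i maximizing p(x|w_i) P(w_i).
    Equivalent to maximizing the posterior, since p(x) is common to all classes."""
    scores = [l * p for l, p in zip(likelihoods, priors)]
    return max(range(len(scores)), key=lambda i: scores[i])
```

Both special cases fall out of the same function: with equal priors the likelihoods decide; with equal likelihoods the priors decide.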

9. Example [Figure: class-conditional densities and decision regions R1, R2 for equal priors P(ω1) = P(ω2)]

10. Example [Figure: decision regions R1, R2 before and after changing the priors] P(ω1) = 2/3, P(ω2) = 1/3. Decide ω1 if p(x|ω1)P(ω1) > p(x|ω2)P(ω2); otherwise decide ω2.

11. Classification Error Consider two categories: decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2. Given x, the probability of error is P(error|x) = min[P(ω1|x), P(ω2|x)].

12. Classification Error The overall error rate averages over x: P(error) = ∫ P(error|x) p(x) dx = ∫ min[p(x|ω1)P(ω1), p(x|ω2)P(ω2)] dx.

13. Bayesian Decision Theory (Classification) Generalized Bayesian Decision Rule

14. The Generalization • Ω = {ω1, …, ωc}: a set of c states of nature. • A = {α1, …, αa}: a set of a possible actions. • λ(αi|ωj): the loss incurred for taking action αi when the true state of nature is ωj. The loss of a correct action can be zero. We want to minimize the expected loss in making a decision.

15. Conditional Risk Given x, the expected loss (risk) associated with taking action αi is R(αi|x) = Σj λ(αi|ωj) P(ωj|x).
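The conditional risk is a posterior-weighted sum of losses. A minimal sketch (names are illustrative), with the loss given as a matrix loss[i][j] = λ(αi|ωj):

```python
def conditional_risk(loss, post, i):
    """R(a_i|x) = sum_j loss[i][j] * P(w_j|x),
    the expected loss of taking action a_i given x."""
    return sum(loss[i][j] * p for j, p in enumerate(post))
```

With the 0/1 loss of the next slide (loss = [[0, 1], [1, 0]]), this reduces to 1 − P(ωi|x), so minimizing risk is the same as maximizing the posterior.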

16. 0/1 Loss Function λ(αi|ωj) = 0 if i = j, and 1 otherwise. Under this loss, R(αi|x) = Σj≠i P(ωj|x) = 1 − P(ωi|x), so minimizing the risk is equivalent to maximizing the posterior (minimum-error-rate classification).

17. Decision Bayesian Decision Rule: take the action that minimizes the conditional risk, α* = argmin_i R(αi|x).

18. Overall Risk Decision function α(x): a rule assigning an action to each x. The overall risk is R = ∫ R(α(x)|x) p(x) dx. • Bayesian decision rule: • the optimal one to minimize the overall risk, since it minimizes R(α|x) at every x • Its resulting overall risk is called the Bayes risk

19. Two-Category Classification • States of nature: ω1, ω2 • Actions: α1 (decide ω1), α2 (decide ω2) • Loss function: λij = λ(αi|ωj)

20. Two-Category Classification Perform α1 if R(α2|x) > R(α1|x); otherwise perform α2.

21. Two-Category Classification Perform α1 if R(α2|x) > R(α1|x), i.e., if (λ21 − λ11)P(ω1|x) > (λ12 − λ22)P(ω2|x); both factors (λ21 − λ11) and (λ12 − λ22) are positive for any reasonable loss. The posterior probabilities are scaled before comparison.

22. Two-Category Classification Perform α1 if (λ21 − λ11)p(x|ω1)P(ω1) > (λ12 − λ22)p(x|ω2)P(ω2); the evidence p(x) is irrelevant to the comparison and cancels out.

23. Two-Category Classification (This slide will be recalled later.) Likelihood ratio and threshold: perform α1 if p(x|ω1)/p(x|ω2) > [(λ12 − λ22)/(λ21 − λ11)] · P(ω2)/P(ω1).
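The likelihood-ratio test above can be sketched in a few lines (names are illustrative; the loss entries lij correspond to λij on the slide):

```python
def likelihood_ratio_decide(px_w1, px_w2, p1, p2, l11, l12, l21, l22):
    """Perform a1 iff p(x|w1)/p(x|w2) > [(l12 - l22)/(l21 - l11)] * P(w2)/P(w1).
    Returns 1 for action a1, 2 for action a2."""
    threshold = (l12 - l22) / (l21 - l11) * (p2 / p1)
    return 1 if px_w1 / px_w2 > threshold else 2
```

With 0/1 loss and equal priors the threshold is 1, and the rule reduces to comparing the likelihoods directly.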

24. Bayesian Decision Theory (Classification) Discriminant Functions

25. The Multicategory Classification A classifier computes c discriminant functions g1(x), g2(x), …, gc(x) and takes the action (e.g., classification): assign x to ωi if gi(x) > gj(x) for all j ≠ i. How should the discriminant functions be defined?

26. Simple Discriminant Functions If f(·) is a monotonically increasing function, then the f(gi(·))'s are also discriminant functions, since they preserve the ordering. • Minimum-risk case: gi(x) = −R(αi|x) • Minimum-error-rate case: gi(x) = P(ωi|x)
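A quick check of the monotone-transform claim (illustrative code, not from the slides): classifying with the gi's and with ln(gi)'s yields the same decision, since ln is monotonically increasing.

```python
import math

def classify(x, discriminants):
    """Assign x to the class whose discriminant g_i(x) is largest."""
    scores = [g(x) for g in discriminants]
    return max(range(len(scores)), key=lambda i: scores[i])
```

This is why, for Gaussian densities, one may work with ln p(x|ωi) + ln P(ωi) instead of p(x|ωi)P(ωi): the logarithm simplifies the algebra without changing the decision.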

27. Decision Regions (two-category example) The discriminant functions partition the feature space into decision regions, separated by decision boundaries.

28. Bayesian Decision Theory (Classification) The Normal Distribution

29. Basics of Probability • Discrete random variable X (assume integer-valued): • Probability mass function (pmf): P(x) = Pr[X = x] • Cumulative distribution function (cdf): F(x) = Pr[X ≤ x] = Σ_{t ≤ x} P(t) • Continuous random variable X: • Probability density function (pdf): p(x), which is not itself a probability • Cumulative distribution function (cdf): F(x) = ∫_{−∞}^{x} p(t) dt

30. Expectations Let g be a function of random variable X: E[g(X)] = Σx g(x)P(x) (discrete) or ∫ g(x)p(x) dx (continuous). • The kth moment: E[X^k] • The 1st moment: the mean μ = E[X] • The kth central moment: E[(X − μ)^k]

31. Important Expectations • Mean: μ = E[X] • Variance: Var[X] = E[(X − μ)²] • Fact: Var[X] = E[X²] − μ²

32. Entropy H(X) = −∫ p(x) ln p(x) dx (or −Σx P(x) ln P(x) in the discrete case). The entropy measures the fundamental uncertainty in the value of points selected randomly from a distribution.
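The discrete entropy formula can be sketched directly (names are illustrative); among distributions on a fixed support, the uniform one maximizes it.

```python
import math

def entropy(probs):
    """H(p) = -sum_i p_i ln p_i, in nats; terms with p_i = 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

For a fair coin this gives ln 2 ≈ 0.693 nats; any biased coin gives strictly less.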

33. Univariate Gaussian Distribution X ~ N(μ, σ²): p(x) = (1/(√(2π)σ)) exp(−(x − μ)²/(2σ²)), with E[X] = μ and Var[X] = σ². [Figure: bell curve of p(x) over x, centered at μ, with ±σ, ±2σ, ±3σ marked] • Properties: • Maximizes the entropy among distributions with given mean and variance • Central limit theorem

34. Random Vectors A d-dimensional random vector X. • Mean vector: μ = E[X] • Covariance matrix: Σ = E[(X − μ)(X − μ)ᵀ]

35. Multivariate Gaussian Distribution X ~ N(μ, Σ), a d-dimensional random vector: p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp(−½ (x − μ)ᵀΣ⁻¹(x − μ)), with E[X] = μ and E[(X − μ)(X − μ)ᵀ] = Σ.

36. Properties of N(μ, Σ) X ~ N(μ, Σ), a d-dimensional random vector. Let Y = AᵀX, where A is a d × k matrix; then Y ~ N(Aᵀμ, AᵀΣA): any linear transform of a Gaussian is Gaussian.

37. Properties of N(μ, Σ) (cont.) In particular, for Y = AᵀX with A a d × k matrix, Y ~ N(Aᵀμ, AᵀΣA), so projections of a Gaussian onto lower-dimensional subspaces are again Gaussian.

38. On Parameters of N(μ, Σ) X ~ N(μ, Σ): • μi = E[Xi] • σij = Cov(Xi, Xj) = E[(Xi − μi)(Xj − μj)], with σii = Var(Xi)

39. More On Covariance Matrix Σ is symmetric and positive semidefinite, so it admits the eigendecomposition Σ = ΦΛΦᵀ, where • Φ: orthonormal matrix whose columns are eigenvectors of Σ • Λ: diagonal matrix of the eigenvalues

40. Whitening Transform X ~ N(μ, Σ), Y = AᵀX, Y ~ N(Aᵀμ, AᵀΣA). Let A_w = ΦΛ^(−1/2); then A_wᵀΣA_w = Λ^(−1/2)ΦᵀΣΦΛ^(−1/2) = Λ^(−1/2)ΛΛ^(−1/2) = I, so Y ~ N(A_wᵀμ, I).

41. Whitening Transform The whitening linear transform Y = A_wᵀX, with A_w = ΦΛ^(−1/2), can be read as two steps: a projection onto the eigenvectors of Σ (the rotation Φᵀ) followed by a per-axis scaling by Λ^(−1/2), yielding Y ~ N(A_wᵀμ, I).
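A minimal sketch of whitening, assuming Σ is already diagonal (so Φ = I and only the scaling step Λ^(−1/2) remains; names are illustrative, and the general case would first rotate by Φᵀ):

```python
import math

def whiten_diagonal(x, mu, variances):
    """Whitening for a diagonal Sigma: Phi = I, so A_w = Lambda^(-1/2)
    and each centered component is divided by its standard deviation."""
    return [(xi - mi) / math.sqrt(v) for xi, mi, v in zip(x, mu, variances)]
```

After this transform the components of the centered vector have unit variance, matching Y ~ N(A_wᵀμ, I).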

42. Mahalanobis Distance X ~ N(μ, Σ): r² = (x − μ)ᵀΣ⁻¹(x − μ). The density p(x) is constant on surfaces of constant r²; the constant depends on the value of r².

43. Mahalanobis Distance (cont.) The surfaces of constant r² are hyperellipsoids centered at μ, with principal axes along the eigenvectors of Σ.
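A sketch of the Mahalanobis distance, again assuming a diagonal Σ so that Σ⁻¹ is just the reciprocal variances (names are illustrative):

```python
import math

def mahalanobis_diagonal(x, mu, variances):
    """r^2 = (x - mu)^T Sigma^{-1} (x - mu); for a diagonal Sigma this is
    sum_i (x_i - mu_i)^2 / sigma_i^2. Returns r."""
    r2 = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, variances))
    return math.sqrt(r2)
```

With Σ = I it reduces to the ordinary Euclidean distance, which is the situation in Case 1 of the next part.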

44. Bayesian Decision Theory (Classification) Discriminant Functions for the Normal Populations

45. Minimum-Error-Rate Classification Xi ~ N(μi, Σi): gi(x) = ln p(x|ωi) + ln P(ωi) = −½(x − μi)ᵀΣi⁻¹(x − μi) − (d/2) ln 2π − ½ ln|Σi| + ln P(ωi).

46. Minimum-Error-Rate Classification Three Cases: • Case 1: Σi = σ²I. Classes are centered at different means, and their feature components are pairwise independent with the same variance. • Case 2: Σi = Σ. Classes are centered at different means but share the same covariance matrix. • Case 3: Σi arbitrary.

47. Case 1. Σi = σ²I The terms −(d/2) ln 2π and −½ ln|Σi| are the same for every class and hence irrelevant; the discriminant reduces to gi(x) = −‖x − μi‖²/(2σ²) + ln P(ωi).

48. Case 1. Σi = σ²I Expanding ‖x − μi‖² and dropping the class-independent xᵀx term gives a linear discriminant: gi(x) = wiᵀx + wi0, with wi = μi/σ² and wi0 = −μiᵀμi/(2σ²) + ln P(ωi).

49. Case 1. Σi = σ²I Boundary between ωi and ωj: setting gi(x) = gj(x) gives the hyperplane wᵀ(x − x0) = 0, where w = μi − μj and x0 = ½(μi + μj) − [σ²/‖μi − μj‖²] ln[P(ωi)/P(ωj)] (μi − μj). The boundary is orthogonal to the line joining the means; with equal priors it passes through their midpoint, and otherwise it shifts away from the more probable class.
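In one dimension the boundary formula above simplifies, since ‖μi − μj‖² = (μi − μj)². A sketch of the boundary point (names are illustrative):

```python
import math

def boundary_point_1d(mu_i, mu_j, sigma, p_i, p_j):
    """Case Sigma_i = sigma^2 I, one dimension:
    x0 = (mu_i + mu_j)/2 - sigma^2/(mu_i - mu_j) * ln(P(w_i)/P(w_j)).
    Equal priors put the boundary at the midpoint of the means;
    a larger prior for w_i pushes the boundary toward mu_j."""
    return 0.5 * (mu_i + mu_j) - sigma ** 2 / (mu_i - mu_j) * math.log(p_i / p_j)
```

This makes the prior's effect concrete: raising P(ωi) enlarges the region decided as ωi.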