
Bayesian Decision Theory (Classification)

Bayesian Decision Theory (Classification). Speaker: 虞台文. Contents: Introduction; Generalized Bayesian Decision Rule; Discriminant Functions; The Normal Distribution; Discriminant Functions for the Normal Populations; Minimax Criterion; Neyman-Pearson Criterion.



Presentation Transcript


  1. Bayesian Decision Theory (Classification) Speaker: 虞台文

  2. Contents • Introduction • Generalized Bayesian Decision Rule • Discriminant Functions • The Normal Distribution • Discriminant Functions for the Normal Populations • Minimax Criterion • Neyman-Pearson Criterion

  3. Bayesian Decision Theory (Classification) Introduction

  4. What is Bayesian Decision Theory? • A mathematical foundation for decision making. • It uses a probabilistic approach to help make decisions (e.g., classification) so as to minimize the risk (cost).

  5. Preliminaries and Notations • ωi: a state of nature • P(ωi): prior probability • x: feature vector • p(x|ωi): class-conditional density • P(ωi|x): posterior probability

  6. Bayesian Rule P(ωi|x) = p(x|ωi)P(ωi) / p(x), where the evidence is p(x) = Σj p(x|ωj)P(ωj).

  7. Decision The evidence p(x) is a common scale factor, unimportant in making a decision: comparing the posteriors P(ωi|x) is equivalent to comparing the products p(x|ωi)P(ωi).

  8. Decision Decide ωi if P(ωi|x) > P(ωj|x) ∀ j ≠ i Equivalently, decide ωi if p(x|ωi)P(ωi) > p(x|ωj)P(ωj) ∀ j ≠ i • Special cases: • P(ω1) = P(ω2) = ⋯ = P(ωc): compare the class-conditional densities alone • p(x|ω1) = p(x|ω2) = ⋯ = p(x|ωc): compare the priors alone

  9. Two Categories Decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2 Equivalently, decide ω1 if p(x|ω1)P(ω1) > p(x|ω2)P(ω2); otherwise decide ω2 • Special cases: • 1. P(ω1) = P(ω2): decide ω1 if p(x|ω1) > p(x|ω2); otherwise decide ω2 • 2. p(x|ω1) = p(x|ω2): decide ω1 if P(ω1) > P(ω2); otherwise decide ω2
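
A minimal sketch of the two-category rule, assuming hypothetical 1-D Gaussian class-conditional densities (the means, variances, and priors below are illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical class-conditional densities p(x|w1), p(x|w2) and priors.
p1 = norm(loc=-1.0, scale=1.0).pdf
p2 = norm(loc=+1.0, scale=1.0).pdf
P1, P2 = 2 / 3, 1 / 3

def decide(x):
    """Decide w1 if p(x|w1)P(w1) > p(x|w2)P(w2); otherwise decide w2."""
    return 1 if p1(x) * P1 > p2(x) * P2 else 2

# Near the midpoint the likelihoods are equal, so the larger prior wins.
print(decide(0.0))  # -> 1
```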

  10. Example [Figure: class-conditional densities and the resulting decision regions R1 and R2 for equal priors P(ω1) = P(ω2)]

  11. Example [Figure: the decision regions R1 and R2 shift when the priors change to P(ω1) = 2/3, P(ω2) = 1/3] Decide ω1 if p(x|ω1)P(ω1) > p(x|ω2)P(ω2); otherwise decide ω2

  12. Classification Error Consider two categories and the rule: decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2. Given x, the probability of error is P(error|x) = P(ω1|x) if we decide ω2, and P(ω2|x) if we decide ω1; under the rule above, P(error|x) = min[P(ω1|x), P(ω2|x)].

  13. Classification Error Averaging over x, P(error) = ∫ P(error|x) p(x) dx = ∫_{R2} p(x|ω1)P(ω1) dx + ∫_{R1} p(x|ω2)P(ω2) dx, where Ri is the region in which we decide ωi. The Bayesian decision rule minimizes this error.
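
To make the integral concrete, here is a sketch that numerically evaluates P(error) = ∫ min[p(x|ω1)P(ω1), p(x|ω2)P(ω2)] dx for a hypothetical pair of 1-D Gaussians with equal priors (all parameters illustrative):

```python
import numpy as np
from scipy.stats import norm

p1, p2 = norm(-1.0, 1.0).pdf, norm(1.0, 1.0).pdf  # hypothetical p(x|w1), p(x|w2)
P1, P2 = 0.5, 0.5                                 # equal priors

xs = np.linspace(-10.0, 10.0, 200001)
dx = xs[1] - xs[0]

# The Bayes rule decides the class with the larger p(x|wi)P(wi), so the
# error mass at each x is the smaller of the two scaled densities.
p_error = np.minimum(p1(xs) * P1, p2(xs) * P2).sum() * dx
print(p_error)  # ~0.1587, matching the closed form Phi(-1) for these parameters
```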

  14. Bayesian Decision Theory (Classification) Generalized Bayesian Decision Rule

  15. The Generalization • Ω = {ω1, …, ωc}: a set of c states of nature • A = {α1, …, αa}: a set of a possible actions • λ(αi|ωj): the loss incurred for taking action αi when the true state of nature is ωj. The loss can be zero. We want to minimize the expected loss in making decisions.

  16. Conditional Risk Given x, the expected loss (risk) associated with taking action αi is R(αi|x) = Σj λ(αi|ωj) P(ωj|x).
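
A sketch of the conditional-risk computation R(αi|x) = Σj λ(αi|ωj)P(ωj|x), using a hypothetical loss matrix and a hypothetical posterior (both illustrative):

```python
import numpy as np

# Hypothetical loss matrix: L[i, j] = loss for action a_i when the state is w_j.
L = np.array([[0.0, 2.0],
              [1.0, 0.0]])

posterior = np.array([0.7, 0.3])  # P(w1|x), P(w2|x) at some fixed x

risk = L @ posterior              # R(a_i|x) = sum_j L[i, j] * P(w_j|x)
print(risk)                               # [0.6, 0.7]
print("take action", risk.argmin() + 1)   # the Bayes rule picks the minimum risk
```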

  17. 0/1 Loss Function λ(αi|ωj) = 0 if i = j, and 1 if i ≠ j. Then R(αi|x) = Σ_{j≠i} P(ωj|x) = 1 − P(ωi|x), so minimizing the risk is the same as maximizing the posterior (minimum error rate).

  18. Decision Bayesian Decision Rule: α*(x) = arg min_i R(αi|x)

  19. Overall Risk Given a decision function α(x), the overall risk is R = ∫ R(α(x)|x) p(x) dx. • Bayesian decision rule: • the optimal rule, minimizing the overall risk • its resulting overall risk is called the Bayesian risk

  20. Two-Category Classification State of nature: ωj (column); action: αi (row); loss function written as λij = λ(αi|ωj).

  21. Two-Category Classification R(α1|x) = λ11 P(ω1|x) + λ12 P(ω2|x) and R(α2|x) = λ21 P(ω1|x) + λ22 P(ω2|x). Perform α1 if R(α2|x) > R(α1|x); otherwise perform α2.

  22. Two-Category Classification Perform α1 if R(α2|x) > R(α1|x), i.e., if (λ21 − λ11) P(ω1|x) > (λ12 − λ22) P(ω2|x); otherwise perform α2. Both factors (λ21 − λ11) and (λ12 − λ22) are positive (a wrong decision costs more than a right one), so the posterior probabilities are scaled before comparison.

  23. Two-Category Classification Using Bayes' rule and dropping the irrelevant common factor p(x): perform α1 if (λ21 − λ11) p(x|ω1)P(ω1) > (λ12 − λ22) p(x|ω2)P(ω2); otherwise perform α2.

  24. Two-Category Classification (this slide will be recalled later) Likelihood ratio vs. threshold: perform α1 if p(x|ω1) / p(x|ω2) > (λ12 − λ22)P(ω2) / [(λ21 − λ11)P(ω1)] = θ.
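
A sketch of the likelihood-ratio form, with the threshold θ assembled from hypothetical losses and priors (all numbers illustrative):

```python
from scipy.stats import norm

p1, p2 = norm(-1.0, 1.0).pdf, norm(1.0, 1.0).pdf  # hypothetical densities
P1, P2 = 0.5, 0.5                                 # priors
l11, l12, l21, l22 = 0.0, 2.0, 1.0, 0.0           # losses l(ai|wj)

theta = ((l12 - l22) * P2) / ((l21 - l11) * P1)   # fixed decision threshold

def decide(x):
    """Perform a1 iff the likelihood ratio p(x|w1)/p(x|w2) exceeds theta."""
    return 1 if p1(x) / p2(x) > theta else 2

print(theta, decide(-2.0), decide(0.0))  # 2.0 1 2
```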

  25. Bayesian Decision Theory (Classification) Discriminant Functions

  26. The Multicategory Classification [Figure: a network computing g1(x), g2(x), …, gc(x) from the input x and then taking an action (e.g., classification)] The gi(x)'s are called the discriminant functions. Assign x to ωi if gi(x) > gj(x) for all j ≠ i. How should the discriminant functions be defined?

  27. Simple Discriminant Functions If f(·) is a monotonically increasing function, then the f(gi(·))'s are also discriminant functions. Minimum-risk case: gi(x) = −R(αi|x). Minimum-error-rate case: gi(x) = P(ωi|x).
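
A sketch of a multicategory classifier built from discriminant functions; for the minimum-error-rate case gi(x) = P(ωi|x), computed here up to the common factor p(x) (the three classes below are illustrative):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical three-class 1-D problem.
densities = [norm(-2.0, 1.0).pdf, norm(0.0, 1.0).pdf, norm(2.0, 1.0).pdf]
priors = np.array([0.5, 0.3, 0.2])

def classify(x):
    """Assign x to the w_i maximizing g_i(x) = p(x|w_i) P(w_i)."""
    g = np.array([p(x) for p in densities]) * priors
    return int(np.argmax(g)) + 1

print([classify(x) for x in (-2.5, 0.1, 3.0)])  # [1, 2, 3]
```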

  28. Decision Regions Two-category example Decision regions are separated by decision boundaries.

  29. Bayesian Decision Theory(Classification) The Normal Distribution

  30. Basics of Probability Discrete random variable X (assumed integer-valued): Probability mass function (pmf): P(x) = P[X = x]. Cumulative distribution function (cdf): F(x) = P[X ≤ x] = Σ_{k ≤ x} P(k). Continuous random variable X: Probability density function (pdf): p(x) (a density, not a probability). Cumulative distribution function (cdf): F(x) = P[X ≤ x] = ∫_{−∞}^{x} p(t) dt.

  31. Expectations Let g be a function of random variable X: E[g(X)] = Σx g(x)P(x) (discrete) or ∫ g(x)p(x) dx (continuous). The kth moment: E[X^k]. The 1st moment: E[X]. The kth central moment: E[(X − E[X])^k].

  32. Important Expectations Mean: μ = E[X]. Variance: σ² = Var[X] = E[(X − μ)²]. Fact: Var[X] = E[X²] − (E[X])².

  33. Entropy H(X) = −Σx P(x) ln P(x) (discrete) or −∫ p(x) ln p(x) dx (continuous). The entropy measures the fundamental uncertainty in the value of points selected randomly from a distribution.

  34. Univariate Gaussian Distribution X ~ N(μ, σ²): p(x) = (1/(√(2π)σ)) exp(−(x − μ)²/(2σ²)), with E[X] = μ and Var[X] = σ². [Figure: bell curve centered at μ with the ±σ, ±2σ, ±3σ intervals marked] • Properties: • Maximizes the entropy among distributions with given mean and variance • Arises from the central limit theorem

  35. Random Vectors A d-dimensional random vector X = (X1, …, Xd)^T. Mean vector: μ = E[X]. Covariance matrix: Σ = E[(X − μ)(X − μ)^T].

  36. Multivariate Gaussian Distribution X ~ N(μ, Σ), a d-dimensional random vector: p(x) = (2π)^{−d/2} |Σ|^{−1/2} exp(−½ (x − μ)^T Σ^{−1} (x − μ)), with E[X] = μ and E[(X − μ)(X − μ)^T] = Σ.
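
A sketch that evaluates the density above directly and checks it against scipy (μ and Σ are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def gaussian_pdf(x, mu, Sigma):
    """p(x) = (2 pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu))."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5 * quad) / ((2 * np.pi) ** (d / 2)
                                  * np.sqrt(np.linalg.det(Sigma)))

x = np.array([1.0, 0.0])
print(gaussian_pdf(x, mu, Sigma), multivariate_normal(mu, Sigma).pdf(x))  # equal
```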

  37. Properties of N(μ, Σ) X ~ N(μ, Σ), a d-dimensional random vector. Let Y = A^T X, where A is a d × k matrix. Then Y ~ N(A^T μ, A^T Σ A): any linear transform of a Gaussian vector is Gaussian.


  39. On Parameters of N(μ, Σ) For X ~ N(μ, Σ), componentwise: μi = E[Xi], σij = Cov(Xi, Xj) = E[(Xi − μi)(Xj − μj)], and σii = Var(Xi).

  40. More on the Covariance Matrix Σ is symmetric and positive semidefinite, so it admits the eigendecomposition Σ = ΦΛΦ^T, where Φ is an orthonormal matrix whose columns are the eigenvectors of Σ, and Λ is the diagonal matrix of eigenvalues.
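
The decomposition Σ = ΦΛΦ^T can be checked numerically; a sketch with an illustrative covariance matrix:

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                # illustrative covariance

eigvals, Phi = np.linalg.eigh(Sigma)          # eigh is made for symmetric matrices
Lam = np.diag(eigvals)

print(np.allclose(Phi @ Lam @ Phi.T, Sigma))  # True: Sigma = Phi Lam Phi^T
print(np.allclose(Phi.T @ Phi, np.eye(2)))    # True: Phi is orthonormal
print(np.all(eigvals >= 0))                   # True: positive semidefinite
```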

  41. Whitening Transform X ~ N(μ, Σ), Y = A^T X ~ N(A^T μ, A^T Σ A). Let A_w = ΦΛ^{−1/2}. Then A_w^T Σ A_w = Λ^{−1/2} Φ^T Σ Φ Λ^{−1/2} = I, so Y ~ N(A_w^T μ, I).

  42. Whitening Transform [Figure: a general linear transform Y = A^T X, the projection onto the eigenvector axes (Φ^T), and the whitening A_w = ΦΛ^{−1/2} that turns the elliptical density into a circular one]
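
A sketch of the whitening transform A_w = ΦΛ^{−1/2} applied to samples; after the transform, the sample covariance should be close to the identity (μ and Σ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

X = rng.multivariate_normal(mu, Sigma, size=100_000)  # rows are samples

eigvals, Phi = np.linalg.eigh(Sigma)
Aw = Phi @ np.diag(eigvals ** -0.5)   # whitening matrix A_w = Phi Lam^{-1/2}

Y = X @ Aw                            # y = A_w^T x, applied row-wise
print(np.cov(Y.T))                    # approximately the 2x2 identity
```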

  43. Mahalanobis Distance For X ~ N(μ, Σ), r² = (x − μ)^T Σ^{−1} (x − μ) is the squared Mahalanobis distance from x to μ. The density is constant on the hyperellipsoids r² = constant, whose size depends on the value of r².
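
A sketch computing the squared Mahalanobis distance r² = (x − μ)^T Σ^{−1} (x − μ) (parameters illustrative):

```python
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def mahalanobis_sq(x, mu, Sigma):
    """r^2 = (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

# Points with the same r^2 lie on the same constant-density hyperellipsoid.
print(mahalanobis_sq(np.array([1.0, 1.0]), mu, Sigma))
```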


  45. Bayesian Decision Theory (Classification) Discriminant Functions for the Normal Populations

  46. Minimum-Error-Rate Classification Xi ~ N(μi, Σi). Take gi(x) = ln p(x|ωi) + ln P(ωi) = −½ (x − μi)^T Σi^{−1} (x − μi) − (d/2) ln 2π − ½ ln |Σi| + ln P(ωi).
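
A sketch of this discriminant with the class-independent −(d/2) ln 2π term dropped; the two classes below are illustrative:

```python
import numpy as np

def g(x, mu, Sigma, prior):
    """g_i(x) = -0.5 (x-mu)^T Sigma^{-1} (x-mu) - 0.5 ln|Sigma| + ln P(w_i)."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.solve(Sigma, diff)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

mu1, S1, P1 = np.array([0.0, 0.0]), np.eye(2), 0.6
mu2, S2, P2 = np.array([2.0, 2.0]), 2.0 * np.eye(2), 0.4

x = np.array([1.0, 1.0])
print(1 if g(x, mu1, S1, P1) > g(x, mu2, S2, P2) else 2)  # -> 1
```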

  47. Minimum-Error-Rate Classification Three Cases: Case 1: Σi = σ²I — the classes have different means, and their feature components are pairwise independent with the same variance. Case 2: Σi = Σ — the classes have different means but the same covariance. Case 3: Σi arbitrary.

  48. Case 1. Σi = σ²I Here |Σi| = σ^{2d} and Σi^{−1} = (1/σ²)I. The terms −(d/2) ln 2π and −½ ln |Σi| are the same for every class and hence irrelevant, leaving gi(x) = −‖x − μi‖² / (2σ²) + ln P(ωi).

  49. Case 1. Σi = σ²I Expanding ‖x − μi‖² = x^T x − 2μi^T x + μi^T μi and dropping the class-independent x^T x term gives a linear discriminant: gi(x) = wi^T x + wi0, where wi = μi/σ² and wi0 = −μi^T μi/(2σ²) + ln P(ωi).

  50. Case 1. Σi = σ²I Boundary between ωi and ωj: gi(x) = gj(x) gives the hyperplane w^T (x − x0) = 0, with w = μi − μj and x0 = ½(μi + μj) − [σ²/‖μi − μj‖²] ln[P(ωi)/P(ωj)] (μi − μj).
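
A sketch computing this boundary for two illustrative classes; the point x0 shifts away from the mean with the larger prior:

```python
import numpy as np

sigma2 = 1.0                                            # shared variance
mu_i, mu_j = np.array([0.0, 0.0]), np.array([2.0, 0.0])
P_i, P_j = 0.7, 0.3

w = mu_i - mu_j
x0 = 0.5 * (mu_i + mu_j) - sigma2 / np.dot(w, w) * np.log(P_i / P_j) * w

def decide(x):
    """Decide w_i when w^T (x - x0) > 0, i.e., x lies on class i's side."""
    return "i" if np.dot(w, x - x0) > 0 else "j"

print(x0)                            # [1.4236..., 0.0]: pushed toward mu_j
print(decide(np.array([1.2, 0.0])))  # "i": the larger prior claims this point
```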
