
2. Mathematical Foundations






  1. 2. Mathematical Foundations Foundations of Statistical Natural Language Processing 2001. 7. 10. Artificial Intelligence Laboratory, Sung Kyung-hee

  2. Contents – Part 1 1. Elementary Probability Theory • Conditional probability • Bayes’ theorem • Random variable • Joint and conditional distributions • Standard distributions

  3. Conditional probability (1/2) • P(A) : the probability of the event A • Ex1> A coin is tossed 3 times. Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} A = {HHT, HTH, THH} : exactly two heads, P(A) = 3/8 B = {HHH, HHT, HTH, HTT} : first toss is a head, P(B) = 1/2 • Conditional probability: P(A|B) = P(A∩B) / P(B); here A∩B = {HHT, HTH}, so P(A|B) = (2/8) / (1/2) = 1/2
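
Ex1 can be verified by enumerating the sample space directly. The following sketch (the variable names are my own, not from the slides) computes P(A|B) by brute force:

```python
from itertools import product

# Enumerate the sample space of three coin tosses (Ex1).
omega = list(product("HT", repeat=3))        # 8 equally likely outcomes
A = [w for w in omega if w.count("H") == 2]  # exactly two heads
B = [w for w in omega if w[0] == "H"]        # first toss is a head
A_and_B = [w for w in A if w in B]           # {HHT, HTH}

def p(event):
    """Probability of an event under the uniform distribution on omega."""
    return len(event) / len(omega)

print(p(A))               # 0.375 (= 3/8)
print(p(B))               # 0.5
print(p(A_and_B) / p(B))  # 0.5 = P(A|B)
```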

  4. Conditional probability (2/2) • Multiplication rule: P(A∩B) = P(A|B) P(B) = P(B|A) P(A) • Chain rule: P(A1∩…∩An) = P(A1) P(A2|A1) P(A3|A1∩A2) … P(An|A1∩…∩An−1) • Two events A, B are independent if P(A∩B) = P(A) P(B)

  5. Bayes’ theorem (1/2) • Bayes’ theorem: P(B|A) = P(A|B) P(B) / P(A) • Generally, if A ⊆ B1∪…∪Bn and the Bi are disjoint, then P(A) = Σi P(A|Bi) P(Bi), and Bayes’ theorem becomes P(Bj|A) = P(A|Bj) P(Bj) / Σi P(A|Bi) P(Bi)

  6. Bayes’ theorem (2/2) • Ex2> G : the event of a sentence having a parasitic gap T : the event of the test being positive • By Bayes’ theorem, P(G|T) = P(T|G) P(G) / [P(T|G) P(G) + P(T|¬G) P(¬G)] • This poor result comes about because the prior probability of a sentence containing a parasitic gap is so low.
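
A sketch of the Ex2 computation. The specific figures (P(G) = 1e-5, P(T|G) = 0.95, P(T|¬G) = 0.005) are the ones used in the textbook's version of this example and are reproduced here as assumptions:

```python
# Bayes' theorem for the parasitic-gap test (Ex2).
# Assumed figures from the textbook example:
p_g = 1e-5              # prior P(G): parasitic gaps are very rare
p_t_given_g = 0.95      # test sensitivity P(T|G)
p_t_given_not_g = 0.005  # false-positive rate P(T|not G)

# Total probability of a positive test.
p_t = p_t_given_g * p_g + p_t_given_not_g * (1 - p_g)
# Posterior P(G|T) via Bayes' theorem.
p_g_given_t = p_t_given_g * p_g / p_t

print(round(p_g_given_t, 4))  # 0.0019: a positive test is almost always a false alarm
```

The tiny prior swamps the test's accuracy, which is exactly the "poor result" the slide refers to.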

  7. Random variable • A random variable X maps the sample space to numbers; its probability mass function (pmf) is p(x) = P(X = x), written X ~ p(x) • If X : Ω → {0,1}, then X is called an indicator random variable or a Bernoulli trial • Ex3> Random variable X for the sum of two dice, S = {2, …, 12} • Expectation: E[X] = Σx x p(x) = 7 • Variance: Var(X) = E[(X − E[X])²] = E[X²] − (E[X])² = 35/6 ≈ 5.83
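
A quick enumeration (a sketch, not from the slides) confirms the expectation and variance of Ex3:

```python
from itertools import product

# pmf of X = sum of two fair dice (Ex3), built by enumerating all 36 rolls.
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
pmf = {x: outcomes.count(x) / 36 for x in set(outcomes)}

E = sum(x * p for x, p in pmf.items())               # expectation E[X]
Var = sum((x - E) ** 2 * p for x, p in pmf.items())  # variance E[(X - E[X])^2]

print(round(E, 6))    # 7.0
print(round(Var, 6))  # 5.833333 (= 35/6)
```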

  8. Joint and conditional distributions • The joint pmf for two discrete random variables X, Y: p(x, y) = P(X = x, Y = y) • Marginal pmfs, which total up the probability mass for the values of each variable separately: pX(x) = Σy p(x, y), pY(y) = Σx p(x, y) • Conditional pmf: p(y|x) = p(x, y) / pX(x) for x such that pX(x) > 0

  9. Standard distributions (1/3) • Discrete distributions: The binomial distribution • Arises when one has a series of trials with only two outcomes, each trial being independent of all the others • The number r of successes out of n trials, given that the probability of success in any trial is p: b(r; n, p) = C(n, r) p^r (1 − p)^(n−r), where C(n, r) = n! / ((n − r)! r!) • Expectation: np, variance: np(1 − p)
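
This sketch checks the stated mean and variance by summing the pmf directly, for example values n = 10, p = 0.3 (chosen arbitrarily, not from the slides):

```python
from math import comb

def binom_pmf(r, n, p):
    """Binomial pmf b(r; n, p) = C(n, r) p^r (1-p)^(n-r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 10, 0.3
mean = sum(r * binom_pmf(r, n, p) for r in range(n + 1))
var = sum((r - mean) ** 2 * binom_pmf(r, n, p) for r in range(n + 1))

print(round(mean, 6))  # 3.0  (= np)
print(round(var, 6))   # 2.1  (= np(1-p))
```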

  10. Standard distributions (2/3) • Discrete distributions: The binomial distribution (figure: plots of the binomial pmf for example values of n and p)

  11. Standard distributions (3/3) • Continuous distributions: The normal distribution • For the mean μ and the standard deviation σ, the probability density function (pdf) is n(x; μ, σ) = (1 / (σ √(2π))) exp(−(x − μ)² / (2σ²))
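
A minimal sketch of the pdf just defined; evaluating it at the mean recovers the peak density 1/(σ√(2π)):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal pdf n(x; mu, sigma) as defined on the slide."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Standard normal (mu = 0, sigma = 1) at its mean.
print(round(normal_pdf(0.0, 0.0, 1.0), 6))  # 0.398942
```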

  12. Contents – Part 2 2. Essential Information Theory • Entropy • Joint entropy and conditional entropy • Mutual information • The noisy channel model • Relative entropy or Kullback-Leibler divergence

  13. Shannon’s Information Theory • Concerned with maximizing the amount of information that one can transmit over an imperfect communication channel, such as a noisy phone line • Theoretical maximum for data compression: the entropy H • Theoretical maximum for the transmission rate: the channel capacity

  14. Entropy (1/4) • The entropy H (or self-information) is the average uncertainty of a single random variable X: H(X) = −Σx p(x) log2 p(x), where p(x) is the pmf of X • Entropy is a measure of uncertainty: the more we know about something, the lower the entropy will be • We can use entropy as a measure of the quality of our models • Entropy measures the amount of information in a random variable (measured in bits)
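
The definition can be sketched in a few lines; the weighted-coin values below illustrate the curve described on the next slide:

```python
from math import log2

def entropy(pmf):
    """H(X) = -sum_x p(x) log2 p(x), in bits; zero-probability terms are skipped."""
    return -sum(p * log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))           # 1.0: a fair coin is maximally uncertain
print(round(entropy([0.9, 0.1]), 3))  # 0.469: a heavily weighted coin tells us little
```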

  15. Entropy (2/4) • The entropy of a weighted coin. The horizontal axis shows the probability of a weighted coin coming up heads; the vertical axis shows the entropy of tossing the corresponding coin once.

  16. Entropy (3/4) • Ex7> The result of rolling an 8-sided die (uniform distribution): H(X) = −Σx (1/8) log2(1/8) = log2 8 = 3 bits • Entropy: the average length of the message needed to transmit an outcome of that variable • In terms of the expectation E: H(X) = E[log2 (1/p(X))]
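
For Ex7, both forms of the definition give the same answer, which a short check confirms:

```python
from math import log2

# Uniform 8-sided die (Ex7): H = log2(8) = 3 bits.
pmf = [1 / 8] * 8
H = -sum(p * log2(p) for p in pmf)              # -sum p(x) log2 p(x)
E_surprisal = sum(p * log2(1 / p) for p in pmf)  # E[log2(1/p(X))]

print(H)            # 3.0
print(E_surprisal)  # 3.0
```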

  17. Entropy (4/4) • Ex8> Simplified Polynesian: six letters with P(t) = P(a) = 1/4 and P(p) = P(k) = P(i) = P(u) = 1/8 • H(P) = −Σi p(i) log2 p(i) = 2.5 bits • We can design a code that on average takes 2.5 bits to transmit a letter • Entropy can be interpreted as a measure of the size of the ‘search space’ consisting of the possible values of a random variable.
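
A check of the Ex8 figure; the per-letter distribution used here is the one from the textbook example (assumed: t and a at 1/4, the other four letters at 1/8):

```python
from math import log2

# Simplified Polynesian per-letter distribution (assumed from the textbook example).
pmf = {"p": 1 / 8, "t": 1 / 4, "k": 1 / 8, "a": 1 / 4, "i": 1 / 8, "u": 1 / 8}
H = -sum(p * log2(p) for p in pmf.values())

print(H)  # 2.5 bits per letter
```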

  18. Joint entropy and conditional entropy (1/3) • The joint entropy of a pair of discrete random variables X, Y ~ p(x, y): H(X, Y) = −Σx Σy p(x, y) log2 p(x, y) • The conditional entropy: H(Y|X) = Σx p(x) H(Y|X = x) = −Σx Σy p(x, y) log2 p(y|x) • The chain rule for entropy: H(X, Y) = H(X) + H(Y|X)

  19. Joint entropy and conditional entropy (2/3) • Ex9> Simplified Polynesian revisited • All words consist of sequences of CV (consonant–vowel) syllables • Joint pmf P(C = c, V = v) on a per-syllable basis:

             p      t      k
      a     1/16   3/8    1/16
      i     1/16   3/16    0
      u      0     3/16   1/16

  Marginal probabilities (per-syllable basis): P(p) = 1/8, P(t) = 3/4, P(k) = 1/8 and P(a) = 1/2, P(i) = 1/4, P(u) = 1/4 • Per-letter basis probabilities are half the per-syllable values, since each letter makes up half a syllable

  20. Joint entropy and conditional entropy (3/3) • H(C) = −[2 × (1/8) log2(1/8) + (3/4) log2(3/4)] ≈ 1.061 bits • H(V|C) = Σc p(c) H(V|C = c) = (1/8)(1) + (3/4)(1.5) + (1/8)(1) = 1.375 bits • By the chain rule, H(C, V) = H(C) + H(V|C) ≈ 2.44 bits per syllable
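
The Ex9 computation can be sketched end to end; the joint table below is the one assumed from the textbook example:

```python
from math import log2

# Joint per-syllable pmf P(C=c, V=v) for Simplified Polynesian revisited
# (assumed from the textbook table); keys are (consonant, vowel).
joint = {
    ("p", "a"): 1 / 16, ("t", "a"): 3 / 8,  ("k", "a"): 1 / 16,
    ("p", "i"): 1 / 16, ("t", "i"): 3 / 16, ("k", "i"): 0,
    ("p", "u"): 0,      ("t", "u"): 3 / 16, ("k", "u"): 1 / 16,
}

def H(probs):
    """Entropy in bits, skipping zero-probability entries."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Marginal over consonants.
p_c = {}
for (c, v), p in joint.items():
    p_c[c] = p_c.get(c, 0) + p

H_CV = H(joint.values())      # joint entropy H(C, V)
H_C = H(p_c.values())         # H(C)
H_V_given_C = H_CV - H_C      # conditional entropy via the chain rule

print(round(H_C, 3))          # 1.061
print(round(H_V_given_C, 3))  # 1.375
print(round(H_CV, 3))         # 2.436
```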

  21. Mutual information (1/2) • By the chain rule for entropy, H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y), so H(X) − H(X|Y) = H(Y) − H(Y|X) = I(X; Y) : mutual information • Mutual information between X and Y: I(X; Y) = Σx Σy p(x, y) log2 [ p(x, y) / (p(x) p(y)) ] • The amount of information one random variable contains about another (symmetric, non-negative) • It is 0 only when the two variables are independent • It grows not only with the degree of dependence, but also according to the entropy of the variables • It is actually better to think of it as a measure of independence.

  22. Mutual information (2/2) • Since I(X; X) = H(X) − H(X|X) = H(X), entropy is also called self-information • Conditional MI: I(X; Y | Z) = H(X|Z) − H(X|Y, Z) • Chain rule: I(X1…Xn; Y) = Σi I(Xi; Y | X1, …, Xi−1) • Pointwise MI between two particular points x, y: I(x, y) = log2 [ p(x, y) / (p(x) p(y)) ]
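
The definition can be sketched on a small made-up joint distribution (the numbers are illustrative, not from the slides):

```python
from math import log2

# A toy joint pmf where X and Y agree 80% of the time.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.5, 1: 0.5}  # marginal of X
p_y = {0: 0.5, 1: 0.5}  # marginal of Y

# I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) )
I = sum(p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in joint.items())

print(round(I, 4))  # 0.2781 bits: X and Y are dependent but far from identical
```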

  23. Noisy channel model • Channel capacity: the rate at which one can transmit information through the channel with an optimal input distribution: C = max over p(X) of I(X; Y) • Binary symmetric channel with crossover probability p: C = 1 − H(p) • Since entropy is non-negative, C ≤ 1 bit per channel use
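
The binary symmetric channel capacity formula can be sketched directly; the extremes make the bound C ≤ 1 concrete:

```python
from math import log2

def binary_entropy(p):
    """H(p) for a Bernoulli(p) variable, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Capacity C = 1 - H(p) of a binary symmetric channel with crossover p."""
    return 1 - binary_entropy(p)

print(bsc_capacity(0.0))            # 1.0: a noiseless channel carries 1 bit per use
print(bsc_capacity(0.5))            # 0.0: pure noise carries nothing
print(round(bsc_capacity(0.1), 3))  # 0.531
```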

  24. Relative entropy or Kullback–Leibler divergence • Relative entropy for two pmfs p(x), q(x): D(p || q) = Σx p(x) log2 [ p(x) / q(x) ] • A measure of how close two pmfs are • Non-negative, and D(p || q) = 0 iff p = q • Conditional relative entropy: D(p(y|x) || q(y|x)) = Σx p(x) Σy p(y|x) log2 [ p(y|x) / q(y|x) ] • Chain rule: D(p(x, y) || q(x, y)) = D(p(x) || q(x)) + D(p(y|x) || q(y|x))
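
A minimal sketch of the definition on two example coin distributions (my own choices), which also shows that D is not symmetric:

```python
from math import log2

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log2( p(x) / q(x) ); zero p(x) terms contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]  # fair coin
q = [0.9, 0.1]  # heavily weighted coin

print(kl_divergence(p, p))            # 0.0: identical pmfs
print(round(kl_divergence(p, q), 3))  # 0.737
print(round(kl_divergence(q, p), 3))  # 0.531: D is not symmetric
```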
