
Probabilistic Text Generation



Presentation Transcript


  1. Probabilistic Text Generation A Nifty Assignment from Joe Zachary School of Computing University of Utah

  2. Probabilistic Text Generation • Based on an idea by Claude Shannon (1948), popularized by A.K. Dewdney (1989) • Generates probabilistic text based on the patterns in a source file • Both fun and appropriate for CS 2 students • Guess which passage is actually from the text; the other two are randomly generated

  3. King James Bible • For every man putteth one and my mother: for them, to David sent Samson, and a sacrifice. • Then went Samson down, and his father and his mother, to Timnath, and came to the vineyards of Timnath: and, behold, a young lion roared against him. • Now the ark; and treasures upon them: and, Who also in the evil: so do your heart?

  4. King James Bible nGram length == 6 • For every man putteth one and my mother: for them, to David sent Samson, and a sacrifice. • Then went Samson down, and his father and his mother, to Timnath, and came to the vineyards of Timnath: and, behold, a young lion roared against him. • Now the ark; and treasures upon them: and, Who also in the evil: so do your heart?

  5. Tom Sawyer • Huck started to act very intelligently on the back of his pocket behind, as usual on Sundays. • He was always dressed fitten for drinking some old empty hogsheads. • The men contemplated the treasure awhile in blissful silence.

  6. Tom Sawyer nGram length == 6 • Huck started to act very intelligently on the back of his pocket behind, as usual on Sundays. • He was always dressed fitten for drinking some old empty hogsheads. • The men contemplated the treasure awhile in blissful silence.

  7. Hamlet • Ay me, what act, That roars so loud and thunders in the index? • Worse that a rat? Dead for a ducat, drugs fit that I bid you not? • Leave heart; for to our lord, it we show him, but skin and he, my lord, I have fat all not over thought, good my lord?

  8. Hamlet nGram length == 5 • Ay me, what act, That roars so loud and thunders in the index? • Worse that a rat? Dead for a ducat, drugs fit that I bid you not? • Leave heart; for to our lord, it we show him, but skin and he, my lord, I have fat all not over thought, good my lord?

  9. Niftiness • Not a toy: it slurps up entire books • Defies expectations: it turns out to be both straightforward and educational • Entertaining: I (Joe Zachary) run a contest to find the funniest generated text

  10. nGram length == 0 The probability that c is the next character to be produced equals the probability that c occurs in the source file. rla bsht eS ststofo hhfosdsdewno oe wee h .mr ae irii ela iad o r te u t mnyto onmalysnce, ifu en c fDwn oee iteo
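The length-0 model above can be sketched in a few lines of Python (an illustrative sketch; the function name and signature are my own, not from the assignment):

```python
import random

def order0(text, n):
    """Order-0 model: each output character is drawn with the same
    probability that it occurs anywhere in the source text."""
    # random.choice over the raw text weights each character by its
    # frequency, so no explicit frequency table is needed
    return "".join(random.choice(text) for _ in range(n))
```

Sampling uniformly from the characters of the text itself is exactly frequency-weighted sampling, which is why the output preserves letter frequencies but nothing else.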

  11. nGram length == 1 Let s be the previously produced character. The probability that c is the next character to be produced equals the probability that c follows s in the source text. "Shand tucthiney m?" le ollds mind Theybooure He, he s whit Pereg lenigabo Jodind alllld ashanthe ainofevids tre lin--p asto oun
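The length-1 model can be sketched with a precomputed follower table (an illustrative sketch, not the assignment's code; names are my own):

```python
import random
from collections import defaultdict

def order1(text, n):
    """Order-1 model: the next character depends only on the previous one."""
    # For each character s, collect every character that follows s in the text
    followers = defaultdict(list)
    for s, c in zip(text, text[1:]):
        followers[s].append(c)
    ch = random.choice(text)               # random starting character
    out = [ch]
    for _ in range(n):
        if not followers[ch]:              # ch occurs only at the very end
            break
        # duplicates in the list make this pick frequency-weighted
        ch = random.choice(followers[ch])
        out.append(ch)
    return "".join(out)
```

Because each character is drawn from the list of actual successors, every adjacent pair in the output occurs somewhere in the source.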

  12. nGram length == 3..15 Let nGram be the previously produced k characters (k == 4 in this example). The probability that c is the next character to be produced equals the probability that c follows nGram in the source text. Mr. Welshman, but him awoke, the balmy shore. I'll give him that he couple overy because in the slated snufflindeed structure's

  13. Algorithm • Pick a random k-letter nGram from the text • Repeatedly: • Make a list of every character that follows the nGram in the text • Randomly pick a character c from the list • Print c • Remove the first character from the seed and append c
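The algorithm on this slide translates almost line-for-line into Python (a sketch under the slide's description; the function name and parameters are illustrative):

```python
import random

def generate(text, k, length):
    """Generate text using order-k nGrams, following the slide's algorithm."""
    # Pick a random k-letter nGram from the text as the seed
    start = random.randrange(len(text) - k)
    ngram = text[start:start + k]
    out = ngram
    for _ in range(length):
        # Make a list of every character that follows the nGram in the text
        followers = [text[i + k] for i in range(len(text) - k)
                     if text[i:i + k] == ngram]
        if not followers:              # nGram occurs only at the end of text
            break
        c = random.choice(followers)   # randomly pick a character from the list
        out += c                       # "print" c by appending it to the output
        ngram = ngram[1:] + c          # remove the first character, append c
    return out
```

Rescanning the whole text on every step keeps the sketch close to the slide's wording; a real solution would precompute a map from each nGram to its follower list.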

  14. Example nGram length == 2 Seed: th Text: We hold these truths to be self-evident: that all men are created equal; that they List: [e, s, a, a, e] Character: s (20% of the time) New nGram: hs
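The worked example on this slide can be checked directly in Python (a hypothetical snippet; the text is the one shown on the slide):

```python
text = ("We hold these truths to be self-evident: "
        "that all men are created equal; that they")
ngram = "th"
# Every character that follows "th" in the text:
# "these" -> e, "truths" -> s, "that" -> a, "that" -> a, "they" -> e
followers = [text[i + 2] for i in range(len(text) - 2)
             if text[i:i + 2] == ngram]
print(followers)                              # ['e', 's', 'a', 'a', 'e']
print(followers.count("s") / len(followers))  # 0.2, i.e. s 20% of the time
# If s is picked, remove the first character of "th" and append s:
print(ngram[1:] + "s")                        # hs
```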
