Natural Language Processing

Natural Language Processing. Yoav Goldberg, Computer Science Department. Presented in the Academic Writing in English course. Please try to make your own presentations for dummies as well.

Presentation Transcript


  1. Natural Language Processing. Yoav Goldberg, Computer Science Department. Presented in the Academic Writing in English course. Please try to make your own presentations for dummies as well.

  2. What is a Natural Language? • Natural languages are the languages of humans (such as Hebrew, English, Arabic, Hindi, Latin...) • These can be either written or spoken

  3. What is Natural Language Processing? • A subfield of Computer Science

  4. What is Natural Language Processing? • A subfield of Computer Science • 20-30 years ago:

  5. What is Natural Language Processing? • A subfield of Computer Science • 20-30 years ago: “NLP is about making the computer understand natural language”

  6. What is Natural Language Processing? • A subfield of Computer Science • 20-30 years ago: “NLP is about making the computer understand natural language” But today we know that: • Language is HARD • Computers are STUPID

  7. What is Natural Language Processing? • A subfield of Computer Science • 20-30 years ago: “NLP is about making the computer understand natural language” But today we know that: • Language is HARD • Computers are STUPID ⇒ The computer will never understand language

  8. Language is Hard הרכבת

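  9. Language is Hard הרכבת (one Hebrew spelling, several readings) • הרכבת המהירה לחיפה (the fast train to Haifa) • הרכבת הממשלה (the forming of the government) • הרכבת את הפאזל (you assembled the puzzle) • הרכבת על הסוס? (did you ride the horse?)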

  10. Language is Hard כפיות (unvocalized Hebrew: can be read as “spoons”, or as “ingratitude” as in כפיות טובה)

  11. Language is Hard • I play the Bass • I hate banks

  12. Language is Hard • I play the Bass • Some people like Bass fishing • I hate banks • River banks are fun places

  13. Some of the next ones might also be hard for humans!

  14. Language is Hard • Thin people eat candy

  15. Language is Hard • Thin people eat candy • Fat people eat candy

  16. Language is Hard • Thin people eat candy • Fat people eat candy • Fat people eat steaks

  17. Language is Hard • Thin people eat candy • Fat people eat candy • Fat people eat steaks • Fat people eat accumulates

  18. Language is Hard • Flying planes are dangerous • Flying planes is dangerous

  19. Language is Hard • I saw a man on the hill with a telescope

  20. Language is Hard • I saw a man on the hill with a telescope Who has the telescope? Who is on the hill?

  21. Language is Hard • I saw a man on the hill with a telescope Who has the telescope? Who is on the hill? (and this takes for granted that the sentence is not about a very cruel way of killing someone)

  22. OK, so I hope I convinced you that language is hard. And these examples didn’t even touch the subject of what understanding is all about!

  23. So computers will never understand language

  24. So computers will never understand language. How will I ever finish my thesis??

  25. Fortunately, we can go a long way by cheating

  26. So what is Natural Language Processing today? • Natural language processing is about making computer programs that can do seemingly intelligent things with natural language input • Or, in other words, finding devious ways of cheating people into thinking the computer can understand language to some extent

  27. Lies, Damn Lies, and Statistics • One of our main cheating tools is Statistics • Let me demonstrate

  28. For example: • Humans know that “I have a spelling checker” • makes far more sense than “Eye halve a spelling chequer”. Can computers do that?

  29. Example (cont.) • “Making sense” is hard, but we can cheat by changing the question: which of the following is more probable? “I have a spelling checker” or “Eye halve a spelling chequer”

  30. Example (cont.) • This is still hard, but we can cheat yet again by asking several easier questions. What’s the probability of seeing: • “halve” after “Eye”? • “a” after “halve”? • “spelling” after “a”? • “chequer” after “spelling”? • “have” after “I”? • “a” after “have”? • “spelling” after “a”? • “checker” after “spelling”?

  31. Example (cont.) • This is still hard, but we can cheat yet again by asking several easier questions. What’s the probability of seeing: • “halve” after “Eye”? • “a” after “halve”? • “spelling” after “a”? • “chequer” after “spelling”? • “have” after “I”? • “a” after “have”? • “spelling” after “a”? • “checker” after “spelling”? (We are assuming every word depends only on the word preceding it. This is of course wrong.)
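
The assumption on this slide can be written out as a formula. This is a sketch in standard notation, not something shown in the deck; the symbols w_1 ... w_n (the words of the sentence, in order) are introduced here for illustration. The first line is the exact chain rule, and the second applies the "only the preceding word matters" simplification:

```latex
\begin{align*}
P(w_1 \dots w_n) &= P(w_1) \prod_{i=2}^{n} P(w_i \mid w_1 \dots w_{i-1}) \\
                 &\approx P(w_1) \prod_{i=2}^{n} P(w_i \mid w_{i-1})
\end{align*}
```

The questions listed on the slide correspond to the conditional terms P(w_i | w_{i-1}) only; the probability of the first word, P(w_1), is not among them.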

  32. Example (cont.) Seeing halve after Eye: P(halve | Eye)

  33. Example (cont.) Seeing halve after Eye: P(halve | Eye) = count(Eye halve) / count(Eye)

  34. Example (cont.) Seeing halve after Eye: P(halve | Eye) = count(Eye halve) / count(Eye) = 14,600 / 301,000,000 = 4.85e-5

  35. Example (cont.) Seeing halve after Eye: P(halve | Eye) = count(Eye halve) / count(Eye) = 14,600 / 301,000,000 = 4.85e-5 In the same manner: • P(a | halve) = 0.0033 • P(spelling | a) = 1.5e-4 • P(chequer | spelling) = 2.55e-4 • P(have | I) = 0.19 • P(a | have) = 0.45 • P(checker | spelling) = 0.012
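
The count ratio above is easy to try on a small scale. The following is a minimal Python sketch, assuming a toy corpus that is made up here purely for illustration (it is not the data behind the slide's counts); the function name `bigram_probability` is likewise hypothetical.

```python
from collections import Counter


def bigram_probability(tokens, prev_word, word):
    """Estimate P(word | prev_word) as count(prev_word word) / count(prev_word)."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    if unigram_counts[prev_word] == 0:
        return 0.0  # context never seen: no estimate without smoothing
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]


# Toy corpus, invented for this example only.
corpus = "i have a spelling checker and i have a dog".split()

p_have_given_i = bigram_probability(corpus, "i", "have")  # "i have" twice, "i" twice
p_dog_given_a = bigram_probability(corpus, "a", "dog")    # "a dog" once, "a" twice

# The same ratio with the counts quoted on the slide.
p_halve_given_eye = 14_600 / 301_000_000
```

On real data, many word pairs never occur at all, so a practical system would also need some form of smoothing; the sketch simply returns 0.0 when the context word is unseen.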

  36. Example (cont) • Combining the probabilities, we can estimate: P(“Eye halve a spelling chequer”) P(“I have a spelling checker”)

  37. Example (cont) • Combining the probabilities, we can estimate: P(“Eye halve a spelling chequer”) ≈ 6.12e-15 P(“I have a spelling checker”) ≈ 1.53e-7

  38. Example (cont) • Combining the probabilities, we can estimate: P(“Eye halve a spelling chequer”) ≈ 6.12e-15 P(“I have a spelling checker”) ≈ 1.53e-7 Yep, “I have a spelling checker” makes far more sense.
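
The combination step can be reproduced directly from the numbers quoted on the earlier slides. This is a sketch under the assumption that "combining" means multiplying the per-pair probabilities; it sums logarithms instead of multiplying, a common precaution against underflow on longer sentences. The names `bigram_p` and `sentence_log_probability` are introduced here for illustration.

```python
import math

# Bigram probabilities as quoted on the slides.
bigram_p = {
    ("Eye", "halve"): 4.85e-5,
    ("halve", "a"): 0.0033,
    ("a", "spelling"): 1.5e-4,
    ("spelling", "chequer"): 2.55e-4,
    ("I", "have"): 0.19,
    ("have", "a"): 0.45,
    ("spelling", "checker"): 0.012,
}


def sentence_log_probability(words):
    """Sum log P(w_i | w_{i-1}) over all adjacent word pairs."""
    return sum(math.log(bigram_p[pair]) for pair in zip(words, words[1:]))


odd = "Eye halve a spelling chequer".split()
plain = "I have a spelling checker".split()

p_odd = math.exp(sentence_log_probability(odd))
p_plain = math.exp(sentence_log_probability(plain))
print(f"{p_odd:.2e}  {p_plain:.2e}")
```

Multiplying the rounded values gives about 6.12e-15 and 1.54e-7, which agrees with the slide's figures up to rounding of the inputs. Either way, the ordinary sentence comes out roughly seven orders of magnitude more probable.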

  39. What else can we do? • Tell if a certain article is from the Washington Post or the New York Times • Find the most informative sentence in a paragraph • Categorize texts into subjects (e.g. sports, economics, literature, religion) • Tell that two news items are about the same event • Answer factual questions (When did Beethoven die?) • Divide sentences into meaningful units: [Pierre Vinken], [61 years old], [will join] [the board of directors] [next Sunday] And much, much more...

  40. And – what I’m interested in: Boundary disambiguation of coordinated conjunctions of NPs (noun phrases)

  41. And – what I’m interested in • I work on Ands. • More specifically, I’m trying to figure out the boundaries of the things joined by Ands. Example: “I ate green apples and juicy bananas for lunch”

  42. And – what I’m interested in • Which of the following makes most sense? • I ate green (apples and juicy bananas) for lunch • I ate (green apples and juicy bananas) for lunch • I ate green (apples and juicy) bananas for lunch • I (ate green apples and juicy) bananas for lunch • …
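
To get a feel for how many candidate boundaries there are, the bracketings can be enumerated mechanically. The sketch below is only an illustration of the search space, not the author's method: it assumes a single "and" in the sentence and that every candidate span must contain at least one word on each side of it. The function names are hypothetical.

```python
def candidate_conjunct_spans(words, conjunction="and"):
    """Yield (start, end) spans that include the conjunction and at least
    one word on each side of it. Spans are half-open: words[start:end]."""
    k = words.index(conjunction)
    for start in range(k):                        # left boundary: any word before "and"
        for end in range(k + 2, len(words) + 1):  # right boundary: past at least one word after "and"
            yield start, end


def show(words, start, end):
    """Render the sentence with parentheses around words[start:end]."""
    bracketed = "(" + " ".join(words[start:end]) + ")"
    return " ".join(words[:start] + [bracketed] + words[end:])


sentence = "I ate green apples and juicy bananas for lunch".split()
candidates = [show(sentence, s, e) for s, e in candidate_conjunct_spans(sentence)]

for line in candidates:
    print(line)
```

For this nine-word sentence there are 4 possible left boundaries and 4 possible right boundaries, so 16 candidates in total, including the four listed on the slide. Picking the right one is the hard part.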

  43. And? • This is very easy for people • This is very hard for computers. My main intuitions: • People join similar things • When they do so, they tend to use similar structure • Switching the order of the joined things is usually allowed

  44. Thanks. Questions?
