
Lecture 24: Distributional-based Similarity II








  1. Lecture 24: Distributional-based Similarity II CSCE 771 Natural Language Processing • Topics • Distributional based word similarity • Readings: • NLTK book Chapter 2 (wordnet) • Text Chapter 20 • April 10, 2013

  2. Overview • Last Time (Programming) • Examples of thesaurus-based word similarity • path-similarity ("memory fault" example); sim-path(c1,c2) = −log pathlen(c1,c2) • Resnik, Lin • extended Lesk – glosses of words need to include hypernyms • Today • Distributional methods • Readings: • Text Chapters 19, 20 • NLTK Book: Chapter 10 • Next Time: Distributional based Similarity II
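The path-based measure recapped above can be sketched in a few lines; the path lengths below are hypothetical, standing in for edge counts in a WordNet-style hierarchy:

```python
import math

def sim_path(pathlen):
    """Path-based similarity: sim_path(c1, c2) = -log2 pathlen(c1, c2).
    Shorter paths between concepts in the thesaurus give higher similarity."""
    return -math.log2(pathlen)

# Hypothetical path lengths:
print(sim_path(1))   # identical concepts: -log2(1) = 0.0 (the maximum)
print(sim_path(4))   # four edges apart:   -log2(4) = -2.0
```

Note that sim-path is always non-positive under this definition; it is the relative ordering (closer pairs score higher) that matters.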

  3. Figure 20.8 Summary of Thesaurus Similarity Measures • "Elderly moment" IS-A memory fault IS-A mistake • sim-path is correct in the table

  4. Example computing PPMI • Need counts, so let's make up some • we still need to edit this table to include the counts
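Since the slide makes up counts, the PPMI computation itself can be sketched the same way. The co-occurrence counts below are invented purely for illustration (the word/context pairs are placeholders, not the slide's table):

```python
import math

# Made-up (word, context) co-occurrence counts, for illustration only
counts = {
    ("apricot", "pinch"): 1, ("apricot", "sugar"): 1,
    ("digital", "data"):  5, ("digital", "sugar"): 0,
}
total = sum(counts.values())

def ppmi(w, c):
    """PPMI(w, c) = max(0, log2 P(w,c) / (P(w) P(c))),
    with probabilities estimated from the count table."""
    p_wc = counts.get((w, c), 0) / total
    p_w = sum(v for (wi, _), v in counts.items() if wi == w) / total
    p_c = sum(v for (_, ci), v in counts.items() if ci == c) / total
    if p_wc == 0:
        return 0.0                      # PPMI clips non-positive PMI to 0
    return max(0.0, math.log2(p_wc / (p_w * p_c)))

print(ppmi("apricot", "sugar"))   # positive association
print(ppmi("digital", "sugar"))   # zero count, so PPMI = 0.0
```

The clipping at zero is what distinguishes PPMI from raw PMI: unseen or negatively associated pairs all map to 0 rather than to large negative values estimated from sparse counts.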

  5. Associations • PMI-assoc • assoc-PMI(w, f) = log2 [ P(w,f) / (P(w) P(f)) ] • Lin-assoc – f composed of r (relation) and w′ • assoc-Lin(w, f) = log2 [ P(w,f) / (P(r|w) P(w′|w)) ] • t-test-assoc (Eq. 20.41)
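Two of these association measures can be computed directly from probability estimates. The probabilities passed in below are hypothetical, and the t-test form follows the text's Eq. 20.41, (P(w,f) − P(w)P(f)) / sqrt(P(w)P(f)):

```python
import math

def assoc_pmi(p_wf, p_w, p_f):
    # assoc-PMI(w, f) = log2 [ P(w,f) / (P(w) P(f)) ]
    return math.log2(p_wf / (p_w * p_f))

def assoc_ttest(p_wf, p_w, p_f):
    # t-test association (Eq. 20.41):
    # (P(w,f) - P(w) P(f)) / sqrt(P(w) P(f))
    return (p_wf - p_w * p_f) / math.sqrt(p_w * p_f)

# Hypothetical probabilities for one (word, feature) pair:
print(assoc_pmi(0.02, 0.1, 0.05))    # log2(0.02 / 0.005), approximately 2.0
print(assoc_ttest(0.02, 0.1, 0.05))
```

Both measures are positive when w and f co-occur more often than chance (P(w,f) > P(w)P(f)) and negative otherwise; they differ in how strongly they penalize low-frequency events.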

  6. Figure 20.10 Co-occurrence vectors • Dependency-based parser – special case of shallow parsing • identified from "I discovered dried tangerines." (20.32) • discover(subject, I); I(subject-of, discover) • tangerine(obj-of, discover); tangerine(adj-mod, dried)
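Turning dependency relations like these into co-occurrence vectors can be sketched as counting (relation, other-word) features for each word. The triples below are a hand-written stand-in for the parse of the slide's example sentence, not actual parser output:

```python
from collections import Counter

# Hypothetical dependency triples (head, relation, dependent) for
# "I discovered dried tangerines." (cf. example 20.32)
triples = [
    ("discover", "subject", "I"),
    ("discover", "object", "tangerine"),
    ("tangerine", "adj-mod", "dried"),
]

# Each word's vector maps a (relation, other-word) feature to a count;
# the dependent gets the inverse "-of" relation, as in the slide.
vectors = {}
for head, rel, dep in triples:
    vectors.setdefault(head, Counter())[(rel, dep)] += 1
    vectors.setdefault(dep, Counter())[(rel + "-of", head)] += 1

print(vectors["tangerine"])   # features: (object-of, discover), (adj-mod, dried)
print(vectors["I"])           # feature:  (subject-of, discover)
```

Over a large parsed corpus these feature counts become the sparse vectors whose associations (PMI, Lin, t-test) and cosines the following slides compare.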

  7. Figure 20.11 Objects of the verb drink (Hindle, 1990)

  8. Vectors review • dot-product • length • sim-cosine
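The three items in this review compose directly: cosine similarity is the dot product normalized by the two vector lengths. A minimal plain-Python sketch over dense lists:

```python
import math

def dot(v, w):
    # dot-product: sum of elementwise products
    return sum(vi * wi for vi, wi in zip(v, w))

def length(v):
    # vector length: |v| = sqrt(v . v)
    return math.sqrt(dot(v, v))

def sim_cosine(v, w):
    # sim-cosine(v, w) = (v . w) / (|v| |w|)
    return dot(v, w) / (length(v) * length(w))

print(sim_cosine([1, 0], [0, 1]))   # orthogonal vectors: 0.0
print(sim_cosine([1, 2], [2, 4]))   # parallel vectors: approximately 1.0
```

Because it normalizes by length, cosine measures the angle between vectors and ignores their magnitudes, which is why it is the usual choice for frequency-based word vectors.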

  9. Figure 20.12 Similarity of Vectors

  10. Fig 20.13 Vector Similarity Summary

  11. Figure 20.14 Hand-built patterns for hypernyms Hearst 1992

  12. Figure 20.15

  13. Figure 20.16

  14. How to do this in NLTK: http://www.cs.ucf.edu/courses/cap5636/fall2011/nltk.pdf • NLTK 3.0a1 released February 2013 • This version adds support for NLTK's graphical user interfaces. http://nltk.org/nltk3-alpha/ • Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words? • I want a function for word clustering and the Yarowsky algorithm for finding similar collocations in a large text. • http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Linguistics • http://en.wikipedia.org/wiki/Portal:Linguistics • http://en.wikipedia.org/wiki/Yarowsky_algorithm • http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
