
Document Similarity Measures


minowa

Presentation Transcript


  1. Document Similarity Measures
     Content:
     • Knowledge-based word semantic similarity
         • Shortest path similarity
         • Leacock & Chodorow similarity
         • Lesk similarity
         • Wu & Palmer (Wu and Palmer, 1994) similarity metric
         • Resnik (Resnik, 1995) information content based measure
         • Measure introduced by Lin (Lin, 1998)
         • Jiang & Conrath (Jiang and Conrath, 1997) measure of similarity
         • Hirst & St. Onge (Hirst and St-Onge, 1998) measure of similarity
     • Corpus-based measures of semantic similarity
         • Pointwise Mutual Information
         • Normalized Google Similarity Distance
         • Explicit Semantic Analysis

  2. Introduction

  3. Knowledge-based word semantic similarity

  4. Knowledge-based word semantic similarity

  5. Knowledge-based word semantic similarity

  6. Knowledge-based word semantic similarity

  7. Corpus-Based Measures of semantic similarity

  8. Corpus-Based Measures of semantic similarity

  9. Corpus-Based Measures of semantic similarity

  10. Corpus-Based Measures of semantic similarity: Cosine Similarity
      Example:
      T1 = 2W1 + 3W2 + 5W3
      T2 = 3W1 + 7W2 + W3
      cos θ = (T1 · T2) / (|T1| |T2|) = 0.6758
      [Figure: T1 and T2 drawn as vectors in the space spanned by the term axes W1, W2 and W3]
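The arithmetic behind the value on this slide can be checked with a short sketch. It assumes plain Python lists for the weights on W1, W2 and W3; the function name `cosine_similarity` is chosen here for illustration and does not come from the presentation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Weights on the term axes W1, W2, W3, taken from the slide.
t1 = [2, 3, 5]
t2 = [3, 7, 1]

# Dot product is 2*3 + 3*7 + 5*1 = 32; the norms are sqrt(38) and sqrt(59).
similarity = cosine_similarity(t1, t2)
print(round(similarity, 4))  # 0.6758
```

Because the result depends only on the angle between the vectors, multiplying either vector by a positive constant leaves the similarity unchanged.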

  11. Cosine Similarity: Example
      Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil Defense alerted its heavily populated south coast to prepare for high winds, heavy rains and high seas. The storm was approaching from the southeast with sustained winds of 75 mph gusting to 92 mph. "There is no need for alarm," Civil Defense Director Eugenio Cabral said in a television alert shortly before midnight Saturday. Cabral said residents of the province of Barahona should closely follow Gilbert's movement. An estimated 100,000 people live in the province, including 70,000 in the city of Barahona, about 125 miles west of Santo Domingo. Tropical Storm Gilbert formed in the eastern Caribbean and strengthened into a hurricane Saturday night. The National Hurricane Center in Miami reported its position at 2 a.m. Sunday at latitude 16.1 north, longitude 67.5 west, about 140 miles south of Ponce, Puerto Rico, and 200 miles southeast of Santo Domingo. The National Weather Service in San Juan, Puerto Rico, said Gilbert was moving westward at 15 mph with a "broad area of cloudiness and heavy weather" rotating around the center of the storm. The weather service issued a flash flood watch for Puerto Rico and the Virgin Islands until at least 6 p.m. Sunday. Strong winds associated with the Gilbert brought coastal flooding, strong southeast winds and up to 12 feet to Puerto Rico's south coast.

  12. Cosine Similarity: Example (Document Vectors for selected terms)
      Term         Document1   Document2
      Gilbert      3           2
      Hurricane    2           1
      Rains        1           0
      Storm        2           1
      Winds        2           2
      Cosine Similarity: 0.9439
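The same calculation can be run directly on term counts. The sketch below is illustrative only: it assumes each document is a dictionary mapping a term to its count, with absent terms counted as 0, and the name `cosine_from_counts` is hypothetical rather than taken from the presentation.

```python
import math

def cosine_from_counts(counts_a, counts_b):
    """Cosine similarity of two term-count dictionaries (missing term = 0)."""
    terms = set(counts_a) | set(counts_b)
    dot = sum(counts_a.get(t, 0) * counts_b.get(t, 0) for t in terms)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an empty document")
    return dot / (norm_a * norm_b)

# Counts for the selected terms, copied from the slide.
doc1 = {"gilbert": 3, "hurricane": 2, "rains": 1, "storm": 2, "winds": 2}
doc2 = {"gilbert": 2, "hurricane": 1, "storm": 1, "winds": 2}  # "rains" is 0

# Dot product is 6 + 2 + 0 + 2 + 4 = 14; the norms are sqrt(22) and sqrt(10).
doc_similarity = cosine_from_counts(doc1, doc2)
print(round(doc_similarity, 4))  # 0.9439
```

Storing only the non-zero counts keeps the representation small when the vocabulary is large and each document uses few of its terms.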

  13. Explicit Semantic Analysis

  14. References
      • Michael Mohler, Rada Mihalcea: Text-to-Text Semantic Similarity for Automatic Short Answer Grading. EACL 2009: 567-575.
      • Gabrilovich, E. and Markovitch, S. (2007). "Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis", Proceedings of The 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, January 2007.
      • Claudia d'Amato, Steffen Staab, Nicola Fanizzi: On the Influence of Description Logics Ontologies on Conceptual Similarity. EKAW 2008: 48-63.
      • Similarity-based Learning Methods for the Semantic Web (http://www.di.uniba.it/~cdamato/PhDThesis_dAmato.pdf)
