Identifying terms with similar meanings across corpora
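Sahami and Heilman: measure the similarity of two different short snippets through their web search results, e.g. “UN Secretary General” vs. “Kofi Annan”: Google(UN Secretary General) vs. Google(Kofi Annan).
My Project: compare the same term across two corpora, e.g. ForeignAffairs(Kofi Annan) vs. Google(Kofi Annan), and BioDatabase(Python) vs. Google(Python).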
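The comparison on this slide can be sketched with a web-based kernel in the spirit of Sahami and Heilman (2006). Each term is represented by the centroid of the unit-length TF-IDF vectors of its retrieved snippets, and two representations are compared with a dot product (cosine similarity). This is a minimal sketch under stated assumptions: the snippets have already been retrieved (no Google Search API or Lucene call is made), tokenization is a simple lowercase regex, terms missing from the IDF table get a hypothetical default weight of 1.0, and the paper's truncation of each vector to its highest-weighted terms is omitted. The function names are illustrative, not the project's actual code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_unit_vector(doc, idf, default_idf):
    """TF-IDF vector for one result snippet, scaled to unit L2 length."""
    counts = Counter(tokenize(doc))
    vec = {t: tf * idf.get(t, default_idf) for t, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def expansion_vector(results, idf, default_idf=1.0):
    """Centroid of the unit TF-IDF vectors of a term's results, re-normalized."""
    centroid = Counter()
    for doc in results:
        for t, w in tfidf_unit_vector(doc, idf, default_idf).items():
            centroid[t] += w / len(results)
    norm = math.sqrt(sum(w * w for w in centroid.values()))
    return {t: w / norm for t, w in centroid.items()} if norm > 0 else {}


def kernel(results_a, results_b, idf):
    """Cosine similarity (dot product of unit vectors) of two expansions."""
    qa = expansion_vector(results_a, idf)
    qb = expansion_vector(results_b, idf)
    return sum(w * qb.get(t, 0.0) for t, w in qa.items())
```

For the “My Project” comparison, `results_a` would hold snippets for a term from the specialized corpus (e.g. ForeignAffairs) and `results_b` the Google results for the same term. A score near 1 means the two sets of snippets use very similar weighted vocabulary around the term; a score near 0 means they share little of it.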
Architecture: Main Program, Google Search API, Web, Lucene, Pre-computed IDFs.
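The “Pre-computed IDFs” component suggests that term weights are computed once over a document collection and then reused for every query. The slide does not say which collection or IDF formula was used, so this sketch assumes the plain form idf(t) = log(N / df(t)), where N is the number of documents and df(t) the number of documents containing t, and stores the table as JSON. The function names and file format are hypothetical, and Lucene itself is not used here.

```python
import json
import math
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def compute_idf(documents):
    """Map each term to log(N / df), where df counts documents containing it."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        # A set, so a term repeated within one document is counted once.
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def save_idf(idf, path):
    # Persist the table once so later runs can reuse it without re-scanning.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(idf, f)


def load_idf(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, over the three documents “Python is a programming language”, “The python is a snake” and “Monty Python film”, “python” occurs in every document, so its IDF is log(3/3) = 0, while “snake” occurs in only one and gets log 3 ≈ 1.099.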
Best Results So Far
• IMDB
• “Apocalypse Now” and “Gothika” clearly identified as popular.
• “The Body”, “Summer School” and “Antitrust” clearly identified as… overshadowed by other meanings.
• Compound identification (actor names, etc.) would probably be a big help here.
References
• Sahami, M. and Heilman, T. D. 2006. A web-based kernel function for measuring the similarity of short text snippets. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23-26, 2006). WWW '06. ACM Press, New York, NY, 377-386. DOI: http://doi.acm.org/10.1145/1135777.1135834