Overview of Peter D. Turney’s Work on Similarity

Overview of Peter D. Turney’s Work on Similarity From 2001-2008

Similarity • Attributional similarity (2001-2003) • the degree to which two words are synonymous • also known as semantic relatedness and semantic association • Relational similarity (2005-2008) • the degree to which two relations are analogous

Objective evaluation of the approaches • Attributional similarity: 80 TOEFL synonym questions • Relational similarity: 374 SAT analogy questions

2001: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the 12th European Conference on Machine Learning, pages 491–502, Springer, Berlin, 2001.

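1 Introduction • Recognizing synonyms: • Given a word and a set of candidate words, choose the candidate whose meaning is closest to the given word • Core idea: based on co-occurrence • "a word is characterized by the company it keeps"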

1 Introduction: idea • Given a problem word and a set of candidates {choice_1, choice_2, …, choice_n} • Compute score(choice_i) for each candidate; the highest-scoring choice is taken as the synonym • Uses Pointwise Mutual Information (PMI) to analyze statistical data collected by Information Retrieval (IR)
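The PMI quantity behind PMI-IR can be written out directly. Below is a minimal sketch. It estimates each probability as hits divided by a hypothetical total page count N, and the counts are invented for illustration. The problem word is the same for every choice, so ranking choices by PMI reduces to ranking them by hits(problem AND choice) / hits(choice), and N never has to be known.

```python
import math


def pmi(hits_both: int, hits_a: int, hits_b: int, total: int) -> float:
    """PMI(a, b) = log2(p(a, b) / (p(a) * p(b))), with p(x) estimated as hits(x) / total."""
    p_ab = hits_both / total
    p_a = hits_a / total
    p_b = hits_b / total
    return math.log2(p_ab / (p_a * p_b))


def rank_score(hits_both: int, hits_choice: int) -> float:
    """Orders choices the same way PMI does when the problem word is held fixed."""
    return hits_both / hits_choice


# Hypothetical counts: 1,000,000 pages, 1,000 contain the problem word,
# 2,000 contain the choice, 100 contain both.
print(pmi(hits_both=100, hits_a=1_000, hits_b=2_000, total=1_000_000))  # about 5.64 = log2(50)
```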

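2 Formulas • Score 1: $\text{score}_1(\text{choice}_i) = \dfrac{\text{hits}(\text{problem AND choice}_i)}{\text{hits}(\text{choice}_i)}$ • Score 2: $\text{score}_2(\text{choice}_i) = \dfrac{\text{hits}(\text{problem NEAR choice}_i)}{\text{hits}(\text{choice}_i)}$, where NEAR means the two words occur within ten words of each other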

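2 Formulas • Score 3: avoids antonyms such as big vs. small by excluding co-occurrences near "not": $\text{score}_3(\text{choice}_i) = \dfrac{\text{hits}((\text{problem NEAR choice}_i) \text{ AND NOT } ((\text{problem OR choice}_i) \text{ NEAR ``not''}))}{\text{hits}(\text{choice}_i \text{ AND NOT } (\text{choice}_i \text{ NEAR ``not''}))}$ • Score 4: introduces a context word: $\text{score}_4(\text{choice}_i) = \dfrac{\text{hits}((\text{problem NEAR choice}_i) \text{ AND context AND NOT } ((\text{problem OR choice}_i) \text{ NEAR ``not''}))}{\text{hits}(\text{choice}_i \text{ AND context AND NOT } (\text{choice}_i \text{ NEAR ``not''}))}$ • Choice of context words: use only one, so that the hit counts stay large enough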
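The four scores differ only in which Boolean queries have their hit counts divided. The sketch below builds those query strings in the AltaVista-style syntax used on these slides (AND, NEAR, NOT) and returns the highest-scoring choice. The `hits` callable is a hypothetical stand-in for a search engine's hit count, not a real API, and the counts in the example are invented.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical hit-count lookup: query string -> number of matching pages.
HitCounter = Callable[[str], int]


def score_queries(problem: str, choice: str, variant: int,
                  context: Optional[str] = None) -> Tuple[str, str]:
    """Return the (numerator, denominator) queries for score1..score4."""
    if variant == 1:
        return f"{problem} AND {choice}", choice
    if variant == 2:
        return f"{problem} NEAR {choice}", choice
    no_not = f'AND NOT (({problem} OR {choice}) NEAR "not")'
    if variant == 3:
        return (f"({problem} NEAR {choice}) {no_not}",
                f'{choice} AND NOT ({choice} NEAR "not")')
    if variant == 4 and context:
        return (f"({problem} NEAR {choice}) AND {context} {no_not}",
                f'{choice} AND {context} AND NOT ({choice} NEAR "not")')
    raise ValueError("variant must be 1-4, and variant 4 needs a context word")


def best_choice(problem: str, choices: Sequence[str], hits: HitCounter,
                variant: int = 2, context: Optional[str] = None) -> str:
    """Pick the highest-scoring choice; a zero denominator scores 0."""
    def score(choice: str) -> float:
        numerator, denominator = score_queries(problem, choice, variant, context)
        d = hits(denominator)
        return hits(numerator) / d if d else 0.0
    return max(choices, key=score)


# Invented counts for the classic "levied" question (Score 2).
counts = {"levied NEAR imposed": 40, "imposed": 1_000,
          "levied NEAR believed": 10, "believed": 5_000}
print(best_choice("levied", ["imposed", "believed"], lambda q: counts.get(q, 0)))
```

Keeping query construction separate from scoring means a real hit-count source could be swapped in without touching the ranking logic.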

3 Experiments • Compared with • LSA: Latent Semantic Analysis • Initial matrix X built from an encyclopedia: 61,000 × 30,473 • Document chunk: the whole document • Dimensionality reduction: SVD • Element: tf-idf weight • Similarity: cosine • Students' TOEFL scores
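The LSA pipeline (weighted term-document matrix, truncated SVD, cosine) can be sketched with NumPy. This is a toy illustration. It uses a hand-made 4-term by 5-document matrix in place of the 61,000 × 30,473 encyclopedia matrix, and keeps k = 2 dimensions instead of 300. Using the rows of U_k S_k as term vectors is one common convention, not necessarily the exact one used in these experiments.

```python
import numpy as np


def lsa_term_vectors(X: np.ndarray, k: int) -> np.ndarray:
    """Rank-k term vectors U_k * S_k from the SVD of a terms x documents matrix."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy weighted matrix (rows = terms, columns = documents); values are invented.
terms = ["levied", "imposed", "believed", "correlated"]
X = np.array([[2.0, 1.0, 0.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 2.0]])

vectors = lsa_term_vectors(X, k=2)
# Compare the problem word (row 0) with each candidate by cosine.
scores = {t: cosine(vectors[0], vectors[i]) for i, t in enumerate(terms) if i > 0}
best = max(scores, key=scores.get)
print(best)  # imposed
```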

Dataset: • 80 TOEFL questions • 50 ESL test questions

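3 Experiments: PMI-IR vs. LSA • Time efficiency • PMI-IR: simple program, short running time • 2 s/query × 8 queries, almost all of it spent on network communication • In parallel: 2 s • LSA: long running time • Compressing the 61,000 × 30,473 matrix to 61,000 × 300 took about three hours on a UNIX workstation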

3 Experiments • 80 TOEFL questions, 50 ESL questions • PMI-IR: 73.75% (59/80), 74% (37/50) • Foreign students: 64.5% (51.6/80) • LSA: 64.4% (51.5/80) • Performance: PMI-IR wins by about 10 percentage points • Reason: the use of NEAR, i.e. a smaller chunk size • LSA 64.4% • PMI-IR with AND 62.5% • PMI-IR with NEAR 72.5%

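4 Conclusion • Combines PMI and IR • Uses co-occurrence to measure how strongly two words are related • PMI-IR obtains its statistics by sending queries to a search engine • This addresses the data sparseness problem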

2003: Combining Independent Modules in Lexical Multiple-Choice Problems. In RANLP-03 (Recent Advances in Natural Language Processing), pages 482–489, Borovets, Bulgaria.

1 Introduction • There are several approaches to natural language problems • No single approach will be the best for all problem instances • Why not combine them?

1 Introduction • Two main contributions • Introduces and evaluates several new modules for answering multiple-choice synonym and analogy questions • Three merging rules: presents a novel product rule and compares it with two other, similar merging rules

2 Merging rules: the parameter • The parameter of the rules: the weight vector $w$ • $p_{hij} \ge 0$ is the probability that module $i$ ($1 \le i \le n$) assigns to choice $j$ ($1 \le j \le k$) of instance $h$ ($1 \le h \le m$) • $D^w_{hj}$ is the probability assigned by the merging rule to choice $j$ of training instance $h$ when the weights are set to $w$ • $1 \le a(h) \le k$ is the correct answer for instance $h$

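2 Merging rules: old • Mixture rule: very common; a normalized weighted sum, $D^w_{hj} = \dfrac{\sum_i w_i p_{hij}}{\sum_{j'} \sum_i w_i p_{hij'}}$ • Logarithmic rule: $D^w_{hj} = \dfrac{\exp(\sum_i w_i \ln p_{hij})}{\sum_{j'} \exp(\sum_i w_i \ln p_{hij'})}$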

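2 Merging rules: novel • Product rule: $D^w_{hj} = \dfrac{\prod_i \left(w_i p_{hij} + (1 - w_i)/k\right)}{\sum_{j'} \prod_i \left(w_i p_{hij'} + (1 - w_i)/k\right)}$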
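The three merging rules can be compared side by side on a single instance h. This minimal sketch assumes the commonly stated forms:

- mixture rule: normalizes $\sum_i w_i p_{hij}$
- logarithmic rule: normalizes $\exp(\sum_i w_i \ln p_{hij})$
- product rule: normalizes $\prod_i (w_i p_{hij} + (1 - w_i)/k)$

The probabilities and weights in the example are invented, and clamping zero probabilities before taking logs is an added assumption. The `log_likelihood` helper shows one natural training objective: the log of the probability each rule gives the correct answer a(h), summed over instances.

```python
import math
from typing import Callable, List, Sequence

Probs = Sequence[Sequence[float]]  # p[i][j]: probability module i gives choice j


def _normalise(scores: List[float]) -> List[float]:
    total = sum(scores)
    return [s / total for s in scores]


def mixture_rule(p: Probs, w: Sequence[float]) -> List[float]:
    k = len(p[0])
    return _normalise([sum(wi * pi[j] for wi, pi in zip(w, p)) for j in range(k)])


def logarithmic_rule(p: Probs, w: Sequence[float], eps: float = 1e-12) -> List[float]:
    k = len(p[0])
    # Zero probabilities are clamped to eps (an assumption), since ln(0) is undefined.
    return _normalise([math.exp(sum(wi * math.log(max(pi[j], eps)) for wi, pi in zip(w, p)))
                       for j in range(k)])


def product_rule(p: Probs, w: Sequence[float]) -> List[float]:
    k = len(p[0])
    # A module with weight 0 contributes the uniform factor 1/k, i.e. it is ignored.
    return _normalise([math.prod(wi * pi[j] + (1 - wi) / k for wi, pi in zip(w, p))
                       for j in range(k)])


def log_likelihood(rule: Callable[[Probs, Sequence[float]], List[float]],
                   instances: Sequence[Probs], answers: Sequence[int],
                   w: Sequence[float]) -> float:
    """Sum over instances h of ln D_{h,a(h)} (answers are 0-based indices here)."""
    return sum(math.log(rule(p_h, w)[a_h]) for p_h, a_h in zip(instances, answers))


p = [[0.6, 0.3, 0.1],   # module 1
     [0.2, 0.7, 0.1]]   # module 2
w = [0.5, 0.5]
for rule in (mixture_rule, logarithmic_rule, product_rule):
    print(rule.__name__, [round(x, 3) for x in rule(p, w)])
```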

3 Synonym: dataset • A set of 431 four-choice synonym questions • Randomly divided into 331 training questions and 100 test questions • The weights w are optimized on the training set

3 Synonym: Modules • LSA • PMI-IR • Thesaurus • Queries Wordsmyth (www.wordsmyth.net) • Creates synonym lists for both the stem and the choices • Scores them by their overlap • Connector • Uses summary pages from querying Google with a pair of words • Weighted sum of • the number of times the words appear separated by a symbol ([, ”, :, ,, =, /, ,, (, ]) or by one of: means, defined, equals, synonym, whitespace, and • the number of times "dictionary" or "thesaurus" appears
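Once the synonym lists and search-result summaries are in hand, the Thesaurus and Connector modules reduce to simple counting. This minimal sketch rests on several assumptions:

- The lists and snippets are already fetched; no Wordsmyth or Google call is made.
- Only a subset of the listed connectors is used.
- The pair is counted in either order.
- The weights are made up, since the slide gives neither the actual weights nor the matching details.

```python
import re
from typing import Iterable, Set


def thesaurus_score(stem_synonyms: Set[str], choice_synonyms: Set[str]) -> int:
    """Overlap between the stem's and the choice's synonym lists."""
    return len(stem_synonyms & choice_synonyms)


# A subset of the slide's connectors: a symbol, a connecting word, or plain whitespace.
SEPARATOR = r'(?:\s*[\[":,=/(]\s*|\s+(?:means|defined|equals|synonym|and)\s+|\s+)'


def connector_score(snippets: Iterable[str], a: str, b: str,
                    w_connect: float = 1.0, w_reference: float = 0.5) -> float:
    """Weighted count of 'a <connector> b' (either order) plus dictionary/thesaurus mentions."""
    forward = re.compile(rf"\b{re.escape(a)}{SEPARATOR}{re.escape(b)}\b", re.IGNORECASE)
    backward = re.compile(rf"\b{re.escape(b)}{SEPARATOR}{re.escape(a)}\b", re.IGNORECASE)
    reference = re.compile(r"\b(?:dictionary|thesaurus)\b", re.IGNORECASE)
    connected = references = 0
    for s in snippets:
        connected += len(forward.findall(s)) + len(backward.findall(s))
        references += len(reference.findall(s))
    return w_connect * connected + w_reference * references


snippets = ["Big means large in this dictionary.", "big, large, huge", "small and big"]
print(thesaurus_score({"big", "large", "huge"}, {"large", "big", "great"}))  # 2
print(connector_score(snippets, "big", "large"))  # 2.5
```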

3 Synonym: combine results • The three rules' accuracies are nearly identical • However, the product and logarithmic rules assign higher probabilities to correct answers, as evidenced by the mean likelihood

3 Synonym: compare with other approaches

4 Analogies: dataset • 374 five-choice instances • The collection was randomly split into 274 training instances and 100 test instances • E.g. cat:meow :: (a) mouse:scamper, (b) bird:peck, (c) dog:bark, (d) horse:groom, (e) lion:scratch

4 Analogies: modules • Phrase vectors • Create a vector r to represent the relationship between X and Y • Phrases built from 128 patterns, e.g. "X for Y", "Y with X", "X in the Y", "Y on X" • Query and record the number of hits for each phrase • Compare vectors by cosine • Thesaurus paths (WordNet) • Degree of similarity between paths
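The phrase-vector module works in three steps: fill each pattern with the word pair, look up a hit count per phrase, and compare the stem's vector with each choice's vector by cosine. This sketch assumes a hypothetical `hits` lookup and uses only the four example patterns rather than all 128. The counts are invented.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Four of the 128 patterns; X and Y are placeholders for the two words of a pair.
PATTERNS = ["X for Y", "Y with X", "X in the Y", "Y on X"]


def phrases(x: str, y: str, patterns: Sequence[str] = PATTERNS) -> List[str]:
    """Fill each pattern token by token so only the bare tokens X and Y are replaced."""
    return [" ".join(x if t == "X" else y if t == "Y" else t for t in p.split())
            for p in patterns]


def relation_vector(x: str, y: str, hits: Callable[[str], int]) -> List[float]:
    return [float(hits(ph)) for ph in phrases(x, y)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def answer(stem: Tuple[str, str], choices: Sequence[Tuple[str, str]],
           hits: Callable[[str], int]) -> Tuple[str, str]:
    """Choose the pair whose relation vector is most cosine-similar to the stem's."""
    r = relation_vector(*stem, hits)
    return max(choices, key=lambda c: cosine(r, relation_vector(*c, hits)))


counts = {"meow with cat": 5, "meow on cat": 1,
          "bark with dog": 10, "bark on dog": 2, "bird for peck": 3}
print(answer(("cat", "meow"), [("dog", "bark"), ("bird", "peck")],
             lambda q: counts.get(q, 0)))
```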

4 Analogies: combine results • Lexical relation modules • A set of more specific modules using WordNet • 9 modules, each checking one relationship: Synonym, Antonym, Hypernym, Hyponym, Meronym:substance, Meronym:part, Meronym:member, Holonym:substance, Holonym:member • Check the stem first, then the choices • Similarity • Makes use of dictionary definitions • Similarity:dict uses dictionary.com and Similarity:wordsmyth uses wordsmyth.net • Given A:B::C:D, similarity = sim(A, C) + sim(B, D)
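The Similarity module scores A:B::C:D as sim(A, C) + sim(B, D). The slide does not say how sim is computed from the definitions. The sketch below therefore assumes a simple word-overlap (Jaccard) measure over definition text, and its invented definitions stand in for dictionary.com or wordsmyth.net lookups.

```python
from typing import Dict


def definition_sim(w1: str, w2: str, definitions: Dict[str, str]) -> float:
    """Jaccard overlap of the words in two definitions (an assumed measure)."""
    a = set(definitions.get(w1, "").lower().split())
    b = set(definitions.get(w2, "").lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def analogy_similarity(a: str, b: str, c: str, d: str,
                       definitions: Dict[str, str]) -> float:
    """Score for A:B::C:D as sim(A, C) + sim(B, D)."""
    return definition_sim(a, c, definitions) + definition_sim(b, d, definitions)


definitions = {
    "cat": "small domestic animal", "dog": "domestic animal", "mouse": "small rodent",
    "meow": "sound a cat makes", "bark": "sound a dog makes", "scamper": "run quickly",
}
print(analogy_similarity("cat", "meow", "dog", "bark", definitions))
print(analogy_similarity("cat", "meow", "mouse", "scamper", definitions))
```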

5 Conclusion • Applied three trained merging rules to TOEFL questions • Accuracy: 97.5% • Provided the first results on a challenging analogy task, with a set of novel modules that use both lexical databases and statistical information • Accuracy: 45% • The popular mixture rule was consistently weaker than the logarithmic and product rules at assigning high probabilities to correct answers

State of the art (accuracy)

2005: Corpus-based Learning of Analogies and Semantic Relations. IJCAI 2005: Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, July 30-August 5, 2005.

1 Introduction • Verbal analogies (A:B :: C:D) are solved with a vector space model (VSM) • The novelty of the paper is the application of the VSM to measuring the similarity between relationships • Relations in noun-modifier pairs are classified with a supervised nearest-neighbour algorithm • Dataset: Nastase and Szpakowicz (2003), 600 noun-modifier pairs

1 Introduction: examples • Analogy • Noun-modifier pair relations: e.g. laser printer, relation: instrument

2 Solving Analogy Problems • Assign scores to candidate analogies A:B::C:D • For multiple-choice questions, guess the highest-scoring choice • Score = sim(R1, R2), where R1 is the relation between A and B and R2 the relation between C and D • The difficulty is that R1 and R2 are implicit • Attempt to learn R1 and R2 by unsupervised learning from a very large corpus

2 Solving Analogy Problems: Vector Space Model • create vectors, r1 and r2, that represent features of R1 and R2 • measure the similarity of R1 and R2 by the cosine of the angle θ between r1 and r2

2 Solving Analogy Problems: simplified diagram • Generate a vector for each word pair • Word pair A:B → 64 joining terms (e.g. "X for Y", "Y with X", "X in the Y", "Y on X") → 128 search phrases → hits → log → vector [log(hit_1), log(hit_2), …, log(hit_128)]
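The vector-construction step in the diagram can be sketched as follows. It assumes each joining term is used in both orders (A term B and B term A), so 64 terms give 128 phrases. It takes log(1 + hits) rather than log(hits), so that phrases with zero hits stay defined. Both choices are assumptions of this sketch, and `hits` is a hypothetical lookup.

```python
import math
from typing import Callable, List, Sequence


def pair_vector(a: str, b: str, joining_terms: Sequence[str],
                hits: Callable[[str], int]) -> List[float]:
    """One element per search phrase: 'a term b' for every term, then 'b term a'."""
    phrases = ([f"{a} {t} {b}" for t in joining_terms]
               + [f"{b} {t} {a}" for t in joining_terms])
    return [math.log(1 + hits(p)) for p in phrases]


joining_terms = ["for", "with", "in the", "on"]  # 4 of the 64 terms
counts = {"meow for cat": 3, "cat with meow": 7}
vec = pair_vector("cat", "meow", joining_terms, lambda q: counts.get(q, 0))
print(len(vec), [round(x, 3) for x in vec])
```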

2 Solving Analogy Problems: experiment

3 Noun-Modifier Semantic Relations • First attempt to classify semantic relations without a lexicon.

30 Semantic Relations of training data

3 Noun-Modifier Semantic Relations: algorithm • Nearest-neighbour supervised learning • Similarity measure for the nearest neighbour: cosine(training pair, testing pair) • Vectors of 128 elements, built with the same joining terms as before
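The classification step amounts to 1-nearest-neighbour search with cosine as the similarity measure. This minimal sketch assumes the 128-element vectors have already been built. The short 3-element vectors and their labels are invented for readability.

```python
import math
from typing import Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbour(test_vector: Sequence[float],
                      training: Sequence[Tuple[Sequence[float], str]]) -> str:
    """Return the relation label of the training pair most cosine-similar to the test pair."""
    _, label = max(training, key=lambda item: cosine(test_vector, item[0]))
    return label


training = [([2.0, 0.1, 0.0], "instrument"),   # e.g. laser printer
            ([0.0, 1.5, 0.3], "cause")]
print(nearest_neighbour([1.8, 0.2, 0.1], training))  # instrument
```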

3 Noun-Modifier Semantic Relations:Experiment for the 30 Classes

30 Semantic Relations • F when precision and recall are balanced: 26.5% • F for random guessing: 3.3% • Much better than random guessing, but still much room for improvement • 30 classes is hard: too many possibilities for confusing classes • Try 5 classes instead, by grouping classes together

5 Semantic Relations

F for the 5 Classes

5 Semantic Relations • F when precision and recall are balanced: 43.2% • F for random guessing: 20.0% • Better than random guessing • Better than with 30 classes (26.5%) • But still room for improvement

Execution Time • The experiments presented here required 76,800 queries to AltaVista: 600 word pairs × 128 queries per word pair = 76,800 queries • As a courtesy to AltaVista, a five-second delay was inserted between queries • Processing the 76,800 queries took about five days
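The query count, and the share of the running time taken by the courtesy delay, follow from simple arithmetic:

```python
word_pairs = 600
queries_per_pair = 128
delay_seconds = 5

total_queries = word_pairs * queries_per_pair          # 76,800
delay_days = total_queries * delay_seconds / 86_400    # 86,400 seconds per day
print(total_queries, round(delay_days, 2))  # 76800 4.44
```

The five-second delays alone therefore account for about 4.4 days, most of the roughly five-day running time.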

Conclusion • The cosine metric in the VSM is used to • solve analogies • classify semantic relations • It performs much better than random guessing, but below human levels

State of the art

2006a: Similarity of Semantic Relations. Computational Linguistics, 32(3):379–416.

1 Introduction • Latent Relational Analysis (LRA) • LRA extends the VSM approach of Turney and Littman (2005) in three ways: • The connecting patterns are derived automatically from the corpus, instead of using a fixed set of patterns. • Singular Value Decomposition (SVD) is used to smooth the frequency data. • automatically generated synonyms are used to explore variations of the word pairs.
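The third extension explores variations of a word pair through synonyms, which can be sketched as generating alternate pairs. The sketch makes three assumptions. It uses a hypothetical thesaurus dictionary. It replaces one word at a time (A':B and A:B'). It caps synonyms per word at an illustrative 10. It does not show how LRA then filters or weights the alternates.

```python
from typing import Dict, List, Sequence, Tuple


def alternate_pairs(a: str, b: str, thesaurus: Dict[str, Sequence[str]],
                    max_synonyms: int = 10) -> List[Tuple[str, str]]:
    """The original pair plus variants with one word swapped for a synonym."""
    pairs = [(a, b)]
    pairs += [(s, b) for s in list(thesaurus.get(a, []))[:max_synonyms]]
    pairs += [(a, s) for s in list(thesaurus.get(b, []))[:max_synonyms]]
    return pairs


thesaurus = {"quart": ["pint", "gallon"], "volume": ["capacity"]}  # invented entries
print(alternate_pairs("quart", "volume", thesaurus))
```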
