An Evaluation Procedure for Word Net Based Lexical Chaining: Methods and Issues Irene Cramer & Marc Finthammer Faculty of Cultural Studies, Technische Universität Dortmund, Germany irene.cramer@uni-dortmund.de
Outline • Project Context and Motivation • Lexical Chaining – Evaluation Steps • Preprocessing and Coverage • Sense Disambiguation • Semantic Relatedness/Similarity • Application • Open Issues and Future Work
Project Context and Motivation • Research project HyTex funded by DFG (German Research Foundation) – part of the research unit "Text Technological Modelling of Information" • Research objective in HyTex: text-grammatical foundations for the (semi-)automated text-to-hypertext conversion • One focus of our research: topic-based linking strategies using lexical and topic chains/topic views
Project Context and Motivation • Lexical chains • based on the concept of lexical cohesion, • regarded as a partial text representation, • a valuable resource for many NLP applications, such as text summarization, dialog systems, etc. • (to our knowledge) two lexical chainers exist for German (Mehler, 2006, and Cramer/Finthammer); in addition: research on semantic similarity based on GermaNet by Gurevych et al.
Project Context and Motivation • Topic chains/views • based on a selection of central words, so-called topic words, • intended to support the user's orientation and navigation. Steps: 1. lexical chains are used to select topic words (1-3 topic words per passage), 2. topic words are used to construct the topic view (~ "thematic index"), 3. topic words are re-connected via semantically meaningful edges (as in lexical chaining) to construct topic chains (step 1 is sketched right below)
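To make the selection step concrete: a minimal sketch of how topic words could be picked from chain output, assuming chains arrive as lists of (word, passage) occurrences and using chain length as a crude stand-in for chain strength; the function name and data layout are illustrative, not GLexi's actual interface.

```python
from collections import defaultdict

def select_topic_words(chains, max_per_passage=3):
    """Pick 1-3 topic words per passage from lexical chains.

    chains: list of chains, each a list of (word, passage_id)
    occurrences; a word is scored by the strength of the chains
    it participates in (approximated here by chain length).
    """
    scores = defaultdict(lambda: defaultdict(float))
    for chain in chains:
        strength = len(chain)  # crude proxy for chain strength
        for word, passage in chain:
            scores[passage][word] += strength
    return {
        passage: sorted(ws, key=ws.get, reverse=True)[:max_per_passage]
        for passage, ws in scores.items()
    }
```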
Project Context and Motivation [Figure: a topic view, i.e. a thematic index listing topic words 1-3 for Chapter 1.1, Chapter 1.2, Chapter 1.3, Chapter 2, Chapter 3.1, ..., shown next to the running text of Chapters 1.1 and 1.2 with the topic words highlighted]
Lexical Chaining – Evaluation Steps • To evaluate our chainer, called GLexi, test data is required; • experiments to develop such a gold standard for German emphasize that: • manual annotation of lexical chains is very demanding, • the rich interaction between the various principles that achieve a cohesive text structure distracts annotators; • the results of these experiments are partially reported in Stührenberg et al., 2007. • Our conclusion: the evaluation of a lexical chainer might be best performed in several steps.
Lexical Chaining – Evaluation Steps • Our conclusion: the evaluation of a lexical chainer might be best performed in several steps: • evaluation of coverage • evaluation of disambiguation quality • evaluation of semantic relatedness measures • evaluation of chains wrt. a specific application
Lexical Chaining – Evaluation Steps • Remainder of the talk: • a very short presentation of GLexi's architecture and • an exemplary demonstration of the applicability of the evaluation procedure • Resources used: • GermaNet (version 5.0) • HyTex corpus (specialized text) – approx. 29,000 (noun) tokens • a set of word pairs + the results of a human judgement experiment • a German word frequency list (thanks to Sabine Schulte im Walde)
Lexical Chaining – GLexi Basic modules: • preprocessing of corpora • tokenization, POS tagging, chunking • chaining candidate selection • core algorithm • lexical semantic look-up, • scoring of relations, • sense disambiguation • output creation • rating of chain strength • application-specific representation
Lexical Chaining – GLexi [Figure: preprocessing pipeline]
Lexical Chaining – GLexi Core algorithm: lexical semantic look-up [Figure: core algorithm] (a hedged sketch of such a core loop follows below)
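Since the slide only shows the architecture diagram, here is a hedged sketch of what a greedy chaining core in this spirit could look like; look-up and scoring are collapsed into a `relatedness` callable standing in for any of the GermaNet measures, and the threshold and all names are assumptions, not GLexi's implementation.

```python
def chain_greedy(candidates, relatedness, threshold=0.5):
    """Greedy lexical chaining: attach each candidate noun to the
    existing chain it is most related to, or start a new chain.

    candidates: lemmas surviving preprocessing and candidate
    selection; relatedness: function (lemma, lemma) -> float.
    """
    chains = []
    for word in candidates:
        best_chain, best_score = None, threshold
        for chain in chains:
            score = relatedness(word, chain[-1])  # score against last member
            if score > best_score:
                best_chain, best_score = chain, score
        if best_chain is not None:
            best_chain.append(word)
        else:
            chains.append([word])  # no chain related enough: open a new one
    return chains
```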
Outline • Project Context and Motivation • Lexical Chaining – Evaluation Steps • Preprocessing and Coverage • Sense Disambiguation • Semantic Relatedness/Similarity • Application • Open Issues and Future Work
Step 1: Coverage • Coverage without preprocessing: approx. 56 % • i.e. approx. 44 % of the tokens cannot be included in chaining • preprocessing is necessary to improve coverage!
Step 1: Coverage • Coverage without preprocessing: approx. 56 % • Lemmatization: increases coverage to approx. 71 % (theoretical value – open issue: lemmatization can be ambiguous, e.g. Medien – Medium, Engl. media – psychic or data carrier) • Compound analysis: increases coverage to approx. 83 % (theoretical value – open issue: compound splitting can go wrong, e.g. Agrarproduktion, Engl. agricultural production, split into Agrar, Engl. agricultural, + Produkt, Engl. product, + Ion, Engl. ion [chem.]) • simple Named Entity Recognition in the preprocessing phase • Open issues: abbreviations, foreign words, nominalized verbs • Future work: include TermNet (domain-specific language) as a resource – for more information see the talk by Lüngen et al. – tomorrow, session 6, 10:40 h (a sketch of the staged look-up follows below)
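A minimal sketch of the staged look-up behind these coverage figures, assuming hypothetical helpers `in_lexicon`, `lemmatize`, and `split_compound` (GLexi's actual components are not shown on the slides): raw token first, then lemma, then the head of a compound split.

```python
def coverage(noun_tokens, in_lexicon, lemmatize, split_compound):
    """Fraction of noun tokens mappable to a GermaNet entry via a
    raw look-up, a lemma look-up, or a compound-head look-up
    (e.g. Agrarproduktion -> Produktion)."""
    if not noun_tokens:
        return 0.0
    covered = 0
    for token in noun_tokens:
        if in_lexicon(token) or in_lexicon(lemmatize(token)):
            covered += 1
        else:
            parts = split_compound(token)
            # the head of a German compound is its last element
            if parts and in_lexicon(parts[-1]):
                covered += 1
    return covered / len(noun_tokens)
```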
Step 2: Chaining-based WSD • Approx. 45 % of the tokens in our corpus have more than one synset • Basis for evaluating chaining-based WSD: manual annotation • Procedure: for each ambiguous token, a semantic measure scores all of its synsets against the context; the synset with the best value (i.e. rank 1) is selected and compared with the manual annotation
Step 2: Chaining-based WSD • Performance of chaining-based WSD: mediocre! • Best semantic measures (Resnik, Wu-Palmer, and Lin): • approx. 50-60 % correct disambiguation compared to the manual annotation • majority voting over the measures increased performance to approx. 63-65 % (rank-1 selection and voting are sketched below) • Future work: • include WSD in preprocessing? • a new, machine-learning-based measure?
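A hedged sketch of the two steps just reported, rank-1 selection for a single measure and majority voting over several measures; `synsets_of`, `relatedness`, and the context handling are assumptions for illustration, not the slides' exact procedure.

```python
from collections import Counter

def disambiguate(word, context, synsets_of, relatedness):
    """Rank-1 WSD for one measure: choose the synset of `word`
    whose best relatedness to any context word is highest."""
    def score(synset):
        return max(relatedness(synset, c) for c in context)
    return max(synsets_of(word), key=score)

def majority_vote(word, context, synsets_of, measures):
    """Let several relatedness measures vote; the synset chosen
    most often wins (ties broken arbitrarily by Counter)."""
    votes = Counter(disambiguate(word, context, synsets_of, m)
                    for m in measures)
    return votes.most_common(1)[0][0]
```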
Step 3: Semantic Relatedness • Implemented 11 semantic relatedness measures (SRMs): 8 based on GermaNet, 3 based on Google co-occurrence counts; focus of this talk: the GermaNet measures • Evaluation of SRM performance uses the results of a human judgement experiment: • a list of 100 word pairs; 35 subjects judged the "semantic distance" of each pair on a 5-level scale • compare human judgement with the SRM values
Step 3: Semantic Relatedness [Figure: distribution of the human judgements – almost 2/3 are extreme values (not related / strongly related)]
Step 3: Semantic Relatedness [Figure: example results of the human judgement experiment, among them word pairs glossed as Engl. fin / printer and Engl. water / fluid]
Step 3: Semantic Relatedness [Figure: scatter plots of SRM values against human judgement] • all SRM values scatter • the correlation between human judgement and SRM values is low (the correlation computation is sketched below)
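The comparison itself can be a rank correlation between the averaged human ratings and one measure's values over the 100 pairs; a minimal sketch follows (the slides do not name the correlation coefficient used, so Spearman's rho is an assumption, chosen for its common use in SRM evaluation).

```python
from scipy.stats import spearmanr

def evaluate_srm(word_pairs, human_means, srm):
    """Rank-correlate one semantic relatedness measure with the
    per-pair averages of the human judgement experiment."""
    values = [srm(w1, w2) for w1, w2 in word_pairs]
    rho, p_value = spearmanr(human_means, values)
    return rho, p_value
```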
Step 3: Semantic Relatedness Open issues – semantic relatedness: • are continuous SRM values necessary / helpful? Alternative: classes (e.g. 3 classes: not related, related, strongly related), learned in a machine learning (ML) experiment using the SRM values as parameters (see the sketch below) • interactions between SRM quality and disambiguation quality? • is a combination of GermaNet and Google co-occurrence based measures (and further resources) useful? integration into the ML experiment?
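For the proposed ML experiment, a minimal sketch of mapping raw SRM values to the three classes; the classifier choice and feature layout are pure assumptions, since the slides leave the setup open.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_class_mapper(srm_features, human_classes):
    """Learn a mapping from raw SRM values (one column per measure)
    to 3 classes: 0 = not related, 1 = related, 2 = strongly related.

    srm_features: array-like of shape (n_pairs, n_measures);
    human_classes: array-like of shape (n_pairs,).
    """
    clf = DecisionTreeClassifier(max_depth=3)
    clf.fit(np.asarray(srm_features), np.asarray(human_classes))
    return clf
```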
Step 4: Application-oriented Evaluation Example: newspaper article about child poverty in Germany; topic words according to the lexical chaining results: • Kind (Engl. child) • Geld (Engl. money) • Deutschland (Engl. Germany)
Step 4: Application-oriented Evaluation • Features used in the calculation of topic words and views: • chain / meta-chain information: link density, link strength • in addition to the chains: frequency (relative to passage and document), mark-up • an application-oriented evaluation requires a gold standard of topic words, topic views, and topic chains • manual annotation of topic words and topic views – work in progress – current annotation agreement > 75 % (before adjudication) • initial results show: link density and frequency are the most relevant features (a scoring sketch combining them follows below)
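A minimal sketch combining the two features found most relevant, link density and relative frequency; the normalizations and weights are illustrative assumptions, not the features' actual definitions in HyTex.

```python
def topic_word_score(word, passage_tokens, passage_chains,
                     w_density=0.7, w_freq=0.3):
    """Score a topic-word candidate by link density (its share of
    the links in the passage's chains) and relative frequency in
    the passage; the weights are illustrative only."""
    total_links = sum(len(chain) for chain in passage_chains) or 1
    word_links = sum(chain.count(word) for chain in passage_chains)
    density = word_links / total_links
    freq = passage_tokens.count(word) / (len(passage_tokens) or 1)
    return w_density * density + w_freq * freq
```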
Outlook and Future Work To sum up: • Application: use lexical chaining for the construction of topic views • Lexical chaining for German corpora poses several challenges: coverage, disambiguation, SRMs • room for improvement: disambiguation and SRMs; possible solutions: • WSD as a preprocessing step • alternative SRMs (potentially ML-based) • additional resources • initial results of using lexical chains for the construction of topic views show: chaining is useful!
Thank you! Comments, ideas, questions are very welcome.
Literature (back-up slide) • Alexander Budanitsky and Graeme Hirst. 2001. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Workshop on WordNet and Other Lexical Resources at NAACL-2001, Pittsburgh, PA, June 2001. • M. A. K. Halliday and Ruqaiya Hasan. 1976. Cohesion in English. Longman, London. • Graeme Hirst and David St-Onge. 1998. Lexical chains as representations of context for the detection and correction of malapropisms. In C. Fellbaum, editor, WordNet: An Electronic Lexical Database, chapter 13, pages 305–332. The MIT Press, Cambridge, MA. • Alexander Mehler. 2005. Lexical chaining as a source of text chaining. In Proceedings of the 1st Computational Systemic Functional Grammar Conference, Sydney. • Gregory H. Silber and Kathleen F. McCoy. 2002. Efficiently computed lexical chains as an intermediate representation for automatic text summarization. Computational Linguistics, 28(4):487–496.