
Evaluation of Association Measures


Presentation Transcript


  1. Evaluation of Association Measures

  2. We want to • identify the practical feasibility of a certain AM for identifying collocations, with respect to • which types of collocation • which corpora (domain, size) • high-frequency versus low-frequency data • compare the outcomes of different association measures

  3. We have • differently ranked collocation candidates • We need • true collocation data for comparison, e.g. • collocation lexica • a list of true collocations occurring in the extraction corpus

  4. Problems & Inconveniences: using collocation lexica for evaluation • will not tell us how well an AM worked on a particular corpus • it only tells us that • some of the reference collocations also occur in our base data and • the AM has found them

  5. Problems & Inconveniences: using a list of true collocations occurring in the extraction corpus • requires a good deal of hand-annotation • requires “objective” criteria for distinguishing collocational from non-collocational word combinations in our candidate list

  6. Our Approach • Evaluation of lexical association measures (AMs) against a manually identified reference corpus of true collocations (TPs) • Evaluation based on the full reference set • Precise, linguistically motivated definition of TPs • Evaluation of results based on recall and precision graphs

  7. For Further Discussion • Testing for significance of AMs is an important but still open question • There is a potential for fine-tuning of AMs given a specific data set and a particular type of collocations to be extracted (Krenn, Evert 2001)

  8. Evaluation Experiments

  9. Data • Extraction corpora: • newspaper: 8 million words, Frankfurter Rundschau Corpus (ECI Multilingual Corpus 1) • newsgroup: 10 million words, FLAG corpus (LT-DFKI)

  10. Data • Base data: • list of PP-verb pairs ~ (PN,V)-combinations • Collocation types: • support verb constructions (FVG) • figurative expressions (figur)

  11. Examples

  12. Support Verb Constructions (FVG) • verb-object collocation • function as predicates • can be paraphrased by main verbs • NP-verb or PP-verb • verbal collocate (function verb / light verb / support verb) • main verb • conveys Aktionsart and causativity

  13. Support Verb Constructions (FVG) • nominal collocate • abstract noun • often de-verbal or de-adjectival • contributes the core meaning • (prepositional collocate) • verbal and nominal collocate together determine the argument structure of the collocation

  14. FVG Examples

  15. FVG Examples

  16. Figurative Expressions (figur) • not restricted to NP/PP-verb • figurative reinterpretation of literal meaning required, e.g., unter die Haut gehen (get under one's skin) • nouns: concrete • verbs: often causative-noncausative alternation, e.g., auf Eis legen (put on ice) vs. auf Eis liegen (be on ice)

  17. Decision Tree: FVG versus figur

  18. Frequency Distributions

  19. Frequency Distributions

  20. Frequency Distributions

  21. Combination of Properties in the Candidate Lists • [figure panels: newsgroup, f >= 5, FVG, figur; newspaper, f >= 3, FVG, figur; newspaper, f >= 3, figur; newspaper, f >= 3, FVG]

  22. Evaluation Procedure • Source Corpus, candidate list (excerpt):

      Candidate pair              t-score
      ab Dienstag bietet          1.992
      ab Donnerstag bietet        1.992
      ab Freitag bietet           1.986
      ab Jahren beginnt           1.578
      ab Jahren bietet            1.652
      ab Jahren eingeladen        1.672
      ab Jahren geeignet          2.440
      ab Jahren heißt             1.596
      ab Jahren käthi             1.731
      ab Jahren tanzen            1.992
      ab Jahren treffen           2.947
      ab Juni restauriert         1.999
      ab Mark finden              1.705
      ab Mark kostet              1.717
      ab Mark zu_finden           1.719
      ab März bietet+an           1.723
      ab Mittwoch bietet          1.724
      ab Notierungen nutzen       1.998
      ab Notierungen zu_nutzen    1.999
      ab November einladen        2.449
      ...                         ...

  23. Evaluation Procedure • significance list (candidates sorted by t-score):

      Rank  Candidate pair                  t-score
       1.   um Uhr beginnt                  19.218
       2.   bis Uhr geöffnet                13.523
       3.   zur Verfügung stehen            12.751
       4.   zur Verfügung gestellt          11.724
       5.   zur Verfügung stellen           11.147
       6.   ums Leben gekommen              10.465
       7.   zur Verfügung steht             10.008
       8.   auf Programm stehen              9.782
       9.   in Anspruch genommen             9.700
      10.   auf Tagesordnung stehen          9.473
      11.   am Dienstag sagte                8.978
      12.   am Montag sagte                  8.931
      13.   auf Seite lesen                  8.613
      14.   auf Kürzungen behält vor         8.600
      15.   auf Programm steht               8.423
      16.   im Mittelpunkt steht             8.395
      17.   in Regionalausgabe erscheint     8.298
      18.   an Stelle melden                 8.289
      19.   auf Seite zeigen                 8.282
      20.   zur Verfügung zu_stellen         8.269
      ...   ...                              ...
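
The step from candidate list to significance list can be sketched in a few lines. This is a minimal sketch, assuming the common formulation t = (O - E) / sqrt(O) with expected frequency E = f1 * f2 / N under independence; the slides do not state which formula was used, and the function names and counts below are invented for illustration.

```python
import math

def t_score(o: float, f1: float, f2: float, n: float) -> float:
    """t-score of a candidate pair, assuming t = (O - E) / sqrt(O).

    o  : observed joint frequency of the pair
    f1 : marginal frequency of the first component (e.g. the PN part)
    f2 : marginal frequency of the second component (e.g. the verb)
    n  : total number of pair tokens in the base data
    """
    expected = f1 * f2 / n  # expected joint frequency under independence
    return (o - expected) / math.sqrt(o)

def significance_list(candidates):
    """Sort (pair, o, f1, f2, n) tuples by descending t-score."""
    scored = [(pair, t_score(o, f1, f2, n)) for pair, o, f1, f2, n in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical counts, not taken from the corpus.
toy = [
    ("ab Jahren tanzen", 4, 300, 50, 100000),
    ("zur Verfügung stehen", 200, 400, 1000, 100000),
]
ranked = significance_list(toy)
for rank, (pair, score) in enumerate(ranked, start=1):
    print(f"{rank}. {pair}  {score:.3f}")
```

Any other association measure can be swapped in for `t_score` as long as higher values mean stronger association.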

  24. Evaluation Procedure: N-best Lists • 20-best list by t-score (same candidates and scores as on slide 23): 11 true positives, 9 false positives • precision: 11/20 = 55% • total: 1280 TPs
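
The n-best figures on this slide can be reproduced as follows. This is a minimal sketch, assuming that "total: 1280 TPs" is the number of true positives in the whole candidate set, so that recall can be computed alongside precision. The transcript does not preserve which of the 20 candidates were marked as TPs, so the order of the flags is arbitrary.

```python
def precision_recall_at_n(ranked_is_tp, n, total_tps):
    """Precision and recall of the n best-ranked candidates.

    ranked_is_tp : list of booleans in rank order (True = true positive)
    n            : size of the n-best list
    total_tps    : number of TPs in the whole candidate set
    """
    hits = sum(ranked_is_tp[:n])
    return hits / n, hits / total_tps

# 11 TPs and 9 FPs among the 20 best candidates (figures from the slide).
# Which ranks are TPs is not recoverable, so this ordering is a placeholder.
flags = [True] * 11 + [False] * 9
precision, recall = precision_recall_at_n(flags, 20, 1280)
print(f"precision = {precision:.2%}, recall = {recall:.2%}")
```

Calling the function for every n from 1 to the length of the list yields the points of a precision graph and a recall graph.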

  25. Precision Graph: PNV full forms

  26. Baseline: Random Selection
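
A random-selection baseline is usually read as the precision one would get by drawing n candidates at random, which equals the proportion of TPs in the candidate set for every n. The sketch below assumes that reading. The figure of 1280 TPs comes from the slides; the candidate-set size of 20000 is invented, because the transcript does not give it.

```python
import random

def baseline_precision(total_tps, total_candidates):
    """Expected precision of a randomly drawn n-best list (same for any n)."""
    return total_tps / total_candidates

def simulated_precision(total_tps, total_candidates, n, trials=200, seed=0):
    """Average precision over repeated random n-best lists."""
    rng = random.Random(seed)
    flags = [True] * total_tps + [False] * (total_candidates - total_tps)
    hits = 0
    for _ in range(trials):
        hits += sum(rng.sample(flags, n))  # draw n candidates without replacement
    return hits / (trials * n)

# 1280 TPs is taken from the slides; 20000 candidates is a hypothetical size.
expected = baseline_precision(1280, 20000)
observed = simulated_precision(1280, 20000, n=500)
print(f"expected {expected:.3f}, simulated {observed:.3f}")
```

An association measure is only useful where its precision graph lies clearly above this horizontal line.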

  27. Precision Graphs

  28. Precision Graphs

  29. Precision Graphs

  30. Recall Graphs

  31. Precision/Recall

  32. Precision Graphs: Newspaper, FVG + figur

  33. Precision Graphs: Newspaper FVG figur

  34. Precision Graphs: AdjN

  35. Precision Graphs: AdjN

  36. Precision/Recall: AdjN

  37. Frequency Layers: AdjN Data • f >= 5 • 2 <= f < 5

  38. Frequency Layers: PNV Data • f >= 10 • 3 <= f < 5
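
Evaluating by frequency layer amounts to partitioning the candidate list by joint frequency before computing precision and recall for each part. The following is a minimal sketch; the function name, the half-open interval convention, and the toy data are assumptions made for illustration.

```python
def frequency_layer(candidates, lo, hi=None):
    """Keep (pair, f) tuples whose joint frequency satisfies lo <= f < hi.

    hi=None means there is no upper bound.
    """
    return [(pair, f) for pair, f in candidates
            if f >= lo and (hi is None or f < hi)]

# Toy candidates with invented joint frequencies.
toy = [("a", 3), ("b", 4), ("c", 7), ("d", 12)]
high = frequency_layer(toy, 10)    # high-frequency layer
low = frequency_layer(toy, 3, 5)   # low-frequency layer
print(high, low)
```

Each layer is then ranked and evaluated on its own, so a measure that only works on frequent pairs becomes visible.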

  39. Lemmas vs. Word Forms (PNV) • lemmas, f >= 3 • word forms, f >= 3

  40. Text Type and Domain (PNV) • newspaper vs. newsgroup discussions • comparison for non-lemmatised candidates

  41. The MI Mystery (FVG) region of high "local precision" for 4.0 < MI < 7.5
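
"Local precision" can be read as the share of TPs among the candidates whose score falls inside a given interval, rather than among the n best. The sketch below assumes that reading and the usual pointwise definition MI = log2(O / E) with E = f1 * f2 / N; neither is spelled out in the slides, and the toy scores are invented.

```python
import math

def mutual_information(o, f1, f2, n):
    """Pointwise MI, assuming MI = log2(O / E) with E = f1 * f2 / n."""
    return math.log2(o * n / (f1 * f2))

def local_precision(scored, lo, hi):
    """Share of TPs among candidates with lo < MI < hi.

    scored : iterable of (mi, is_tp) pairs
    Returns None when no candidate falls inside the interval.
    """
    band = [is_tp for mi, is_tp in scored if lo < mi < hi]
    return sum(band) / len(band) if band else None

# Invented MI scores with invented TP annotations.
toy = [(2.0, False), (4.5, True), (5.8, True), (6.9, False), (9.1, False)]
inside = local_precision(toy, 4.0, 7.5)
print(inside)
```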

  42. Further particularities of the newspaper data • candidates with MI > 7.5 are more frequent than expected under the independence assumption • but there are very few FVG among them • the data do not support the usual argument against MI, namely that it overestimates candidates with low joint and marginal frequencies

  43. Optimized MI • | MI - 5.75 | • accounts for the FVG concentration • in the range 4.0 <= MI <= 7.5 (5.75 is the midpoint of this range) • in the newspaper test data
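
Ranking by | MI - 5.75 | can be sketched as below. The slide gives only the expression itself, so the assumption here is that candidates closer to 5.75 are ranked higher; the function name and toy scores are invented.

```python
def optimized_mi_rank(scored, centre=5.75):
    """Sort (pair, mi) tuples by ascending distance |MI - centre|."""
    return sorted(scored, key=lambda item: abs(item[1] - centre))

# Invented MI scores for five hypothetical candidates.
toy = [("p1", 9.2), ("p2", 5.5), ("p3", 4.1), ("p4", 7.0), ("p5", 1.3)]
ranked = optimized_mi_rank(toy)
print([pair for pair, _ in ranked])
```

Because the centre is fitted to the newspaper test data, a ranking of this kind illustrates tuning to one data set rather than a general-purpose measure.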

  44. Summary of Results • Best measures: • t-score / frequency best for identifying PP-verb collocations (FVG, figur) • log-likelihood, t-score, Fisher, binomial and multinomial p-value work well for AdjN

  45. Summary of Results • Reproducibility of results for different text types: • Precision results from newsgroup data comparable to newspaper data • Strong evidence that identical classes of collocations are similarly distributed in different types of corpora

  46. Summary of Results • Differences in suitability of AMs to identify particular collocation types: • (PN,V)-candidates with high MI score are less likely to be FVG • Log-likelihood not well suited for identifying FVG • but better suited for identifying figur

  47. Summary of Results • Experimental results based either on a small number of best-scoring candidates or on more than the first 50% of the significance lists (SLs) are unreliable

  48. Conclusion on AMs Optimal results do not necessarily come from a statistical discussion but from tuning on a particular data set

  49. Vast Land:Lowest-frequency Data • lowest-frequency data (hapax legomena, dis legomena, ...) are a serious challenge for all statistical approaches • typical solution: cut-off thresholds • Evert/Krenn used cut-off thresholds in evaluation to reduce manual annotation work • need to estimate number of TPs among excluded lowest-frequency candidates
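
One way to approach the last point is to annotate a random sample of the excluded candidates and scale the observed TP rate up to the whole excluded set. The sketch below illustrates that idea only; it is an assumption, not a description of the procedure used by Evert/Krenn, and `is_tp` stands in for manual annotation.

```python
import random

def split_by_cutoff(candidates, threshold):
    """Split (pair, f) tuples into kept (f >= threshold) and excluded."""
    kept = [c for c in candidates if c[1] >= threshold]
    excluded = [c for c in candidates if c[1] < threshold]
    return kept, excluded

def estimate_excluded_tps(excluded, is_tp, sample_size, seed=0):
    """Estimate the TP count among excluded candidates from a random sample."""
    if not excluded:
        return 0.0
    rng = random.Random(seed)
    sample = rng.sample(excluded, min(sample_size, len(excluded)))
    tp_rate = sum(1 for c in sample if is_tp(c)) / len(sample)
    return tp_rate * len(excluded)

# Toy data: 1000 candidates whose frequencies cycle through 1, 2, 3, 4.
toy = [(f"pair{i}", 1 + i % 4) for i in range(1000)]
kept, excluded = split_by_cutoff(toy, threshold=3)

# Hypothetical annotation oracle: every tenth candidate counts as a TP.
def oracle(candidate):
    return int(candidate[0][4:]) % 10 == 0

estimate = estimate_excluded_tps(excluded, oracle, sample_size=100)
print(f"{len(excluded)} excluded, about {estimate:.0f} TPs estimated among them")
```

The estimate becomes more reliable as the sample grows, at the cost of more annotation work.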
