
Computing text semantic relatedness using the contents and links of a hypertext encyclopedia

Presenter: Bo-Sheng Wang. Authors: Majid Yazdani, Andrei Popescu-Belis. AI, 2013. Outline: Motivation, Objectives, Methodology, Empirical analyses, Experiments, Conclusions.


Presentation Transcript


  1. Computing text semantic relatedness using the contents and links of a hypertext encyclopedia • Presenter: Bo-Sheng Wang • Authors: Majid Yazdani, Andrei Popescu-Belis • AI, 2013

  2. Outline • Motivation • Objectives • Methodology • Empirical analyses • Experiments • Conclusions • Comments

  3. Motivation • Existing measures of semantic relatedness based on lexical overlap, though widely used, are of little help when text similarity is not based on identical words.

  4. Objectives • The authors therefore aim to compute text semantic relatedness based on concepts and their relations, which have linguistic as well as extra-linguistic dimensions. This remains a challenge, especially in the general domain and/or over noisy …

  5. Methodology: building the concept network • Concepts • They removed all Wikipedia articles in the following namespaces: Talk, File, Image, Template, Category, Portal, and List. • Disambiguation pages were removed. • They set a cut-off limit of 100 non-stop words for article length. • For each hyperlink, they extracted the corresponding anchor text and treated it as a possible secondary title for the linked article.
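The filtering steps on slide 5 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the tiny stop-word list, the `Namespace:Title` and `List of ...` title conventions, and the choice to keep articles with exactly 100 non-stop words are all assumptions made for illustration.

```python
from collections import defaultdict

# Namespaces listed on the slide; "List" pages are handled separately below.
EXCLUDED_NAMESPACES = {"Talk", "File", "Image", "Template", "Category", "Portal"}
# Tiny illustrative stop-word list (assumption; a real list would be much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are"}
MIN_NON_STOP_WORDS = 100  # cut-off stated on the slide


def namespace_of(title):
    """Return the prefix before ':' in a title, or '' if there is none (assumed convention)."""
    prefix, sep, _ = title.partition(":")
    return prefix if sep else ""


def count_non_stop_words(text):
    """Count whitespace-separated tokens that are not stop words."""
    return sum(1 for word in text.lower().split() if word not in STOP_WORDS)


def keep_article(title, text, is_disambiguation=False):
    """Decide whether a page is kept as a concept in the network."""
    if namespace_of(title) in EXCLUDED_NAMESPACES:
        return False
    if title.startswith("List of "):  # assumed way of spotting list pages
        return False
    if is_disambiguation:
        return False
    # Assumption: articles with at least 100 non-stop words are kept.
    return count_non_stop_words(text) >= MIN_NON_STOP_WORDS


def collect_secondary_titles(links):
    """Map each linked article to the set of anchor texts that point to it.

    `links` is an iterable of (anchor_text, target_title) pairs.
    """
    secondary = defaultdict(set)
    for anchor_text, target_title in links:
        secondary[target_title].add(anchor_text)
    return dict(secondary)
```

For example, `collect_secondary_titles([("NYC", "New York City")])` records "NYC" as a secondary title for the article "New York City".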

  6. Methodology

  7. Methodology: building the concept network • Relations • The present study focuses on hyperlinks and on links computed from similarity of content, of category. • They computed the lexical similarity between articles as the cosine similarity between the vectors derived from the articles’ texts, after stopword removal and stemming using Snowball.
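The lexical similarity on slide 7 can be sketched as below. This is a simplified illustration, not the authors' implementation: `crude_stem` is a toy suffix stripper standing in for the Snowball stemmer, the stop-word list is a small placeholder, and raw term counts are assumed as vector weights because the slide does not say how the vectors are weighted.

```python
import math
from collections import Counter

# Small placeholder stop-word list (assumption).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are"}


def crude_stem(word):
    """Toy stand-in for the Snowball stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def term_vector(text):
    """Build a sparse term-count vector after stop-word removal and stemming."""
    tokens = [token.strip(".,;:!?()\"'").lower() for token in text.split()]
    return Counter(crude_stem(t) for t in tokens if t and t not in STOP_WORDS)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Two articles with no shared stems score 0.0, and an article compared with itself scores 1.0 up to floating-point rounding, so similarity links would be drawn between pairs whose score is high.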

  8. Methodology

  9. Methodology-VP

  10. Methodology-VP to weighted sets of concepts and to texts

  11. Methodology-Approximation

  12. Methodology-Approximation • T-truncated • ε-truncated

  13. Methodology-Learning embedding

  14. Empirical analyses Convergence of the T-truncated

  15. Empirical analyses Convergence of ε-truncated

  16. Empirical analyses

  17. Experiments Average training error

  18. Experiments Average training error

  19. Experiments Word Similarity

  20. Experiments Word Similarity

  21. Experiments

  22. Experiments Document similarity

  23. Experiments Document clustering

  24. Experiments Comparison of VP and cosine similarity

  25. Experiments Text classification

  26. Experiments

  27. Experiments

  28. Experiments

  29. Conclusions

  30. Comments • Advantages • Disadvantage • Applications • Text categorization
