Mapping Between Taxonomies

Elena Eneva, 11 Dec 2001, Advanced IR Seminar. Taxonomies are formal systems of orderly classification of knowledge, designed for a specific purpose.

Presentation Transcript


  1. Mapping Between Taxonomies Elena Eneva 11 Dec 2001 Advanced IR Seminar

  2. Mapping Between Taxonomies • Formal systems of orderly classification of knowledge, which are designed for a specific purpose • Companies organize information in various ways (e.g. one taxonomy for marketing, another for product development)

  3-12. Approach [diagram sequence; recoverable labels only: one taxonomy "By country" (German, French), one taxonomy "By industry" (Textile, Automobile), with documents shown as "abc"]

  13. Datasets Two classification schemes: • Reuters 2001 (807,900 docs): Topics (127), Industry categories (871), Regions (376) • Hoovers-255 and Hoovers-28 (4,286 docs): industry categories (255) and industry categories (28), respectively

  14. Learning • 2 separate methods of learning for the documents: • Old doc category -> new doc category • Doc contents -> new category • Combined method: • Weighted average based on confidence • Final result determined by a decision tree • One combined learner – used both old category and contents as features
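The first method on this slide (old doc category -> new doc category) can be illustrated with a minimal sketch. It assumes the mapping is estimated from co-occurrence counts, which is one simple way to do it and not necessarily what the talk used (the talk names C4.5 and Naïve Bayes). The function names and the toy data are hypothetical; the labels are borrowed from the diagram slides.

```python
from collections import Counter, defaultdict

def fit_label_map(pairs):
    """Count how often each old category co-occurs with each new category."""
    counts = defaultdict(Counter)
    for old, new in pairs:
        counts[old][new] += 1
    return counts

def predict_from_label(counts, old):
    """Return (most frequent new category, estimated P(new | old))."""
    dist = counts.get(old)
    if not dist:
        return None, 0.0  # old category never seen in training
    new, n = dist.most_common(1)[0]
    return new, n / sum(dist.values())

# Hypothetical toy data: (old category by country, new category by industry).
train = [("German", "Automobile"), ("German", "Automobile"),
         ("German", "Textile"), ("French", "Textile")]
label_map = fit_label_map(train)
```

The second return value is a rough confidence, which is the kind of quantity a combined method could weight by.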

  15. Simple Learners • Simple Decision Tree (C4.5) – learns probabilities of new categories based on 1 kind of feature: • Old categories (doesn’t know about documents/words) • Word-based classification (doesn’t know about old categories) • Naïve Bayes (rainbow) • Old categories (doesn’t know about documents/words) • Word-based classification (doesn’t know about old categories) • Support Vector Machine (SVM-Light) • word-based classification (doesn’t know about old categories), linear kernel [results will be reported in the final paper]
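As an illustration of word-based classification, here is a small multinomial Naïve Bayes sketch with add-one smoothing. It is not the rainbow implementation named on the slide; it is a stdlib-only stand-in, and the function names and toy documents are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (list_of_words, category). Returns the raw counts."""
    class_docs = Counter()             # documents per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocab = set()
    for words, cat in docs:
        class_docs[cat] += 1
        word_counts[cat].update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab

def predict_nb(model, words):
    """Return the category with the highest log posterior."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_cat, best_score = None, -math.inf
    for cat in sorted(class_docs):
        score = math.log(class_docs[cat] / total_docs)  # log prior
        denom = sum(word_counts[cat].values()) + len(vocab)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[cat][w] + 1) / denom)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat

# Hypothetical toy documents.
docs = [(["loom", "fabric", "cotton"], "Textile"),
        (["engine", "car", "wheel"], "Automobile"),
        (["fabric", "dye"], "Textile")]
model = train_nb(docs)
```

The same learner can be run on old categories instead of words by passing a one-element list containing the old category as the "document".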

  16. Learning • Using the document content (DT, NB, SVM) • Using the document labels (DT, NB, SVM) [diagram: documents shown as "abc" fed to each learner]

  17. Combined Learners • Weighted Average • Voting scheme • Combination Decision Tree • takes the outputs and confidences of two of the simple learners, predicts new category
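The first two combination schemes can be sketched as follows. The slide does not give the exact formulas, so this assumes each simple learner outputs a (category, confidence) pair, that the weighted scheme sums confidences per category, and that voting counts one vote per learner. All names are hypothetical.

```python
from collections import Counter, defaultdict

def confidence_weighted_choice(predictions):
    """predictions: list of (category, confidence), one pair per learner.
    Sum the confidences per category and return the category with the
    largest total (ties broken alphabetically)."""
    totals = defaultdict(float)
    for cat, conf in predictions:
        totals[cat] += conf
    return max(sorted(totals), key=totals.get)

def majority_vote(predictions):
    """One vote per learner, confidence ignored (ties broken alphabetically)."""
    votes = Counter(cat for cat, _ in predictions)
    top = max(votes.values())
    return min(cat for cat, v in votes.items() if v == top)

# Hypothetical outputs of three simple learners for one document.
preds = [("Textile", 0.9), ("Automobile", 0.6), ("Automobile", 0.2)]
```

On this toy input the two schemes disagree: one confident learner outweighs two unsure ones under confidence weighting, while voting sides with the majority. The combination decision tree on the slide would instead learn such a rule from the outputs and confidences.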

  18. Learning • Using both the content and the label • Combining the two outputs [diagram; recoverable labels only: "DT, NB, SVM" (twice), "DT", "voting", "3rd classifier", with documents shown as "abc"]
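One simple way to let a single learner use both the content and the label is to append the old category to the word features as an extra token. This is only a sketch of that idea; the slides do not say how the features were encoded, and the `OLDCAT=` prefix and function name are made up for the example.

```python
def combined_features(words, old_category):
    """Return the document's words plus one token encoding its old category."""
    return list(words) + ["OLDCAT=" + old_category]

# Hypothetical document from the "German" class of the old taxonomy.
features = combined_features(["engine", "car"], "German")
```

The resulting token list can be passed to any word-based learner unchanged.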

  19. Results Words Only • 5-fold cross validation

  20. Results Categories Only • 5-fold cross validation

  21. Results Combination • 5-fold cross validation
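For reference, 5-fold cross validation can be sketched as below. This is a generic illustration and not the evaluation code behind these slides: it assigns items to folds by index without shuffling, assumes at least k items, and takes the learner as caller-supplied `fit` and `predict` functions (hypothetical names).

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for each of k folds."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def cross_validate(items, labels, fit, predict, k=5):
    """Average accuracy over k folds."""
    accuracies = []
    for train, test in k_fold_indices(len(items), k):
        model = fit([(items[j], labels[j]) for j in train])
        correct = sum(predict(model, items[j]) == labels[j] for j in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

Each item is used for testing exactly once and for training in the other four folds.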

  22. Results

  23. Remarks • Hierarchy (old classes) is usually ignored • Shown that it helps • Learners are not the issue • Better way of understanding • Old label (or hierarchy path) is meta data

  24. Remaining Work • SVM results (running even as we speak) • Repeat experiments on Reuters-2001 • Internal hierarchies • Missing labels • Less correlated types of classes • Results in standard evaluation format

  25. Future Work • Try with a web dataset (Google and Yahoo! hierarchies) • Hierarchies of more levels • Meta data (for non-text sources)

  26. Related Literature • A Study of Approaches to Hypertext Categorization, Y. Yang, S. Slattery, R. Ghani, Journal of Intelligent Information Systems, Volume 18, Number 2, March 2002 (to appear). • Learning Mappings between Data Schemas, A. Doan, P. Domingos, and A. Levy. Proceedings of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data, 2000, Austin, TX.

  27. Questions and Suggestions The end.

  28. Taxonomies • Formal systems of orderly classification of knowledge, which are designed for a specific purpose • Change of purpose, change of taxonomies • Businesses often need and keep the information in several structures • Important to be able to automatically map between taxonomies

  29. Useful Mappings • Companies organize information in various ways (e.g. one taxonomy for marketing, another for product development) • Personal online bookmark classification • Search engines (e.g. Google <-> Yahoo) • EU Committee for Standardization “detailed overview of the existing taxonomies officially used in the EU, in order to derive general concepts such as: information organisation, properties, multilinguality, keywords, etc. and, last but not least, the mapping between.”
