Multi-Aspect Tagging for Collaborative Structuring

Katharina Morik, Artificial Intelligence Unit, University of Dortmund

Presentation Transcript


  1. Multi-Aspect Tagging for Collaborative Structuring Katharina Morik, Artificial Intelligence Unit, University of Dortmund

  2. Data Organization in the Web 2.0 • Organizing large data collections requires semantic annotations • Individual views: Users annotate items with arbitrary tags • No common view is demanded (“folksonomies”)

  3. Collaborative Structuring • Tags tend to be chaotic • Users are not supported in creating and maintaining large, complex tag structures Desired: • Multi-aspect tagging • Re-use of structures → LACE

  4. Multi-Aspect Tagging • Hierarchical tagging • Tagging with different aspects • Each tag belongs to exactly one aspect • Tags belonging to an aspect form partitioning subset hierarchies [Figure: example aspect hierarchies with the tags pop, rock, blues, metal, good, bad, aggressive over items a, b, d, e, f] How can that be achieved automatically?
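The constraints on this slide can be made concrete with a small data structure. The following is only an illustration, not the LACE implementation: `TagNode`, `is_partitioning` and `tag_to_aspect` are hypothetical names, and "partitioning" is read here as "sibling tags never share an item", while "subset hierarchy" means a tag covers everything tagged below it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TagNode:
    """A tag in one aspect hierarchy; items are referenced by ID (hypothetical layout)."""
    name: str
    items: set[str] = field(default_factory=set)
    children: list[TagNode] = field(default_factory=list)

    def all_items(self) -> set[str]:
        """Items tagged with this node or any tag below it (subset hierarchy)."""
        result = set(self.items)
        for child in self.children:
            result |= child.all_items()
        return result


def is_partitioning(node: TagNode) -> bool:
    """True if sibling tags never share an item, checked recursively."""
    seen: set[str] = set()
    for child in node.children:
        child_items = child.all_items()
        if seen & child_items:
            return False
        seen |= child_items
    return all(is_partitioning(child) for child in node.children)


def tag_to_aspect(aspects: dict[str, TagNode]) -> dict[str, str]:
    """Map every tag to its aspect; raise if a tag appears in two aspects."""
    owner: dict[str, str] = {}

    def walk(aspect: str, node: TagNode) -> None:
        if owner.get(node.name, aspect) != aspect:
            raise ValueError(f"tag {node.name!r} belongs to more than one aspect")
        owner[node.name] = aspect
        for child in node.children:
            walk(aspect, child)

    for aspect, root in aspects.items():
        walk(aspect, root)
    return owner


# Hypothetical example: a genre aspect and a mood aspect over items a, b, d.
genre = TagNode("genre", children=[TagNode("pop", {"a"}),
                                   TagNode("rock", {"b"}, [TagNode("metal", {"d"})])])
mood = TagNode("mood", children=[TagNode("good", {"a", "d"}), TagNode("bad", {"b"})])
print(is_partitioning(genre), is_partitioning(mood))  # True True
print(tag_to_aspect({"genre": genre, "mood": mood})["metal"])  # genre
```

The same item may carry tags from several aspects (item a is both "pop" and "good"), which is exactly what multi-aspect tagging allows; only within one aspect must siblings stay disjoint.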

  5. Current Clustering Methods do not Suffice… • non-redundant clustering: Do not preserve taggings of users • semi-supervised clustering: Do not produce several alternatives • subspace clustering: Do not exploit given clusterings • distributed clustering, ensemble clustering: Locality not considered (instead: consensus model)

  6. …hence new Requirements • Do preserve taggings of users! • Do produce several alternatives! • Do exploit given clusterings! • Do consider locality instead of global consensus!

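  7. Localized Alternative Cluster Ensembles (LACE) LACE Learning Task: • Given a set of objects $S$, a set of clusterings $I \subseteq \{\varphi_i : S_i \to G_i\}$ and a quality function $q(I, O, S)$ • LACE delivers a set of clusterings $O \subseteq \{\varphi_i : S_i \to G_i\}$ such that $q(I, O, S)$ is maximized and each $\varphi_i$ in $O$ covers at least $S$ (i.e. $S \subseteq S_i$). • Results $\varphi_i$ are composed of existing clusterings $\varphi_{ij}$ on subsets $S_1, \dots, S_m$ of the set of items to tag. [Figure: output clusterings $\varphi_1, \dots, \varphi_n$; $\varphi_1$ is assembled from $\varphi_{11}$ (alternative metal, death metal, true metal) and $\varphi_{12}$ (pop, hip hop) over items a–g]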

  8. Localized Alternative Cluster Ensembles Overall quality: • Sum up over all clusterings in $O$. • Each $\varphi_i$ should use as few different clusterings $\varphi_{ij}$ as possible. Quality for a single clustering: for each $x$ in $S$, find $x'$ in $S_j$ from $\varphi_{ij}$: [Figure: $\varphi_1$ assembled from $\varphi_{11}$ (alternative metal, death metal, true metal) and $\varphi_{12}$ (pop, hip hop) over items a–g]
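The single-clustering quality formula did not survive in this transcript, so the following is only a hypothetical stand-in that follows the slide's wording: for each x in S it takes the most similar x′ among the items covered by the chosen parts S_j, averages those similarities, and sums the result over all outputs in O. The similarity function `sim`, the averaging and the names `single_quality` / `overall_quality` are assumptions. The number of input clusterings each output uses is returned separately, because the slide asks for it to be small but gives no weighting.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Item = str
Similarity = Callable[[Item, Item], float]
# One output clustering phi_i: input-clustering name -> item subset S_j it contributes.
Composition = Dict[str, Set[Item]]


def single_quality(S: Sequence[Item], parts: Composition, sim: Similarity) -> float:
    """Hypothetical q for one phi_i: for each x in S take the best-matching x'
    among the items covered by the parts, then average these similarities."""
    covered = set().union(*parts.values())
    if not S or not covered:
        return 0.0
    return sum(max(sim(x, y) for y in covered) for x in S) / len(S)


def overall_quality(S: Sequence[Item], outputs: List[Composition],
                    sim: Similarity) -> Tuple[float, List[int]]:
    """Sum the single qualities over all phi_i in O; also return how many
    input clusterings each phi_i uses (the slide asks for as few as possible)."""
    total = sum(single_quality(S, parts, sim) for parts in outputs)
    return total, [len(parts) for parts in outputs]


def exact(x: Item, y: Item) -> float:
    """Toy similarity: 1 for identical item IDs, else 0."""
    return 1.0 if x == y else 0.0


phi_1: Composition = {"phi_11": {"a", "b"}, "phi_12": {"d"}}
print(overall_quality(["a", "b", "c"], [phi_1], exact))  # (0.666..., [2])
```

With the toy similarity, item c has no counterpart in any part, so $\varphi_1$ scores 2/3; a graded similarity on item features would reward near matches as well.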

  9. LACE Algorithm Items are represented by IDs. [Figure: query items a–g and input clusterings $\varphi_{11}$ (alternative metal, death metal, true metal) and $\varphi_{12}$ (pop, hip hop)]

  10. LACE Algorithm The best matching cluster node is selected by F-measure. [Figure: query items matched against the nodes of $\varphi_{11}$ and $\varphi_{12}$]
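The matching criterion on this slide can be sketched as follows, assuming the query and every candidate cluster node are plain sets of item IDs and the hierarchy has been flattened into one dict from tag to items. `best_match` and its `used` parameter are hypothetical names.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def f_measure(query: Set[str], cluster: Set[str]) -> float:
    """Harmonic mean of precision |Q & C| / |C| and recall |Q & C| / |Q|."""
    overlap = len(query & cluster)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(query)
    return 2 * precision * recall / (precision + recall)


def best_match(query: Set[str], nodes: Dict[str, Set[str]],
               used: Iterable[str] = ()) -> Optional[Tuple[str, float]]:
    """Return (tag, F) of the unused node that best matches the query, or None."""
    skip = set(used)
    best: Optional[Tuple[str, float]] = None
    for tag, members in nodes.items():
        if tag in skip:
            continue
        score = f_measure(query, members)
        if score > 0 and (best is None or score > best[1]):
            best = (tag, score)
    return best


nodes = {"alternative metal": {"a", "b", "c"}, "death metal": {"b"}, "pop": {"d", "f"}}
print(best_match({"a", "b", "c"}, nodes))  # ('alternative metal', 1.0)
print(best_match({"a", "b", "c"}, nodes, used={"alternative metal"}))  # ('death metal', 0.5)
```

The F-measure penalizes both nodes that miss query items (low recall) and nodes that bring in many unrelated items (low precision), which is why a small node such as "death metal" scores only 0.5 here.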

  11. LACE Algorithm Items that are sufficiently similar to items in the best matching clustering are deleted from the query set. [Figure: matched items removed from the query set]

  12. LACE Algorithm A new query is posed containing the remaining items. Only tags not yet used are considered. [Figure: new query with the remaining items against the unused tags of $\varphi_{11}$ and $\varphi_{12}$]

  13. LACE Algorithm The process continues until all items are covered, no additional match is possible or a maximal number of rounds is reached. [Figure: result $\varphi_1$ assembled from parts of $\varphi_{11}$ and $\varphi_{12}$]
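Slides 11 to 13 describe a greedy covering loop. The sketch below is one possible reading of those steps, not the published algorithm: the item similarity `sim`, the `threshold` for "sufficiently similar" and `max_rounds` are assumed parameters, and candidate nodes are a flat dict from tag to item IDs.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def f_measure(query: Set[str], cluster: Set[str]) -> float:
    """Harmonic mean of precision and recall of cluster w.r.t. query."""
    overlap = len(query & cluster)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(cluster), overlap / len(query)
    return 2 * p * r / (p + r)


def greedy_cover(items: Iterable[str], nodes: Dict[str, Set[str]],
                 sim: Callable[[str, str], float], threshold: float = 0.9,
                 max_rounds: int = 10) -> Tuple[List[str], Set[str]]:
    """Return the matched tags in order and the items left uncovered."""
    query = set(items)
    used: List[str] = []
    for _ in range(max_rounds):
        if not query:
            break  # all items covered
        # Only tags not yet used are considered (slide 12).
        scored = [(f_measure(query, m), t) for t, m in nodes.items() if t not in used]
        scored = [s for s in scored if s[0] > 0]
        if not scored:
            break  # no additional match possible
        _, tag = max(scored)  # best F-measure; ties broken by tag name
        used.append(tag)
        # Delete query items sufficiently similar to an item of the matched node (slide 11).
        query = {x for x in query
                 if max(sim(x, y) for y in nodes[tag]) < threshold}
    return used, query


def exact(x: str, y: str) -> float:
    """Toy similarity: 1 for identical item IDs, else 0."""
    return 1.0 if x == y else 0.0


nodes = {"alternative metal": {"a", "b", "c"}, "pop": {"d", "f"}, "hip hop": {"e"}}
print(greedy_cover("abcdefg", nodes, exact))
# (['alternative metal', 'pop', 'hip hop'], {'g'})
```

With exact-match similarity an item only leaves the query when the matched node contains it; a graded similarity (for example on audio features) would also remove near neighbours. The leftover item g is what slide 14 hands to classification.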

  14. LACE Algorithm Remaining items are added by classification. [Figure: $\varphi_1$ completed with the remaining items; $\varphi_{12}$ extended to $\varphi_{12}'$]
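The slide does not say which classifier is used. As an assumption, the sketch below uses a 1-nearest-neighbour rule on hypothetical feature vectors (`features`), so each leftover item joins the node of its closest already-assigned item; `classify_remaining` is an illustrative name.

```python
import math
from typing import Dict, Iterable, Sequence, Set


def classify_remaining(remaining: Iterable[str], nodes: Dict[str, Set[str]],
                       features: Dict[str, Sequence[float]]) -> Dict[str, Set[str]]:
    """Add each leftover item to the node holding its nearest already-tagged item."""
    assigned = {tag: set(members) for tag, members in nodes.items()}
    # Labelled examples: (item, tag) pairs taken from the matched nodes only.
    labelled = [(y, tag) for tag, members in nodes.items() for y in members]
    for x in sorted(remaining):
        _, tag = min((math.dist(features[x], features[y]), tag) for y, tag in labelled)
        assigned[tag].add(x)
    return assigned


features = {"a": (0.0, 0.0), "b": (0.0, 1.0), "d": (5.0, 5.0), "g": (4.0, 5.0)}
result = classify_remaining({"g"}, {"metal": {"a", "b"}, "pop": {"d"}}, features)
print(sorted(result["pop"]))  # ['d', 'g']
```

Any classifier trained on the already-covered items would fit the slide equally well; 1-NN is chosen here only because it needs no training step.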

  15. LACE Algorithm Process starts anew until no more matches are possible or the maximal number of results is reached. [Figure: finished result $\varphi_1$]

  16. LACE Algorithm Process starts anew until no more matches are possible or the maximal number of results is reached. [Figure: results $\varphi_1, \varphi_2, \varphi_3, \dots, \varphi_k$, drawing on further tags such as home, work, office, plane]

  17. LACE Algorithm Process starts anew until no more matches are possible or the maximal number of results is reached. [Figure: results $\varphi_1, \dots, \varphi_k$ built from input clusterings shared over a P2P network]
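Slides 15 to 17 repeat the covering process to obtain several alternative results. The rule that makes successive results differ is not given here; the sketch assumes that nodes used by an earlier result are banned in later rounds, and it uses exact item overlap instead of a similarity threshold to stay short. `one_result`, `alternatives` and `max_results` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set


def f_measure(query: Set[str], cluster: Set[str]) -> float:
    """Harmonic mean of precision and recall of cluster w.r.t. query."""
    overlap = len(query & cluster)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(cluster), overlap / len(query)
    return 2 * p * r / (p + r)


def one_result(items: Iterable[str], nodes: Dict[str, Set[str]],
               banned: Set[str]) -> List[str]:
    """Greedily cover the items with unbanned nodes (exact overlap, no threshold)."""
    query: Set[str] = set(items)
    used: List[str] = []
    while query:
        scored = [(f_measure(query, m), t) for t, m in nodes.items()
                  if t not in banned and t not in used]
        scored = [s for s in scored if s[0] > 0]
        if not scored:
            break  # no further match for this result
        _, tag = max(scored)
        used.append(tag)
        query -= nodes[tag]
    return used


def alternatives(items: Iterable[str], nodes: Dict[str, Set[str]],
                 max_results: int = 5) -> List[List[str]]:
    """Restart the covering until no match is possible or max_results is reached."""
    items = list(items)
    results: List[List[str]] = []
    banned: Set[str] = set()
    while len(results) < max_results:
        result = one_result(items, nodes, banned)
        if not result:
            break  # no more matches possible
        results.append(result)
        banned |= set(result)  # assumption: later results must use other nodes
    return results


nodes = {"alternative metal": {"a", "b", "c"}, "pop": {"d", "f"},
         "home": {"a", "b"}, "work": {"c", "d", "f"}}
for r in alternatives("abcdf", nodes):
    print(sorted(r))
```

Here the same five items are covered once by a genre-like pair of tags and once by a situation-like pair (home, work), which mirrors the idea of alternative aspects on slide 16.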

  18. Optimizations • Represent clusters only by a fixed number of points • Calculate the fit of a partial clustering bottom-up
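Both optimizations can be illustrated briefly. The sketch assumes a seeded random sample as the "fixed number of points" (the slide does not say how they are chosen) and computes every node's F-measure in one post-order pass. Summing child counts is only valid when sibling nodes are disjoint, i.e. when the hierarchy is partitioning as on slide 4. The tree encoding and all names are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical tree encoding: node -> (items tagged directly at the node, child nodes).
Tree = Dict[str, Tuple[Set[str], List[str]]]


def representatives(members: Set[str], k: int, seed: int = 0) -> List[str]:
    """Keep at most k points per cluster (random sample; the choice rule is an assumption)."""
    ordered = sorted(members)
    if len(ordered) <= k:
        return ordered
    return random.Random(seed).sample(ordered, k)


def bottom_up_fit(tree: Tree, node: str, query: Set[str],
                  out: Dict[str, float]) -> Tuple[int, int]:
    """Post-order pass: return (size, overlap) of the node's subtree and store
    the F-measure of every visited node in out."""
    own, children = tree[node]
    size, overlap = len(own), len(own & query)
    for child in children:
        child_size, child_overlap = bottom_up_fit(tree, child, query, out)
        size += child_size  # correct only for disjoint (partitioning) siblings
        overlap += child_overlap
    if overlap:
        p, r = overlap / size, overlap / len(query)
        out[node] = 2 * p * r / (p + r)
    else:
        out[node] = 0.0
    return size, overlap


tree: Tree = {"alternative metal": (set(), ["death metal", "true metal"]),
              "death metal": ({"a", "b"}, []),
              "true metal": ({"c"}, [])}
scores: Dict[str, float] = {}
bottom_up_fit(tree, "alternative metal", {"a", "b", "c", "d"}, scores)
print(round(scores["alternative metal"], 3))  # 0.857
```

Each node is visited once, so scoring a whole hierarchy costs about as much as scoring its leaves, instead of recomputing the item set of every inner node from scratch.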

  19. Evaluation • Nemoz = Networked Media Organizer • 39 user-created taxonomies on 1830 audio files • Evaluation: Leave one clustering out • Comparison of clusterings: • F-Measure (best match) • correlation of tree distances • absolute distance of tree distances • Best result of 5 returned clusterings is used
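Two of the comparison measures rely on tree distances between items. Assuming that the tree distance of two items is the number of edges between the nodes that hold them, the correlation and the mean absolute difference over all item pairs could be computed as follows. Taxonomies are encoded as a child-to-parent map plus an item-to-node map; all names are hypothetical, and the Pearson correlation is undefined if either distance list is constant.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical encoding: (child -> parent map, item -> node map).
Taxonomy = Tuple[Dict[str, str], Dict[str, str]]


def path_to_root(parent: Dict[str, str], node: str) -> List[str]:
    """Nodes from node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent: Dict[str, str], a: str, b: str) -> int:
    """Number of edges between nodes a and b via their lowest common ancestor."""
    up_a, up_b = path_to_root(parent, a), path_to_root(parent, b)
    depth_b = {n: i for i, n in enumerate(up_b)}
    for i, n in enumerate(up_a):
        if n in depth_b:
            return i + depth_b[n]
    raise ValueError("nodes are not in the same tree")


def pearson(u: List[int], v: List[int]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def compare(t1: Taxonomy, t2: Taxonomy, items: Iterable[str]) -> Tuple[float, float]:
    """Correlation and mean absolute difference of pairwise tree distances."""
    d1: List[int] = []
    d2: List[int] = []
    for x, y in combinations(sorted(items), 2):
        for (parent, where), dists in ((t1, d1), (t2, d2)):
            dists.append(tree_distance(parent, where[x], where[y]))
    return pearson(d1, d2), mean(abs(a - b) for a, b in zip(d1, d2))


t_a: Taxonomy = ({"death metal": "alternative metal", "true metal": "alternative metal",
                  "alternative metal": "music", "pop": "music"},
                 {"a": "death metal", "b": "death metal", "c": "true metal", "d": "pop"})
t_b: Taxonomy = ({"x": "root", "y": "root"}, {"a": "x", "b": "x", "c": "y", "d": "y"})
corr, mad = compare(t_a, t_b, "abcd")
print(round(mad, 3))  # 0.833
```

In a leave-one-clustering-out setting, `t_a` would play the role of the held-out user taxonomy and `t_b` a returned clustering; the correlation is insensitive to the overall depth of the trees, while the absolute difference is not.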

  20. Evaluation [Figure: returned clusterings with the tags alternative metal, death metal, true metal, pop, hip hop, home, work, office, plane]

  21. Evaluation [Figure: several returned clusterings built from the tags alternative metal, death metal, true metal, pop, hip hop, home, work, office, plane]

  22. Evaluation [Plot: tag frequency vs. tag rank for the tags in the returned clusterings] • Popular tags are more likely to be returned → accuracy • Less popular tags can still be returned → diversity

  23. Evaluation

  24. Conclusion • Collaborative Tagging in the Web 2.0 • Multi-aspect Tagging • Localized Alternative Cluster Ensembles • Distributed Implementation possible • Emerging Tag Structures: Accuracy + Diversity

  25. References • Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M. and Euler, T. YALE: Rapid Prototyping for Complex Data Mining Tasks. In Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2006. • Wurst, M. and Morik, K. Distributed Feature Extraction in a P2P Setting - A Case Study. In Future Generation Computer Systems, Special Issue on Data Mining, 2006. • Wurst, M. and Morik, K. Multi-Agent Learning By Feature Sharing. In Proc. of the 6th European Symposium on Adaptive Learning Agents and MAS, 2006. • Mierswa, I. and Wurst, M. Efficient Case Based Feature Construction for Heterogeneous Learning Tasks. In Proc. of the European Conference on Machine Learning (ECML), 2005. • Mierswa, I. and Morik, K. Automatic Feature Extraction for Classifying Audio Data. In Machine Learning Journal, Vol. 58, pp. 127–149, 2005. • Wurst, M., Morik, K. and Mierswa, I. Localized Alternative Cluster Ensembles for Collaborative Structuring. In Proc. of the European Conference on Machine Learning (ECML), 2006.
