
Evaluating Similarity Measures for Emergent Semantics of Social Tagging


Presentation Transcript


  1. Evaluating Similarity Measures for Emergent Semantics of Social Tagging Authors: Benjamin Markines, Ciro Cattuto, Filippo Menczer, Dominik Benz, Andreas Hotho, Gerd Stumme Presenter: Zhi Qiao

  2. 1. Introduction • 2. Similarity Measuring Framework • a. Representation • b. Aggregation Methods • c. Similarity Measures • 3. Evaluation • a. Predicting Tag Relations • b. Evaluation via External Grounding • 4. Conclusion and Scalability

  3. Web 1.0: users could only view web pages but not contribute to their content. The move from Web 1.0 to Web 2.0 allows users to interact and collaborate with each other, e.g., to collectively classify and find information through tagging.

  4. Folksonomy: social bookmarking systems and their emergent information structures, in which users create or share tags to annotate resources. Example from delicious.com.

  5. PageRank: extends textual analysis of content by taking into account the hyperlinks created by authors as implicit endorsements between pages. Folksonomies: grant us access to a semantically richer source of social annotation. They allow us to extend the assessment of what a page is about from content analysis algorithms to the collective “wisdom of the crowd”. If many people agree that a page is about programming, then with high probability it is about programming even if its content does not include the word “programming”. “The wisdom of the crowd”

  6. Since tags can be created easily by users and require no special knowledge: • Lack of structure • Lack of global coherence • Ambiguity • The use of different languages • Spelling mistakes What are the potential problems of tagging?

  7. Should a relationship be stronger if many people agree that two objects are related than if only a few people do? • Which weighting schemes best regulate the influence of an individual? • How does the sparsity of annotations affect the accuracy of these measures? • Are the same measures most effective for both resource and tag similarity? • Which aggregation schemes retain the most reliable semantic information? • Which lend themselves to incremental computation? Some open questions to address

  8. Triple Annotation Representation • A folksonomy F is a set of triples (u, r, t) • User u annotating resource r with tag t Similarity Framework

  9. A post (u, r, (t1, …, tn)) is transformed into a set of triples {(u, r, t1), …, (u, r, tn)}. Define similarity measures σ(x, y), where x and y can be two resources or two tags.
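
As a rough illustration of this representation (the function name and toy data are mine, not from the paper), a post can be expanded into its equivalent triples in a few lines of Python:

```python
# Minimal sketch: expand a post (u, r, (t1, ..., tn)) into the triples {(u, r, t1), ..., (u, r, tn)}.
def post_to_triples(user, resource, tags):
    """Return the set of (user, resource, tag) triples equivalent to one post."""
    return {(user, resource, tag) for tag in tags}

# Toy example: one delicious-style post by "alice" on cnn.com.
triples = post_to_triples("alice", "cnn.com", ("news", "politics", "tv"))
print(sorted(triples))
```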

  10. For evaluation purposes, we focus on resource-resource and tag-tag similarity. Therefore we need to aggregate across users. Aggregation Methods

  11. Projection: The simplest aggregation approach is to project across users, obtaining a unique set of (r, t) pairs, as if the triples were stored in a database relation F. The result is represented by a matrix with binary elements W (0 or 1): a 0 in the corresponding matrix element means that no user associated that resource with that tag, whereas a 1 means that at least one user has performed the indicated association.
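
A minimal sketch of this projection step, assuming the folksonomy is given as a set of triples; representing W as a resource-to-tag-set mapping is my simplification of the binary matrix:

```python
from collections import defaultdict

def project(triples):
    """Collapse the user dimension: W[r] holds the tags t with W(r, t) = 1,
    i.e. tags that at least one user attached to resource r."""
    W = defaultdict(set)
    for _user, resource, tag in triples:
        W[resource].add(tag)  # a single annotation is enough for a 1
    return W

F = {("alice", "cnn.com", "news"), ("bob", "cnn.com", "news"), ("bob", "bbc.co.uk", "news")}
print(dict(project(F)))
```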

  12. Distributional: A more sophisticated form of aggregation stems from considering distributional information associated with the set membership relationships between resources and tags. One way to achieve distributional aggregation is to make set membership fuzzy, weighted by the Shannon information (log-odds) extracted from the annotations. Intuitively, a shared tag may signal a weak association if it is very common. Thus we will use the information of a tag x, defined as -log p(x), where p(x) is the fraction of resources annotated with x. Another approach is to count the users who agree on a certain resource-tag annotation while projecting across users.
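
A hedged sketch of the -log p(x) weighting described above (my own illustration; the user-count variant mentioned at the end of the slide is not shown):

```python
import math
from collections import defaultdict

def distributional(triples):
    """Weight each (resource, tag) association by the information content -log p(tag),
    where p(tag) is the fraction of resources annotated with that tag."""
    tagged_resources = defaultdict(set)  # tag -> set of resources carrying it
    resources = set()
    for _u, r, t in triples:
        tagged_resources[t].add(r)
        resources.add(r)

    W = defaultdict(dict)
    for _u, r, t in triples:
        p = len(tagged_resources[t]) / len(resources)
        W[r][t] = -math.log(p)  # a tag attached to every resource contributes 0
    return W
```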

  13. Macro-Aggregation: treats each user’s annotation set independently first, then aggregates across users. The per-user binary matrix representation w is used to compute a “local” similarity σu(x, y). Finally, we macro-aggregate by summing across users to obtain the “global” similarity.
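
A sketch of macro-aggregation using the matching measure as the per-user “local” similarity (an assumption chosen for illustration; the paper considers several local measures):

```python
from collections import defaultdict

def macro_matching(triples, x, y):
    """Global similarity of resources x and y: sum of per-user matching counts."""
    per_user = defaultdict(lambda: defaultdict(set))  # user -> resource -> tags
    for u, r, t in triples:
        per_user[u][r].add(t)
    # sigma_u(x, y) = |tags_u(x) & tags_u(y)|; the global value sums over users
    return sum(len(tags[x] & tags[y]) for tags in per_user.values())
```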

  14. Collaborative: • So far we have only considered feature-based representations. • In collaborative filtering, the fact that one or more users annotate two objects is seen as evidence of association. • Add a special tag tu to all resources annotated by u (the probability of observing this tag for any of u’s resources is 1, so it carries no information value according to Shannon’s information -log p(x)).
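
A minimal sketch of the special user tag idea; the "user:" naming scheme is hypothetical, chosen only to keep the synthetic tags distinct from real ones:

```python
def add_user_tags(triples):
    """Augment the folksonomy with one synthetic tag per user, attached to every
    resource that user annotated, so co-annotation by the same user becomes a shared feature."""
    user_tag_triples = {(u, r, "user:" + u) for (u, r, _t) in triples}
    return set(triples) | user_tag_triples
```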

  15. For projection: Similarity Measures: Matching
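
Under projection, the matching measure amounts to counting shared tags; a minimal sketch, using the resource-to-tag-set view of W from the projection sketch above:

```python
def matching_projection(W, x, y):
    """Matching similarity of resources x and y: number of tags they share in W."""
    return len(W.get(x, set()) & W.get(y, set()))

W = {"cnn.com": {"news", "tv"}, "bbc.co.uk": {"news", "radio"}}
print(matching_projection(W, "cnn.com", "bbc.co.uk"))  # -> 1
```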

  16. Ex: p(cnn) = 1/3 (only “news” attached, out of 3 tags) For distributional: Similarity Measures: Matching

  17. Compute the similarity for each individual user, then aggregate across users. • Difference: adding the user tag t* • p(cnn|alice) = 1/3 -> p(cnn|alice) = 1/4 • Collaborative filtering indicates higher similarity than macro-aggregation For macro and collaborative aggregation: Similarity Measures: Matching

  18. Overlap • Jaccard • Dice • Cosine • Mutual Information Similarity Measures: ……
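Set-based forms of the listed measures, as a rough sketch for the projection case (these follow the standard definitions; mutual information needs the full tag distribution and is omitted here):

```python
import math

def matching(x, y): return len(x & y)
def overlap(x, y):  return len(x & y) / min(len(x), len(y))
def jaccard(x, y):  return len(x & y) / len(x | y)
def dice(x, y):     return 2 * len(x & y) / (len(x) + len(y))
def cosine(x, y):   return len(x & y) / math.sqrt(len(x) * len(y))

a, b = {"news", "tv", "politics"}, {"news", "radio"}
print(jaccard(a, b), dice(a, b), cosine(a, b))
```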

  19. Predicting Tag Relations • BibSonomy.org allows users to input directed relations, such as tagging -> web2.0, between pairs of tags. • Contains many tags, but the user data is very sparse • Loses information, sensitive to small changes • Misses hierarchical relations Evaluation

  20. Only looks at the rank of similarity, not the actual similarity values • Use WordNet and the Open Directory Project as external grounding • A higher rank correlation is interpreted as better agreement with the grounding, and thus as evidence of a better similarity measure. Evaluation via External Grounding
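
A sketch of the rank-based comparison idea (not the paper's exact protocol); it assumes SciPy is available and uses Kendall's tau as the rank correlation:

```python
from scipy.stats import kendalltau

# Toy similarity values for the same four tag pairs, from the folksonomy measure
# and from an external grounding (e.g. a WordNet-based score).
folksonomy_scores = [0.9, 0.4, 0.7, 0.1]
grounding_scores  = [0.8, 0.3, 0.6, 0.2]

tau, _p = kendalltau(folksonomy_scores, grounding_scores)
print(tau)  # higher rank correlation = better agreement with the grounding
```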

  21. Mutual information is the best measure for extracting semantic similarity information from a folksonomy • Macro-aggregation is less effective than micro-aggregation (projection and distributional) (Why?) • In spite of macro-aggregation’s shortcomings, collaborative filtering extracts much useful information • Mutual information is the most expensive measure to compute • Macro and collaborative aggregation allow for incremental computation because each user’s representation is maintained separately, so they can scale DISCUSSION AND SCALABILITY
