Automated Tag Clustering: Improving Search and Exploration in the TagSpace May 2006 Grigory Begelman Technion Israel Institute of Technology Computer Science Dpt gbeg@cs.technion.ac.il Philipp Keller Citrin Informatik GmbH phred@citrin.ch Frank Smadja RawSugar frank@rawsugar.com
Collaborative Tagging Workshop, WWW2006
Problem 1: Searching the TagSpace. How would you tag this? How would you search for it? Tags: Ikura, Uni, Ebi, Sushi, Nigiri, Japanese food, lunch in Tokyo, Ezobafun-uni, Kitamurashiuni, Murasakiuni, Akazaebi, Tenagaebi, etc.
Problem 2: Exploring the TagSpace. Annotations on the tag list: Locations, Restaurant Type, morphology, Not a restaurant!
Problem 3: Exploring the TagSpace. Not usable!
What is Missing? Tag relations
• “Tag relations improve searchability and exploration.”
• Similar tags:
  • Spelling and morphology: macos <-> mac_os <-> mac os; tagging <-> tags <-> tagged
  • Synonyms: macos <-> tiger; films <-> movies; new york <-> nyc
  • Related: cooking <-> recipes; software development <-> programming
• Tag groups or subtags:
  • Location -> san francisco, london, new york, etc.
  • Food -> sushi, sashimi, pizza, etc.
  • Programming -> html, java, css, etc.
Goal: Discover them by mining the tag space.
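The spelling variants listed above (macos, mac_os, mac os) are the easiest relation to detect mechanically. The following is a minimal sketch, not the authors' method: the function names `normalize_tag` and `group_variants` are hypothetical, and the rule (lowercase, then drop spaces, underscores, and hyphens) is an assumption. It does not handle morphology (tagging, tags, tagged) or synonyms, which need stemming or co-occurrence data.

```python
import re
from collections import defaultdict


def normalize_tag(tag: str) -> str:
    """Lowercase a tag and drop spaces, underscores, and hyphens."""
    return re.sub(r"[\s_\-]+", "", tag.strip().lower())


def group_variants(tags):
    """Group raw tags that share the same normalized key."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize_tag(tag)].append(tag)
    return dict(groups)


# Illustrative input only; these tags come from the slide's examples.
variants = group_variants(["macos", "mac_os", "mac os", "Mac-OS", "nyc"])
```

Here `variants["macos"]` collects all four spellings of the Mac OS tag, while `nyc` stays in a group of its own because linking it to new york requires synonym knowledge that string normalization cannot provide.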
Related Work: Top-down, predefined taxonomy. Rigid, not scalable, expensive. Tagger’s nightmare!!
Flickr – Clusters
RawSugar – Tag Hierarchy, Guided Navigation. Food groups, Origins groups, Locations groups.
RawSugar Tag Hierarchy
• Key idea: Some users (4%) define tag hierarchies (food>sushi, european>spanish, …).
• We mine this tag space to learn simple tag relations (ISA relations and RELATED) using probabilities.
• At search time: We apply this learned knowledge to group tags from results.
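The slide says only that relations are learned "using probabilities" and does not give a formula. As one possible reading, the sketch below scores a parent>child pair by the fraction of users who filed the child under that parent, among all users who filed the child under any parent. The function names, the `min_prob` cutoff, the "ungrouped" label, and the sample data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def learn_isa(user_hierarchies, min_prob=0.5):
    """Estimate P(parent | child) from per-user (parent, child) pairs."""
    pair_users = Counter()   # users who declared parent > child
    child_users = Counter()  # users who placed child under any parent
    for pairs in user_hierarchies.values():
        unique = set(pairs)
        pair_users.update(unique)
        child_users.update({child for _, child in unique})
    relations = {}
    for (parent, child), n in pair_users.items():
        prob = n / child_users[child]
        if prob >= min_prob:  # assumed cutoff, not from the slides
            relations[(parent, child)] = prob
    return relations


def group_result_tags(result_tags, relations):
    """Group search-result tags under their most probable learned parent."""
    best_parent = {}
    for (parent, child), prob in relations.items():
        if child not in best_parent or prob > best_parent[child][1]:
            best_parent[child] = (parent, prob)
    groups = defaultdict(list)
    for tag in result_tags:
        parent = best_parent.get(tag, ("ungrouped", 0.0))[0]
        groups[parent].append(tag)
    return dict(groups)


# Hypothetical hierarchy fragments from three users.
hierarchies = {
    "user1": [("food", "sushi"), ("european", "spanish")],
    "user2": [("food", "sushi"), ("food", "pizza")],
    "user3": [("japan", "sushi")],
}
relations = learn_isa(hierarchies)
groups = group_result_tags(["sushi", "pizza", "spanish", "tokyo"], relations)
```

With this data, two of the three users who placed sushi in a hierarchy put it under food, so food>sushi is kept and japan>sushi falls below the cutoff. A tag that nobody placed in a hierarchy, such as tokyo, ends up in the "ungrouped" bucket.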
RawSugar – Guided Search: Combining Hierarchy Fragments. [Diagram: hierarchy fragments contributed by Users 1 to 5, built from the tags food, europe, cooking, recipes, UK, Scotland, Edinburgh, Spain, Asian, Chinese, Italy, Thai, Southwest, vegetarian, California, Sushi, Bay Area, San Francisco, Texas.]
Related work
• Rashmi Sinha: “Tag Sorting: Another tool in an information architect's toolbox” http://www.rashmisinha.com/archives/05_02/tag-sorting.html
• Emanuele Quintarelli: “Hierarchical taxonomies from flat tag spaces” http://www.infospaces.it/wordpress/topics/information-architecture/91
• Paul Heymann (Stanford): “Tag Hierarchies” http://i.stanford.edu/~heymann/taghierarchy.html
• Brooks, Montanez (University of San Francisco): “Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering” http://www.cs.usfca.edu/~brooks/papers/brooks-montanez-www06.pdf
• Siderean fac.etio.us: “Faceted search on delicious tags” http://www.siderean.com/delicious/facetious.jsp
• Marti Hearst: “Clustering vs. Faceted Search” http://bailando.sims.berkeley.edu/papers/cacm06.pdf
• And more …
1. Get tag metadata
2. Build tag relation graph
3. Compute similarity
4. Cluster
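The four steps above can be sketched end to end on toy data. This is a simplified illustration under stated assumptions, not the pipeline used in the talk: the slides do not name a similarity measure or a clustering algorithm, so the sketch substitutes Jaccard similarity over co-occurrence counts and connected components of the thresholded graph. All function names, the `threshold` value, and the sample pages are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(pages):
    """Steps 1-2: count tags and tag pairs that appear on the same page."""
    tag_count = Counter()
    pair_count = Counter()
    for tags in pages:
        unique = sorted(set(tags))
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return tag_count, pair_count


def similar_edges(tag_count, pair_count, threshold):
    """Step 3: keep pairs whose Jaccard similarity reaches the threshold."""
    edges = defaultdict(set)
    for (a, b), both in pair_count.items():
        similarity = both / (tag_count[a] + tag_count[b] - both)
        if similarity >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def cluster(tag_count, edges):
    """Step 4 (assumed method): connected components of the similarity graph."""
    seen = set()
    clusters = []
    for start in sorted(tag_count):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            tag = stack.pop()
            if tag in seen:
                continue
            seen.add(tag)
            component.add(tag)
            stack.extend(edges.get(tag, ()))
        clusters.append(sorted(component))
    return clusters


# Toy tag metadata: each set holds the tags attached to one bookmarked page.
pages = [
    {"sushi", "japanese", "food"},
    {"sushi", "japanese"},
    {"html", "css"},
    {"html", "css", "programming"},
    {"food", "programming"},
]
tag_count, pair_count = cooccurrence(pages)
edges = similar_edges(tag_count, pair_count, threshold=0.5)
clusters = cluster(tag_count, edges)
```

On this data sushi and japanese always co-occur, as do html and css, so each pair forms a cluster, while food and programming remain singletons. Lowering `threshold` to 0.3 would merge every tag into one cluster, which shows how sensitive the output is to that single parameter.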
Results/Problems: Definition of “internet”
Results/Problems: Ambiguity
Results/Problems: Clustering needs a lot of tuning
Possible application: Group popular bookmarks
Some good clusters found
Tags that belong to the same clusters -