Some Thoughts on Tagging

Marti Hearst, UC Berkeley


Presentation Transcript


  1. Some Thoughts on Tagging • Marti Hearst, UC Berkeley

  2. Outline • What are Tags? • Organizing Tags for Navigation • Facets and faceted navigation • How to (semi)automatically create facet hierarchies • What’s up with Tag Clouds?

  3. Social Tagging • Metadata assignment without all the bother • Spontaneous, easy, and tends towards single terms • Usually used in the context of social media

  4. The Tagging Opportunity • At last! Content-oriented metadata in the large! • Attempts at metadata standardization always end up with something like the Dublin Core • author, date, publisher, … yaaawwwwnnn. • I’ve always thought the action was in the subject metadata, and have focused on how to navigate collections given such data.

  5. The Tagging Opportunity • Tags are inherently faceted! • It is assumed that multiple labels will be assigned to each item • Rather than placing them into a folder • Rather than placing them into a hierarchy • Concepts are assigned from many different content categories • Helps alleviate the metadata wars: • Allows for both splitters and lumpers • Is this a bird or a robin? • Doesn’t matter, you can do both! • Allows for differing organizational views • Does NASCAR go under sports or entertainment? • Doesn’t matter, you can do both!

  6. Tagging Problems • Tags aren’t organized • Thorough coverage isn’t controlled for • The haphazard assignments lead to problems with • Synonymy • Homonymy • See how this author attempts to compensate:

  7. Tagging Problems / Opportunities • Some tags are fleeting in meaning or too personal • toread, todo • Tags are not “professional” • (I personally don’t think this matters) • Great example from Trant: • “Anecdotal evidence also shows that ‘professional’ cataloguers find the basic description of visual elements surprisingly difficult: a curator exhibited significant discomfort during this description task. When asked what was wrong, he blurted out ‘everything I know isn't in the picture’.” (“Investigating social tagging and folksonomy in the art museum with steve.museum”, J. Trant, B. Wyman, WWW 2006 Collaborative Tagging Workshop)

  8. “Investigating social tagging and folksonomy in the art museum with steve.museum”, J. Trant, B. Wyman, WWW 2006 Collaborative Tagging Workshop

  9. What about Browsing? • I think tags need some organization • Currently most tags are used as a direct index into items • Click on tag, see items assigned to it, end of story • Co-occurring tags are not shown • Grouping into small hierarchies is not usually done • del.icio.us now has bundles, but navigation isn’t good • IBM’s dogear and RawSugar come the closest • I think the solution is to organize tags into faceted hierarchies and do browsing in the standard way

  10. Faceted Navigation and Flamenco

  11. The Problem With Hierarchy • Most things can be classified in more than one way. • Most organizational systems do not handle this well. • Example: Animal Classification [Figure: the same animals (otter, penguin, robin, salmon, wolf, cobra, bat, seal) grouped three different ways: by Skin Covering, by Locomotion, and by Diet]

  12. The Problem with Hierarchy • Inflexible • Force the user to start with a particular category • What if I don’t know the animal’s diet, but the interface makes me start with that category? • Wasteful • Have to repeat combinations of categories • Makes for extra clicking and extra coding • Difficult to modify • To add a new category type, must duplicate it everywhere or change things everywhere

  13. The Problem With Hierarchy [Figure: a single tree that starts with Locomotion (swim, fly, run, slither), repeats Covering (fur, scales, feathers) under each Locomotion branch, and repeats Diet (fish, rodents, insects) under each Covering branch; the leaves shown include otter, salmon, bat, robin, and wolf]
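
The repetition in the figure above can be made concrete with a little counting. The following is a minimal sketch, using only the labels named on the slide, that compares how many leaf categories a single fixed hierarchy needs with how many labels three independent facets need. The variable names are illustrative.

```python
from itertools import product

# Facet labels as listed on the slide's figure.
locomotion = ["swim", "fly", "run", "slither"]
covering = ["fur", "scales", "feathers"]
diet = ["fish", "rodents", "insects"]

# A single fixed hierarchy (Locomotion > Covering > Diet) needs one leaf
# category for every combination of labels.
hierarchy_leaves = list(product(locomotion, covering, diet))

# Independent facets only need each label to be defined once.
facet_labels = locomotion + covering + diet

print(len(hierarchy_leaves))  # 36 leaf categories
print(len(facet_labels))      # 10 labels
```

Adding a fourth category type multiplies the number of leaves in the hierarchy, but only adds its own labels to the faceted scheme.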

  14. The Idea of Facets • Facets are a way of labeling data • A kind of Metadata (data about data) • Can be thought of as properties of items • Facets vs. Categories • Items are placed INTO a category system • Multiple facet labels are ASSIGNED TO items

  15. The Idea of Facets • Create INDEPENDENT categories (facets) • Each facet has labels (sometimes arranged in a hierarchy) • Assign labels from the facets to every item • Example: recipe collection • Ingredient: Chicken, Bell Pepper • Cooking Method: Stir-fry, Curry • Course: Main Course • Cuisine: Thai
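
As a rough illustration of assigning labels from independent facets to items, here is a minimal sketch. The facet names and most labels come from the recipe example above; the recipes themselves, the labels "Bake" and "Italian", and the `select` helper are hypothetical.

```python
# Each item is assigned labels from several independent facets.
# The recipes are made up for illustration.
recipes = {
    "Thai chicken curry": {
        "Ingredient": {"Chicken", "Bell Pepper"},
        "Cooking Method": {"Curry"},
        "Course": {"Main Course"},
        "Cuisine": {"Thai"},
    },
    "Chicken stir-fry": {
        "Ingredient": {"Chicken", "Bell Pepper"},
        "Cooking Method": {"Stir-fry"},
        "Course": {"Main Course"},
        "Cuisine": {"Thai"},
    },
    "Stuffed peppers": {
        "Ingredient": {"Bell Pepper"},
        "Cooking Method": {"Bake"},
        "Course": {"Main Course"},
        "Cuisine": {"Italian"},
    },
}


def select(items, constraints):
    """Return names of items that carry every requested (facet, label) pair."""
    return sorted(
        name
        for name, facets in items.items()
        if all(label in facets.get(facet, set())
               for facet, label in constraints.items())
    )


# The user may start from any facet and combine facets in any order.
print(select(recipes, {"Ingredient": "Chicken"}))
print(select(recipes, {"Cooking Method": "Curry", "Ingredient": "Bell Pepper"}))
```

Because no facet is privileged, the same collection can be entered by ingredient, by cuisine, or by cooking method without restructuring anything.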

  16. Fine Arts Museum Example: The Flamenco Interface

  17. Advantages of the Approach • Systematically integrates search results: • reflect the structure of the info architecture • retain the context of previous interactions • Gives users control and flexibility • Over order of metadata use • Over when to navigate vs. when to search • Allows integration with advanced methods • Collaborative filtering, predicting users’ preferences

  18. Advantages of Facets • Can’t end up with empty result sets • (except with keyword search) • Helps avoid feelings of being lost. • Easier to explore the collection. • Helps users infer what kinds of things are in the collection. • Evokes a feeling of “browsing the shelves” • Is preferred over standard search for collection browsing in usability studies. • (Interface must be designed properly)
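
One common way to get the "no empty result sets" property is to offer only those refinements that still match at least one item. The sketch below assumes that design; it is not taken from the Flamenco code, and the animal attribute assignments and the `refinements` helper are illustrative.

```python
from collections import Counter

# Illustrative items; the attribute values are assumptions for this sketch.
items = [
    {"Locomotion": "swim", "Covering": "fur", "Diet": "fish"},        # otter
    {"Locomotion": "swim", "Covering": "scales", "Diet": "fish"},     # salmon
    {"Locomotion": "fly", "Covering": "fur", "Diet": "insects"},      # bat
    {"Locomotion": "fly", "Covering": "feathers", "Diet": "insects"}, # robin
    {"Locomotion": "run", "Covering": "fur", "Diet": "rodents"},      # wolf
]


def refinements(items, selected):
    """Count matching items per (facet, label) for facets not yet selected."""
    matching = [it for it in items
                if all(it[f] == v for f, v in selected.items())]
    counts = Counter()
    for it in matching:
        for facet, label in it.items():
            if facet not in selected:
                counts[(facet, label)] += 1
    return counts


# After choosing Locomotion = fly, only labels that still lead somewhere
# are offered, each with a preview count.
options = refinements(items, {"Locomotion": "fly"})
for (facet, label), n in sorted(options.items()):
    print(f"{facet}: {label} ({n})")
```

Every label offered has a count of at least one, so following any link always yields results; "scales" is simply not shown once "fly" is selected.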

  19. Related Work: Automated Tag Organization • Some efforts are on tag prediction: • Mishne ’06: • Uses IR techniques to find the closest tagged documents and uses their tags to assign new tags. Measures how well the new tags are predicted • Xu et al. ’06: • Use tags that have already been predicted for a document to predict which to show to a new user who is tagging the document • Some efforts are on tag organization: • Brooks & Montanez ’06: • Try to see if tags can predict document clusters, which in my book aren’t really categories • After clustering based on text, they try to induce a tag hierarchy by agglomerative clustering of the text. Results not described in detail • Begelman et al. ’06: • Use clustering and tag co-occurrence to find associated tags. Not clear what the organizational goal is
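
To make "tag co-occurrence" concrete, here is a minimal sketch that counts how often pairs of tags are assigned to the same item and lists the tags associated with a given tag. It is a generic illustration, not the method of any of the papers cited above; the tagged items, the `related` helper, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical items, each represented by its set of tags.
tagged = [
    {"python", "programming", "tutorial"},
    {"python", "programming"},
    {"recipe", "thai", "curry"},
    {"recipe", "curry"},
    {"python", "tutorial"},
]

# Count every unordered pair of tags that appears on the same item.
pair_counts = Counter()
for tags in tagged:
    for a, b in combinations(sorted(tags), 2):
        pair_counts[(a, b)] += 1


def related(tag, min_count=2):
    """Tags that co-occur with `tag` on at least `min_count` items."""
    out = []
    for (a, b), n in pair_counts.items():
        if n >= min_count and tag in (a, b):
            out.append(b if a == tag else a)
    return sorted(out)


print(related("python"))
print(related("curry"))
```

Counts like these yield associated tags, but on their own they do not say which tag is broader or which facet a tag belongs to.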

  20. RawSugar • A company/website that organizes tags from blogs into facets • They are undergoing a revamp, will move to channels • However, nothing published on this • (presumably, patents filed)

  21. How to Create Facet Hierarchies? • Our Approach: Castanet (Stoica & Hearst, to appear at HLT-NAACL ’07)

  22. Example: Recipes (3500 docs)

  23. Castanet Output (shown in Flamenco)

  24. Castanet Output (shown in Flamenco)

  25. Castanet Output (shown in Flamenco)

  26. Example: Biology Journal Titles • Castanet Output (shown in Flamenco)

  27. Castanet Algorithm • Leverage the structure of WordNet [Figure: pipeline from Documents through Select terms, Get hypernym paths (from WordNet), Build tree, and Compress tree to Divide into facets]

  28. 1. Select Terms • Select well distributed terms from collection (example terms: red, blue) [Figure: the pipeline diagram from slide 27]
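
The slide does not say how "well distributed" is measured. The sketch below assumes one simple reading: keep terms that occur in at least a minimum number of documents. The documents, the stopword list, the `min_df` threshold, and the `select_terms` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical mini-collection.
docs = [
    "red curry with chicken",
    "blue cheese salad",
    "red pepper soup",
    "chicken salad with blue cheese",
    "grilled chicken",
]


def select_terms(docs, min_df=2, stopwords=frozenset({"with"})):
    """Keep terms that appear in at least `min_df` documents (an assumed criterion)."""
    df = Counter()
    for doc in docs:
        # Use a set so each term counts once per document.
        df.update(set(doc.lower().split()) - stopwords)
    return sorted(term for term, n in df.items() if n >= min_df)


print(select_terms(docs))
```

Terms that occur in only one document (here "curry", "pepper", "soup", "grilled") are dropped, since they would not help organize the collection.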

  29. 2. Get Hypernym Path • red: abstraction => property => visual property => color => chromatic color => red, redness => red • blue: abstraction => property => visual property => color => chromatic color => blue, blueness => blue [Figure: the pipeline diagram from slide 27]
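
A hypernym path can be read off by following parent links upward until no parent remains. The sketch below uses a hand-written child-to-parent map copied from the slide's example instead of querying WordNet itself, so that it runs without any external data; the `HYPERNYM` map and `hypernym_path` helper are illustrative names.

```python
# Toy hypernym map (child -> parent), copied from the slide's example paths.
# A real system would look these links up in WordNet.
HYPERNYM = {
    "red": "red, redness",
    "red, redness": "chromatic color",
    "blue": "blue, blueness",
    "blue, blueness": "chromatic color",
    "chromatic color": "color",
    "color": "visual property",
    "visual property": "property",
    "property": "abstraction",
}


def hypernym_path(term):
    """Follow parent links from `term` to the top; return the path root-first."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return list(reversed(path))


print(" => ".join(hypernym_path("red")))
print(" => ".join(hypernym_path("blue")))
```

This toy map gives each node a single parent. In WordNet a word can have several senses and a sense can have several paths, which is the ambiguity the later Disambiguation slide addresses.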

  30. 3. Build Tree [Figure: the hypernym paths for red and blue are merged where they overlap: abstraction => property => visual property => color => chromatic color, which then branches into (red, redness => red) and (blue, blueness => blue)]
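
Merging paths that share a prefix is essentially inserting them into a trie. Here is a minimal sketch using nested dictionaries; the representation and the `build_tree` helper are assumptions for illustration, not the Castanet implementation.

```python
# The two root-first hypernym paths from the slide's example.
paths = [
    ["abstraction", "property", "visual property", "color",
     "chromatic color", "red, redness", "red"],
    ["abstraction", "property", "visual property", "color",
     "chromatic color", "blue, blueness", "blue"],
]


def build_tree(paths):
    """Merge root-first paths into one tree of nested dicts (label -> children)."""
    root = {}
    for path in paths:
        node = root
        for label in path:
            # Reuse the existing child if the prefix was already inserted.
            node = node.setdefault(label, {})
    return root


tree = build_tree(paths)

# Walk down the shared prefix and show where the paths diverge.
node = tree
for label in ["abstraction", "property", "visual property", "color", "chromatic color"]:
    node = node[label]
print(sorted(node))
```

The shared prefix is stored once, and the tree only branches below "chromatic color", where the two paths differ.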

  31. 4. Compress Tree [Figure: before: color => chromatic color => (red, redness => red), (blue, blueness => blue), (green, greenness => green); after: color => chromatic color => red, blue, green]

  32. 4. Compress Tree (cont.) [Figure: before: color => chromatic color => red, blue, green; after: color => red, blue, green]
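
The two Compress Tree slides suggest two simplifications: drop an intermediate node that has only one child, and drop a child whose label merely repeats its parent's label (such as "chromatic color" under "color"). The sketch below implements those two rules as inferred from the figures; the exact rules and thresholds used by Castanet may differ, and the `compress` helper and nested-dict representation are hypothetical.

```python
# The tree from the slide's example, as nested dicts (label -> children).
tree = {
    "color": {
        "chromatic color": {
            "red, redness": {"red": {}},
            "blue, blueness": {"blue": {}},
            "green, greenness": {"green": {}},
        }
    }
}


def compress(label, children):
    """Return the compressed children of the node named `label`."""
    result = {}
    for child, grandchildren in children.items():
        sub = compress(child, grandchildren)
        if len(sub) == 1:
            # Rule 1 (inferred): a node with a single child is replaced by it.
            result.update(sub)
        elif label in child:
            # Rule 2 (inferred): a child whose label contains the parent's
            # label is removed and its children are promoted. A plain
            # substring test is used here for simplicity.
            result.update(sub)
        else:
            result[child] = sub
    return result


compressed = {root: compress(root, kids) for root, kids in tree.items()}
print(compressed)
```

Both rules are applied in one pass here, whereas the slides show them as two successive steps. The root is never removed, so the facet keeps its top-level label.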

  33. 5. Divide into Facets

  34. Disambiguation • Ambiguity in: • Word senses • Paths up the hypernym tree • 2 paths for same word: • Sense 1 for word “tuna”: organism, being => plant, flora => vascular plant => succulent => cactus => tuna • Sense 2 for word “tuna”: organism, being => fish => food fish => tuna • 2 paths for same sense (Sense 2): the path above, and an alternative path ending in => bony fish => spiny-finned fish => percoid fish => tuna

  35. How to Select the Right Senses and Paths? • First: build core tree • (1) Create paths for words with only one sense • (2) Use Domains • WordNet has 212 Domains • medicine, mathematics, biology, chemistry, linguistics, soccer, etc. • Automatically scan the collection to see which domains apply • The user selects which of the suggested domains to use or may add their own • Paths for terms that match the selected domains are added to the core tree • Then: add remaining terms to the core tree.
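
The core-tree steps above can be sketched as a small selection function: take the only path when a word has one sense, otherwise take a sense whose domain was selected, and defer everything else. The paths are from the "tuna" slide; the domain labels "botany" and "food" are hypothetical stand-ins rather than actual WordNet domain names, and the `SENSES` table and `choose_path` helper are illustrative.

```python
# Toy sense inventory. Paths come from the slides; domain labels are made up.
SENSES = {
    "tuna": [
        {"domain": "botany",
         "path": ["organism, being", "plant, flora", "vascular plant",
                  "succulent", "cactus", "tuna"]},
        {"domain": "food",
         "path": ["organism, being", "fish", "food fish", "tuna"]},
    ],
    "cactus": [
        {"domain": "botany",
         "path": ["organism, being", "plant, flora", "vascular plant",
                  "succulent", "cactus"]},
    ],
}


def choose_path(word, selected_domains):
    """Pick a path for the core tree, or None to defer the word until later."""
    senses = SENSES[word]
    if len(senses) == 1:
        # (1) Words with only one sense go straight into the core tree.
        return senses[0]["path"]
    # (2) Otherwise prefer a sense whose domain the user selected.
    matches = [s for s in senses if s["domain"] in selected_domains]
    return matches[0]["path"] if matches else None


print(choose_path("cactus", set()))
print(choose_path("tuna", {"food"}))
print(choose_path("tuna", {"chemistry"}))
```

Words that return `None` correspond to the "remaining terms" that are added once the core tree exists; how that last step chooses among senses is not specified on the slide, so it is left out here.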

  36. Castanet Evaluation Method • Information architects assessed the category systems • For each of 2 systems’ output: • Examined and commented on top-level • Examined and commented on two sub-levels • Also compared to a baseline system • Then comment on overall properties • Meaningful? • Systematic? • Likely to use in your work?

  37. Castanet Evaluation Results • Results on recipes collection for “Would you use this system in your work?” • # “Yes in some cases” or “yes, definitely”: • Castanet: 29/34 • LDA: 0/18 • Subsumption: 6/16 • Baseline: 25/34 • Average response to questions about quality (4 = “strongly agree”)
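
Because the systems were judged by different numbers of respondents, the counts are easier to compare as proportions. The short calculation below uses only the figures reported on the slide; the variable names are illustrative.

```python
# (yes answers, total answers) per system, as reported on the slide.
results = {
    "Castanet": (29, 34),
    "LDA": (0, 18),
    "Subsumption": (6, 16),
    "Baseline": (25, 34),
}

# Share of respondents answering "yes in some cases" or "yes, definitely".
rates = {system: yes / total for system, (yes, total) in results.items()}

for system, rate in rates.items():
    yes, total = results[system]
    print(f"{system}: {yes}/{total} = {rate:.1%}")
```

This gives roughly 85.3% for Castanet, 73.5% for the baseline, 37.5% for subsumption, and 0.0% for LDA.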

  38. Will Castanet Work on Tags? • Class project by Simon King and Jeff Towle, 2004 • 1650 captions captured from mobile phones • “Blocks with Grandpa”, “Weezer”, “A veterans day tour of berkeley in front of south hall.”, “Bad photo”, “Kitchen”, “Jgj ” • Wanted to organize them. • Use the Castanet WordNet-based facet-hierarchy creation algorithm • by Stoica & Hearst, to appear at HLT-NAACL ’07 • Had to first remove proper names
