Uncorking the Varietals: Social Tagging, Folksonomies & Controlled Vocabularies

Margaret Maurer, Head, Catalog and Metadata, Kent State University Libraries and Media Services

In wine making - What is a Varietal? • A wine made from a single, named grape variety. • Cabernet Sauvignon wines are made from cabernet sauvignon grapes • Chardonnay wines are made from chardonnay grapes

In information seeking – on the Web or in the catalog • Access and identification systems may be controlled by librarians – controlled vocabularies • Access and identification systems may be dynamically generated by users – social tagging, folksonomies • These are different varieties of access and identification systems

This presentation • Controlled vocabularies • Social Tagging • Folksonomies • My recommendations

First we’ll talk about the Cabernet Sauvignons – the controlled vocabularies

Purpose of a controlled vocabulary • To create sets of objects • To serve as a bridge between the searcher’s language and the author’s language • To provide consistency • To improve precision and recall
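As a rough illustration of the last bullet: precision is the share of retrieved records that are relevant, and recall is the share of relevant records that were retrieved. The sketch below computes both for a made-up search; the record IDs and the function name are hypothetical, not taken from any catalog system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of record IDs."""
    hits = len(retrieved & relevant)  # records that are both retrieved and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 records retrieved, 3 records actually relevant.
retrieved = {"rec1", "rec2", "rec3", "rec4"}
relevant = {"rec2", "rec3", "rec5"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved records are relevant (precision 0.50), and two of the three relevant records were found (recall about 0.67).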

Characteristics of a controlled vocabulary • Features a single, authorized form of heading • Often features a syndetic structure of cross-references • Based on belief that the successful use of the catalog is based on the quality of the individual records

The authority record structure • Records the standardized form • Ensures the gathering together of records via that access point • Enables standardized catalog records • Documents decisions taken • Records all other heading forms and provides links from them to the standardized form
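A minimal sketch of that structure, assuming a simplified record with one authorized form, a list of variant forms, and a note documenting the decision. The class and function names are hypothetical and the data is illustrative; real authority records (for example in MARC) carry far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    authorized: str                                # the standardized form of the heading
    variants: list = field(default_factory=list)   # all other forms of the heading
    note: str = ""                                 # documents the decision taken


def build_index(records):
    """Map every form of a heading (authorized or variant) to the authorized form."""
    index = {}
    for rec in records:
        index[rec.authorized.casefold()] = rec.authorized
        for variant in rec.variants:
            index[variant.casefold()] = rec.authorized
    return index


def resolve(index, heading):
    """Return the authorized form for a heading, or None if it is not in the file."""
    return index.get(heading.strip().casefold())


# Illustrative data only.
records = [
    AuthorityRecord(
        authorized="Twain, Mark, 1835-1910",
        variants=["Clemens, Samuel Langhorne", "Clemens, Samuel L."],
        note="Pseudonym used as the heading.",
    )
]
index = build_index(records)
print(resolve(index, "Clemens, Samuel Langhorne"))
```

Because every variant points at the same authorized string, records found through any form of the name gather under one access point.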

Benefits of controlled vocabularies • Promotes discovery generally • Promotes discovery when the aboutness of something has nothing to do with words in the resource or its representation • Imaginative literature (Genre headings) • Humanities • Promotes pre-coordinated displays that expand access – http://cinema.library.ucla.edu

Benefits when combined with keyword searching • Keywords hook into strings of terms most efficiently • Users can be routed by pre-coordinated strings

Controlled vocabularies support faceted catalogs • Encore • Evergreen • Endeca • WorldCat Local All provide hyperlinks to authorized headings

Weaknesses of controlled vocabularies • The artificially controlled language is not necessarily natural language—Cookery anyone? • Subject searches are the most problematic for users • It may work better in theory than in practice • It is costly to perform necessary maintenance • Cost is seen to outweigh the benefits by many administrators

Library of Congress Subject Headings - LCSH • Has a long and well-documented history • Commonly used • Is contained in millions of bibliographic records • Strong institutional support from LC

More benefits of LCSH • The rich vocabulary covers most subjects • It imposes synonym and homograph control • There are machine assisted authority control mechanisms • There is pre-coordination with LCC • The music subject heading system is well developed

Weaknesses of LCSH • It is a generalist taxonomy that can’t always provide needed granularity • Its terminology is not always current • It doesn’t allow for post-search coordination (it is pre-coordinated) • It suffers from LC Collection bias

More weaknesses of LCSH • Training needed • Requires some orientation to use effectively • Is not always accurately applied by catalogers • Maintenance • It is difficult to maintain when changes occur

Authority control outside the catalog • Data critical mass: tipping point? • Homogeneity of data in terms of subject matter • Requirements within data community’s users for specificity • Size • Computing power • Wikipedia’s “disambiguation”

ZoomInfo http://www.zoominfo.com/Default.aspx

What if we did open up our authority files to the web? • National Library of Australia’s People Australia Project http://www.nla.gov.au/initiatives/peopleaustralia/ • Wikipedia Persondata-Tool http://www.ifla.org/IV/ifla73/papers/113-Danowski-en.pdf

Is ontology overrated? • Physicality requires ontologies for searching, but systems with hyperlinks do not • Browse versus search may eliminate the need for creating lists of authorized headings

Ontological classification • Works well when the domain to be organized is small, has formal categories, has stable entities, is restricted and has clear edges • Does not work well when the domain to be organized is large, has no formal categories, is unstable, is unrestricted and has no clear edges

Ontological classification • Works well when the participants are expert catalogers, authoritative sources of judgement, coordinated users or expert users • Does not work well when the participants are uncoordinated, amateur, naïve or non-authoritative

Now we talk about the Chardonnays – social tagging and folksonomies

What are tags? • Keywords or terms associated with or assigned to a piece of information • They enable keyword-based classification and search of information

Common Web sites that use tags include • Del.icio.us – Social bookmarking site • Flickr – Image tagging • LibraryThing • Gmail - Webmail • YouTube

Tags, and therefore social tags and folksonomies are • Dynamic categorization systems • Often created on-the-fly • Chosen as relevant to the user – not to the creator, cataloger or researcher • A social activity (more on this later) • Hopefully one small step toward a more interactive and responsive library system

Social tags are • Non-hierarchical • A way to create links between items by the creation of sets of objects • A means of connecting with others interested in the same things

Way baaack in 2003… • Del.icio.us includes identity in its social bookmarking • Flickr includes tags • Lists of tags became a tool for serendipitous discovery (folksonomies)

Why is tagging so popular? • It is easy and enjoyable • It has a low cognitive cost • It is quick to do • It provides self and social feedback immediately

People tag things • To find them again • To get exposure and traffic • To voice their opinions • Incidentally as they perform other tasks • To take advantage of functionality built on top of a folksonomy • To play a game or earn points

Putting the social in tagging • Tags allow for social interaction because when we navigate by tags we are directly connecting with others • People tag for their own benefit

Don’t confuse tags with keywords or full-text searching • Keywords are behind the scenes, tags are often visibly aggregated for use and browsing • Keywords cannot be hyperlinked • Keywords imply searching, tags imply linking • Full-text searching is passive, tagging is active • Tagging is more about connecting items than categorizing them

What is a Folksonomy? • Folksonomy refers to an “emergent, grassroots taxonomy” • An aggregate collection of tags • A bottom-up categorical structure development • An emergent thesaurus • A term coined by Thomas Vander Wal

How do folksonomies work? • The searcher defines the access, but • The aggregation of the terms has public value • It’s a typically messy democratic approach
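The aggregation step can be sketched as follows, assuming each tagging act is stored as a (user, item, tag) triple. The data and names are hypothetical; actual tagging sites use their own storage models.

```python
from collections import Counter, defaultdict

# Hypothetical (user, item, tag) assignments, each made for the tagger's own benefit.
assignments = [
    ("ann", "photo1", "wine"),
    ("bob", "photo1", "wine"),
    ("bob", "photo1", "napa"),
    ("cat", "photo2", "wine"),
    ("ann", "photo2", "cheese"),
]


def build_folksonomy(assignments):
    """Aggregate individual tag assignments into shared, browsable structures."""
    tag_counts = Counter()            # how often each tag was applied overall
    items_by_tag = defaultdict(set)   # the set of items each tag links together
    for user, item, tag in assignments:
        tag_counts[tag] += 1
        items_by_tag[tag].add(item)
    return tag_counts, items_by_tag


tag_counts, items_by_tag = build_folksonomy(assignments)
print(tag_counts.most_common(1))      # the most used, and so most visible, tag
print(sorted(items_by_tag["wine"]))   # items connected by the tag "wine"
```

No single tagger set out to build the combined view, yet the counts and item sets are useful to everyone who browses them.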

What makes folksonomies popular? • Their dynamic nature works well with dynamic resources • They’re personal • They lower barriers to cooperation

Tagging and the consequent folksonomies work best when • It’s easy to do • It’s not commercial in nature • Taggers have ownership • Taggers are more likely to tag their own stuff than they are your stuff • It has been shown to work well on the Web

The unexpected development: terminological consensus • Collective action yields common terms • Stabilization may be caused by imitation and shared knowledge • The wisdom of the crowd

Is your tagging influenced by my tagging? • Of course it is! • People are beginning to tag in ways that make it easier for others to find like stuff • Shared meaning consequently evolves for tags • Most used tags become most visible

Strengths of folksonomies • Cost-effective way to organize the Internet • Social benefits • It’s inclusive • For many environments, they work well

Issues with meaning • They do not yield the level of clarity that controlled vocabularies do • Term ambiguity – words with multiple meanings • No synonym control

Issues with specificity • Variable specificity for related terms • Broadness of terms impacts precision – terms are often imprecise • Mixed perspectives

Issues with structure • Singular and plural forms create redundant headings • No guidelines for the use of compound headings, punctuation, word order • No scope notes • No cross references

Issues with accuracy • Collective ‘wisdom’ of the tagging community • How does wrong information impact retrieval? • Conflicting cultural norms • Sometimes authority counts

“Spagging” and other problems • Opening doors to opinion tags • Tagging wars • “Spagging”: spam tagging

Tidying up the tags…? • Lists of tagging norms have been developed • Are there programmatic solutions? • Users know they are looking at tags • By tidying, do we destroy the essence of why this works? • Do we realistically have the resources?
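One possible programmatic approach, offered only as a sketch: fold case, punctuation, and simple English plurals so that variant forms of a tag group together. The function names are hypothetical, and the plural rule is deliberately naive (it would mangle a word like "series"), which is part of why the resource question above matters.

```python
import re
from collections import defaultdict


def normalize_tag(tag):
    """Fold case, drop punctuation and spaces, and strip simple English plurals."""
    t = re.sub(r"[\W_]+", "", tag.casefold())
    if len(t) > 4 and t.endswith("ies"):
        return t[:-3] + "y"            # libraries -> library
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        return t[:-1]                  # wines -> wine, but glass stays glass
    return t


def group_variants(tags):
    """Group raw tags under their normalized form without discarding the originals."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalize_tag(tag)].add(tag)
    return dict(groups)


groups = group_variants(["Wine", "wines", "WINE ", "wine-making", "Libraries"])
for normalized, raw_forms in sorted(groups.items()):
    print(normalized, sorted(raw_forms))
```

Keeping the raw forms alongside the normalized key means the tidying can be used for retrieval while users still see the tags as they were entered.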

Recommendations: Don’t assume that one size fits all • Retain controlled vocabularies in the catalog • Explore ways to use controlled vocabularies to help organize the internet by re-purposing controlled vocabularies that already exist • Invite Folksonomies to the party in the catalog to gain their benefits • Explore ways to combine the two systems

Recommendations: When you invite folksonomies into the catalog, do so strategically, and carefully • Don’t put terms in the same index as controlled vocabularies • Find ways to associate terms applied across editions of works • Need for mediation, or at least observation • The crowd is not necessarily the best arbiter of specific terminology
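The first two bullets can be sketched together, assuming a catalog that knows which work each edition belongs to. Subject headings and user tags go into separate indexes, and both are attached at the work level so that a tag applied to one edition is found from any edition. All identifiers and data here are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from edition records to the work they represent.
edition_to_work = {"ed-1998": "work-1", "ed-2005": "work-1", "ed-2001": "work-2"}

subject_index = defaultdict(set)   # cataloger-assigned controlled headings
tag_index = defaultdict(set)       # user-assigned tags, kept in their own index


def add_subject(heading, edition_id):
    subject_index[heading.casefold()].add(edition_to_work[edition_id])


def add_tag(tag, edition_id):
    # Attach the tag to the work, so it applies across all editions.
    tag_index[tag.casefold()].add(edition_to_work[edition_id])


def search(term):
    """Search both indexes but report the two kinds of hit separately."""
    key = term.casefold()
    return {
        "subjects": sorted(subject_index.get(key, set())),
        "tags": sorted(tag_index.get(key, set())),
    }


add_subject("Wine and wine making", "ed-1998")
add_tag("wine", "ed-2005")
add_tag("wine", "ed-2001")
print(search("wine"))
```

Reporting the two result lists separately lets the interface show users which hits come from controlled headings and which come from the crowd.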

Recommendations: Always remember why people tag • People tag things because they want to find them, not because they want others to find them • Be aware that this will impact the quality of the terms, and their frequency

Recommendations: Controlled vocabularies could be better utilized than they currently are • Subject structures are underutilized in the ILS • Controlled vocabularies that exist are not being exported to the Web • Well-connected terms foster discovery – let’s connect them. Index those cross references where available
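As a sketch of what indexing cross references could look like, the example below widens a subject search by one hop of "see also" links. The terms, links, and record IDs are invented for illustration and are not actual LCSH records.

```python
# Hypothetical cross-reference table: term -> related terms ("see also").
see_also = {
    "wine": {"viticulture", "wineries"},
    "viticulture": {"grapes"},
}

# Hypothetical subject index: term -> record IDs carrying that heading.
subject_index = {
    "wine": {"r1"},
    "viticulture": {"r2"},
    "wineries": {"r3"},
    "grapes": {"r4"},
}


def search_with_references(term):
    """Return records for the term plus records for its directly related terms."""
    terms = {term} | see_also.get(term, set())   # one hop only, no chaining
    results = set()
    for t in terms:
        results |= subject_index.get(t, set())
    return sorted(results)


print(search_with_references("wine"))
```

Limiting the expansion to one hop is a design choice in this sketch: it surfaces closely connected terms without pulling in everything reachable through the reference structure.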

Questions? Margaret Maurer mbmaurer@kent.edu
