Usage patterns of collaborative tagging systems

Usage patterns of collaborative tagging systems • Scott A. Golder and Bernardo A. Huberman • Information Dynamics Lab, HP Labs, Palo Alto, CA, USA • Journal of Information Science, 32 (2) 2006, pp. 198–208 • Presented by Jun-Ming Chen, 9/12/2008

Outline • Introduction • Tagging and taxonomy • Delicious dynamics • Bookmarks • Stable patterns in tag proportions • Conclusions

Introduction • Collaborative tagging describes the process by which many users add metadata in the form of keywords to shared content. • In this paper we analyze the structure of collaborative tagging systems as well as their dynamic aspects. • We discovered regularities in • user activity, (Fig. 2, Fig. 3, Fig. 4a, Fig. 4b) • tag frequencies, (Fig. 2, Fig. 3, Fig. 4a, Fig. 4b) • kinds of tags used, (Fig. 5) • bursts of popularity in bookmarking (Fig. 6a, Fig. 6b) and • a remarkable stability in the relative proportions of tags within a given URL (Fig. 7a, Fig. 7b)

Introduction • In this paper we analyze the structure of collaborative tagging systems as well as their dynamic aspects. • We discovered regularities in • user activity, • tag frequencies, • kinds of tags used, • bursts of popularity in bookmarking and • a remarkable stability in the relative proportions of tags within a given URL. • We also present a dynamic model of collaborative tagging that predicts these stable patterns and relates them to imitation and shared knowledge. • We conclude with a discussion of potential uses of the data that users of these systems collaboratively generate.

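Tagging and taxonomy • Taxonomy • A professional, hierarchical directory compiled by experts • Compiled by organizations or individuals for their own needs, e.g. the Yahoo directory. • The classification criteria are set by experts and may not suit ordinary users. • Lion, tiger → genus Panthera • Domestic cat → genus Felis • Panthera, Felis → family Felidae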

Tagging • A set of words associated with a resource • tags • For managing a user's content collection • organizing, sharing, and retrieving • Flexible, free-style textual description • content information • user comments • individual conceptual associations
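The tagging model on this slide (free-form words attached to a resource, used to organize, share and retrieve it) can be sketched as a small data structure. This is only an illustration under stated assumptions, not how Delicious is implemented: the names `Bookmark` and `TagStore`, and the sample users and URLs, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    """One user's bookmark: a resource plus a free-form set of tags."""
    user: str
    url: str
    tags: frozenset


class TagStore:
    """Hypothetical in-memory store that indexes bookmarks by tag."""

    def __init__(self):
        self._by_tag = defaultdict(list)

    def add(self, user, url, tags):
        # Tags are lower-cased so that "Java" and "java" retrieve the same items.
        bookmark = Bookmark(user, url, frozenset(t.lower() for t in tags))
        for tag in bookmark.tags:
            self._by_tag[tag].append(bookmark)
        return bookmark

    def urls_for(self, tag):
        """Return the distinct URLs that any user labeled with `tag`."""
        return sorted({b.url for b in self._by_tag.get(tag.lower(), [])})


# Made-up sample data for illustration.
store = TagStore()
store.add("alice", "http://example.com/java-intro", ["java", "tutorial"])
store.add("bob", "http://example.com/java-intro", ["Java", "toread"])
store.add("bob", "http://example.com/jazz", ["music", "jazz"])
```

Unlike a taxonomy, nothing here constrains which words a user may choose; retrieval works only to the extent that different users happen to pick the same tags.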

You Are What You Tag • Example tags: blogs, music, bookmarks, movies, maps, photos, MRT, Shrek, lord of rings, NTU, opera, jazz, java, classical, AI, rock, xml, school, ocean, life, travel, food, cat, NightMarket

Tagging and taxonomy

Bookmark it • On the web, however rich the forms of content may be, when you like something the most intuitive thing to do is to add it to your bookmarks. • http://www.maxkiesler.com/ • http://elzr.com/posts/infodesign • http://www.vanderwal.net • http://www.personalinfocloud.com

A popular bookmarking website • Since 2003 • A large community of active users • Both static and dynamic properties • Collaborative tagging

Delicious dynamics • We use data from Delicious to uncover patterns among users, tags and URLs. • We analyze tags, bookmarks and URLs • Collected from the morning of Friday, June 23 to the morning of Monday, June 27, 2005. • Gathered by retrieving public RSS feeds and then crawling a portion of the website

Delicious dynamics • Two sets of Delicious data • popular set • A total of 212 URLs and 19,422 bookmarks comprise this dataset • people set • Random sample of 229 users; a total of 68,668 bookmarks comprise this dataset. • First, we look at users’ activity with respect to their tag use. Next, we examine tags themselves in greater detail.

-User activity and tag quantity • There is only a weak relationship between the age of a user's account (i.e. the time since they created it) and the number of days on which they created at least one bookmark (n = 229; R² = 0.52). • There is not a strong relationship between the number of bookmarks a user has created and the number of tags they used in those bookmarks (n = 229; R² = 0.33).
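The R² values above measure how much of the variance in one quantity is explained by a straight-line fit on the other. A minimal sketch of that computation follows, assuming a simple linear fit (for which R² equals the squared Pearson correlation); the function name and the sample numbers are made up for illustration and are not the paper's data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a simple linear fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


# Made-up values for five hypothetical users; NOT the paper's data.
bookmarks_per_user = [10, 40, 80, 150, 300]
tags_per_user = [8, 60, 25, 90, 70]
fit_quality = r_squared(bookmarks_per_user, tags_per_user)
```

A value near 1 would mean that heavy bookmarkers reliably use more tags; the values reported on the slide indicate that this holds only loosely.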

-User activity and tag quantity • Two of those tags' usages grow steadily, reflecting continual interests tagged in a consistent way. • One tag grows rapidly, reflecting a newfound interest or a change in tagging practice. It is possible that the newly growing tag represents a new interest or category to the user. • Another possibility is that the user has chosen to draw a new distinction among their bookmarks, which can prove problematic for the user. Because sensemaking is a retrospective process, information must be observed before one can establish its meaning.

-Kinds of tags • Here, we identify several functions tags perform for bookmarks. • Identifying What (or Who) it is About (topic) • java, SteveJobs, google, ... • Identifying What it Is (type) • blog, wiki, tutorial, ... • Identifying Who Owns It (bookmark content) (the bookmark's creator) • O_o, janetyc, fromxxx, ... • Refining Categories (bookmark content) (qualifiers: meaningless on their own, they refine other tags) • 25, 2008, 100, ...

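-Kinds of tags • Identifying Qualities or Characteristics (subjective impressions) • cool, funny, stupid, ... • Self Reference (personal flavor, e.g. tags beginning with "my") • my, mycomment, ... • Task Organizing (tasks) • todo, toread, ... • Example: Looking for a job !!! • Amazon website : Keyword : Job search → bookstore • Keywords are functional and change with the task. • Some people also like to prefix keywords with "my", as in my_cat or my_dog; this lets them filter out everything that is not a "my" topic, which makes lookup more convenient.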
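A few of the tag functions listed above can be recognized mechanically, which is what makes the "my" prefix useful for filtering. The sketch below is a rough heuristic written for this summary, not a method from the paper: `tag_kind`, `only_mine` and `TASK_TAGS` are hypothetical names, and the prefix test would wrongly flag a topic tag such as "mysql".

```python
TASK_TAGS = {"todo", "toread"}  # the task-organizing examples from the slide


def tag_kind(tag):
    """Guess a tag's function from its surface form (heuristic only)."""
    t = tag.lower()
    if t in TASK_TAGS:
        return "task organizing"
    if t.startswith("my"):
        return "self reference"
    if t.isdigit():
        return "refining category"
    return "other"  # topic, type, owner and quality tags need human judgment


def only_mine(tags):
    """Keep just the self-referential tags, e.g. my_cat and my_dog."""
    return [t for t in tags if tag_kind(t) == "self reference"]
```

For example, `only_mine(["my_cat", "java", "my_dog", "todo"])` keeps only the two "my" tags, which is the filtering convenience the slide describes.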

-Kinds of tags • We suggest, therefore, that the earlier tags in a bookmark represent basic levels, because they are not only widespread in agreement, but are also the first terms that users thought of when tagging the URLs in question. • Figure annotation: highest median rank (greatest frequency) → basic level

Bookmarks- bursts of popularity in bookmarking • Here we look at how URLs are bookmarked over time, and at how the sets of tags in a URL's bookmarks constitute a stable way of describing a URL's content. • Figure panels: URL #1310, URL #1209 (rediscovered) • Because Delicious is a bookmarking service, a mention on a widely read weblog or website is a plausible primary cause of a burst. • Studies of (1) 'burstiness' among links in weblogs, (2) opinion and (3) fad formation demonstrate how 'well-connected' individuals and 'fashion leaders' can spread information and influence others.

-Stable patterns in tag proportions • The combined tags of many users’ bookmarks give rise to a stable pattern in which the proportions of each tag are nearly fixed. • After the first 100 or so bookmarks, each tag’s frequency is a nearly fixed proportion of the total frequency of all tags used. • This stability has important implications for the collective usefulness of individual tagging behavior.
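The quantity being tracked here is each tag's share of all tag uses accumulated so far for one URL. A minimal sketch of that bookkeeping follows; the function name and the four-bookmark sequence are hypothetical and far shorter than the roughly 100 bookmarks after which the slide says proportions settle.

```python
from collections import Counter


def tag_proportions_over_time(bookmarks):
    """Return, after each bookmark, every tag's share of all tag uses so far."""
    counts = Counter()
    history = []
    for tags in bookmarks:
        counts.update(tags)  # add this bookmark's tags to the running totals
        total = sum(counts.values())
        history.append({tag: n / total for tag, n in counts.items()})
    return history


# Made-up bookmark sequence for a single URL, in chronological order.
bookmarks = [
    ["java", "tutorial"],
    ["java"],
    ["java", "programming"],
    ["tutorial", "java"],
]
history = tag_proportions_over_time(bookmarks)
```

Plotting the successive entries of `history` for a real URL would give one curve per tag; stability means those curves flatten out as bookmarks accumulate.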

-Stable patterns in tag proportions • This stable pattern can be explained by the dynamics of a stochastic urn model originally proposed by Eggenberger and Polya to model disease contagion [22]. • Two reasons why this stabilization might occur are imitation and shared knowledge. • Imitation • Delicious users may imitate the tag selection of other users, • especially if a user does not know how to categorize a particular URL. • Imitation does not explain everything, especially for a few of the most commonly used tags, but the stable pattern persists even for less common tags. • [22] F. Eggenberger and G. Polya, Über die Statistik verketteter Vorgänge, Zeitschrift für Angewandte Mathematik und Mechanik 1 (1923) 279–89.
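In a Polya urn, a ball is drawn at random, then returned together with one extra ball of the same color, so colors that are already common become more likely to be drawn again. The simulation below is a minimal sketch of that basic scheme with colors standing in for tags; the function names, the starting counts and the number of draws are illustrative assumptions, not parameters from the paper.

```python
import random


def polya_urn(initial_counts, draws, seed=None):
    """Simulate a basic Polya urn and return the final count of each color."""
    rng = random.Random(seed)
    counts = dict(initial_counts)
    for _ in range(draws):
        colors = list(counts)
        weights = [counts[c] for c in colors]
        # Draw one ball with probability proportional to its color's count.
        drawn = rng.choices(colors, weights=weights, k=1)[0]
        counts[drawn] += 1  # put it back along with one more of the same color
    return counts


def proportions(counts):
    """Convert raw counts into fractions of the total."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


# Hypothetical run: two "tags" that start on equal footing.
final = polya_urn({"java": 1, "tutorial": 1}, draws=1000, seed=0)
shares = proportions(final)
```

Early draws move the proportions a lot and later draws barely move them, so each run settles to a fixed mix, although different runs settle to different mixes. This reinforcement is the mechanism the slide relates to imitation.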

Conclusion • We have observed that collaborative tagging users exhibit a great variety in their sets of tags. • The stable, consensus choices that emerge may be used on a large scale to describe and organize how web documents interact with one another. • Currently such description and organization is performed, problematically, on a small scale by experts and, equally problematically, on a large scale by machines [2, 3]. • We expect that these findings will apply to other, similar tagging systems. • [2] C. Shirky, Ontology is Overrated: Categories, Links and Tags (2005). Available at: www.shirky.com/writings/ontology_overrated.html (accessed 10 October 2005). • [3] A. Doan, J. Madhavan, P. Domingos and A. Halevy, Learning to map between ontologies on the semantic Web. In: Proceedings of the International World Wide Web Conference (Honolulu, 2002) (ACM Press, New York, 2002) 662–73.

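Comment • Lack of exhaustivity: a single word cannot easily express a complex concept • Tags lack vocabulary control (problems such as singular vs. plural forms, word stems, abbreviations, aliases, formatting and misspellings) • Precision: with no classification structure, there is little context to follow • No tagging guidelines: tag formats and assignment rules lack standards, so consistency is hard to achieve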
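The vocabulary-control problems listed above (case, separators, plurals, aliases) are often reduced by normalizing tags before comparing them. The sketch below is one crude way to do that, offered as an assumption rather than anything the paper proposes; `normalize_tag` and the alias table are hypothetical, and the naive plural rule will damage words such as "news".

```python
def normalize_tag(tag, aliases=None):
    """Crude normalization: case, separators, aliases, then simple plurals."""
    aliases = aliases or {}
    # Fold case and drop common separators so web-design == web_design.
    t = tag.strip().lower().replace("-", "").replace("_", "")
    # Map known abbreviations or alternate names to one canonical form.
    t = aliases.get(t, t)
    # Naive plural stripping; skips short tags and words ending in "ss".
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return t
```

For instance, `normalize_tag("Blogs")` and `normalize_tag("blog")` both give "blog". Misspellings and genuinely ambiguous words are not addressed by rules like these.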

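-Cloudalicious • Cloudalicious [21] is an online visualization tool, a service that tracks how other people use tags to label your website. • It compiles statistics from del.icio.us; from its charts we can clearly see how web users classify a given URL under their tags. • The charts also show that the tags given to a bookmark gradually change over time. Possible reasons for the change: • the content of the bookmarked website changes • the words used to describe that content change • users become aware of the social interaction involved in defining tags and begin to change their own behavior • [21] Cloudalicious. Available at: http://cloudalicio.us/ (accessed 10 October 2005).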

-Cloudalicious

-Cloudalicious THE RISE OF AJAX
