donar
Uploaded by
14 SLIDES
276 VUES
140LIKES

Enhancing Knowledge Discovery Through Collective Collaborative Tagging Systems

DESCRIPTION

This paper explores the development of a Collective Collaborative Tagging (CCT) System to harness the power of social web content, utilizing collaborative tagging to generate a unified knowledge dataset. With increasing data scattered across blogs, wikis, and social tags, our CCT System incorporates functionality for data importing, querying, and user-friendly interaction, enabling discovery of hidden knowledge. It also examines various information retrieval techniques, trend detection, and community clustering through advanced algorithms including Latent Semantic Indexing and machine learning methods, addressing both the advantages and challenges of collaborative tagging.

1 / 14

Download Presentation
Télécharger la présentation

Enhancing Knowledge Discovery Through Collective Collaborative Tagging Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Collective Collaborative Tagging System JongY. Choi, Joshua Rosen, SiddharthMaini, Marlon E. Pierce, and Geoffrey C. Fox Community Grids Laboratory Indiana University

  2. People-Powered Knowledge • Social web contents are increasing • Blogs, wikis, ratings, reviews, social tags, … • Help to utilize power of people’s knowledge • Collaborative Tagging • Social bookmarks with metadata annotated • Connotea, Delicious, Flickr, MSI-CIEC • Pros and cons • High-quality classifier by using human annotation • But lack of control or authority

  3. Motivations • Distributed and fragmented knowledge Need an unified data set  More accurate and richer information • No flexibility in choosing different IR algorithms Need a playground to do experiment with various IR techniques Help to discover hidden knowledge • User-friendly Web2.0 style web application

  4. Proposed System Collective Collaborative Tagging (CCT) System CCT System Data Importer RDF RSS Atom HTML Distributed Collaborative Tagging Sites Data Coordinator Populate tags Repository Query with various options User Service Search Result

  5. Service Types and Algorithms Type Service Description Algorithm I Searching Given input tags, returning the most relevant X (X = URLs, tags, or users) LSI, FolkRank, Tag Graph II Recommendation With/without input tags, returning undiscovered X LSI, FolkRank, Tag Graph III Clustering Community discovering. Finding a group or a community with similar interests K-Means, DA clustering, Pairwise DA IV Trend detection Analysis the tagging activities in time-series manner and detect abnormality Time Series Analsysis

  6. Two Models • Vector-space model or bag-of-words model • q-dimension vectordi = (t1, t2, … , tq) • Easy and convenient • Graph model • Building a tag graph • Emphasis on relationship

  7. Vector-space Vs. Graph Vector-space Model Graph Model Represen-tation -. q-dimensional vector -. q-by-n matrix -. G(V, E) -. V = {URL, tags, users} Dis-similarity -. Distances, cosine, … -. O(N2) complexity -. Paths, hops, connectivity, … -. O(N3) complexity Algorithm -. Latent Semantic Indexing -. Dimension reduction schemes -. PCA -. PageRank, FolkRank, … -. Pairwise clustering -. MDS

  8. Latent Semantic Indexing • Dimension reduction from high q to low d(q À d) • Removing noisy terms • Extracting latent concepts • Using SVD to get optimized d-dim. matrix Recall Precision

  9. Pairwise Dissimilarity • Pairwise distance (or dissimilarity) matrix • N-by-N matrix and each element represents a degree of distance or dissimilarity • Used as an input • Multi-Dimensional Scaling (MDS) : to recover dimension information

  10. Pairwise Dissimilarity • Pairwise clustering • Input from vector-based model vs. graph model • How to avoid local minima/maxima? Vector-based model Graph model

  11. Deterministic Annealing Clustering • Deterministically avoid local minima • Tracing global solution by changing level of energy • Analogy to physical annealing process (High  Low)

  12. More Machine Learning Algorithms • Classification • To response more quickly to user’s requests • Training data based on user’s input and answering questions based on the training results • Artificial Neural Network, Support Vector Machine,… • Trend Detection • Time-series analysis

  13. Ongoing Development Different Data Sources Various IR algorithms Flexible Options Result Comparison

  14. Thank you!! Questions? jychoi@cs.indiana.edu

More Related
SlideServe
Audio
Live Player
Audio Wave
Play slide audio to activate visualizer