Inductive Clustering: A Technique for Clustering Search Results
Hieu Khac Le
Department of Computer Science, University of Illinois at Urbana-Champaign

Abstract
Information overload is a widespread problem today. Search engines, which help users find the information they need across the whole web, solve this problem only partially: even when a search engine works very well, users still face information overload because so many results are returned. Post-processing search results reduces information overload further by organizing the results so as to minimize the effort needed to examine them. This project proposes a novel technique for organizing search results: Inductive Clustering (IC).

Introduction
Traditional approach: clustering search results requires three essential ingredients:
• a similarity function,
• a threshold, and/or
• a number of clusters chosen in advance.
These ingredients heavily affect clustering quality. Unfortunately, there is no guidance for tuning them, especially the threshold and the number of clusters.

Our approach: Inductive Clustering (IC)
• IC needs neither a threshold nor a given number of clusters.
• Intuitively, results tend to agree with their cluster's summary.
• A large cluster can easily be clustered further into smaller clusters.

IC in detail
• Observation: the more specific the query, the fewer results it returns.
• Key idea: generate a summary from the returned results; the results that agree with that summary form the first cluster. Generate a summary for the remaining results; the results that agree with it form the second cluster. Repeat until all results are clustered. A large cluster can be clustered further in the same way.

[Figure: Example of an ambiguous query, with cluster titles]

Experiment
The experiment considered the first 100 results returned by Google for 30 queries. The observed clusters show that the algorithm works extremely well.
Results:
• Average precision with cluster title: 90.5%
• Average precision without cluster title: 95.6%
• Average precision of cluster titles: 91.4%
• Average execution time: 0.27 seconds

[Figure: User's query; Summarizing; Sub-queries; Executing query. Summaries for clusters are generated in advance.]
[Figure: Example of an unambiguous query, with clusters of high confidence]

Conclusion
Inductive Clustering is a novel technique for post-processing returned search results. Unlike previous approaches, it does not require manually tuned parameters. The experiments show that IC works extremely well: cluster titles are comprehensive, the results in each cluster agree with their titles, and the execution time is negligible. Results organized with IC are much easier for users to grasp. We envision IC being deployed as an online service for broad use.

This project was done under the advising of Prof. ChengXiang Zhai.
hieule2@uiuc.edu, Date: 05/01/2005
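The key idea under "IC in detail" can be illustrated with a short Python sketch. The poster does not say how a summary is generated or what it means for a result to "agree" with one, so this sketch assumes both: the summary of a set of results is the single non-query term that occurs in the most of them, and a result agrees with the summary if its snippet contains that term (a stand-in for executing the more specific sub-query). The names `terms`, `summarize`, `inductive_clustering`, and `STOPWORDS` are hypothetical, not the author's implementation.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def terms(text: str) -> set[str]:
    """Lower-cased alphabetic words of a snippet, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def summarize(results: list[str], query_terms: set[str]) -> str | None:
    """Assumed summary: the non-query term found in the most results.

    Ties are broken alphabetically so the output is deterministic.
    Returns None when no candidate term is left.
    """
    doc_freq = Counter()
    for r in results:
        doc_freq.update(terms(r) - query_terms)  # each term counted once per result
    if not doc_freq:
        return None
    return min(doc_freq, key=lambda t: (-doc_freq[t], t))


def inductive_clustering(query: str, results: list[str]) -> list[tuple[str, list[str]]]:
    """Peel off one cluster per pass until every result is assigned."""
    query_terms = terms(query)
    remaining = list(results)
    clusters: list[tuple[str, list[str]]] = []
    while remaining:
        summary = summarize(remaining, query_terms)
        if summary is None:
            # Nothing left to summarize: collect leftovers in a catch-all cluster.
            clusters.append(("other", remaining))
            break
        # Assumed agreement test: the snippet contains the summary term, standing
        # in for executing the more specific sub-query "<query> <summary>".
        members = [r for r in remaining if summary in terms(r)]
        remaining = [r for r in remaining if summary not in terms(r)]
        clusters.append((summary, members))
    return clusters


if __name__ == "__main__":
    snippets = [
        "Jaguar car dealer prices",
        "Jaguar big cat habitat",
        "Used Jaguar car for sale",
        "Jaguar cat facts",
        "Jaguar car review",
    ]
    for label, members in inductive_clustering("jaguar", snippets):
        print(label, members)
```

For the toy `jaguar` snippets, this yields a `car` cluster with three results and a `cat` cluster with two. Because each pass assigns at least one result, the loop always terminates without a threshold or a preset number of clusters. To split a large cluster further, one could call `inductive_clustering` again on that cluster's members with its summary term appended to the query.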