Grouper is an interface that dynamically groups search results into clusters using the Suffix Tree Clustering (STC) algorithm. It clusters after retrieval, working on the returned document set, which is reported to give better results than pre-retrieval clustering. The aim is coherent clusters that are efficient to browse, with speed coming from STC's linear-time algorithm and its tolerance of snippets in place of full documents. With overlapping clusters and a bag-of-words approach, STC is well suited to web document clustering, even in noisy situations. The user interface focuses on making clusters easy to browse by identifying redundant phrases, and the comparison uses the number of documents followed, time spent, and click distance.
Grouper: A Dynamic Clustering Interface to Web Search Results • Erdem Sarıgil - 21000089 • Oğuz Yılmaz - 21000082
Grouper • Interface to the results of HuskySearch • Dynamically groups the search results into clusters using the Suffix Tree Clustering (STC) algorithm • The goal: make search engine results easy to browse by clustering them • Grouper receives hits from different engines and looks only at the top hits from each search engine
Post-retrieval Clustering • Based on the returned document set • Better results than pre-retrieval clustering • Some key requirements: • Coherent clusters • Efficiently browsable • Speed: • Algorithmic speed • Snippet-tolerance
Suffix Tree Clustering (STC) • Linear time clustering algorithm • STC has three logical steps: • Document cleaning • Identifying base clusters using a suffix tree • Merging these base clusters into clusters • STC has several novel characteristics: • Overlapping clusters • Bag-of-words • Well suited for Web document clustering • Robust in such “noisy” situations
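The three logical steps above can be illustrated with a minimal Python sketch. It makes several assumptions that are not on the slide: word n-grams stand in for the suffix tree (so this version is not linear time), cleaning is reduced to lowercasing and tokenizing, and two base clusters are merged when each shares more than half of its documents with the other. The names `clean`, `base_clusters`, `merge`, `max_len`, `min_docs` and `threshold` are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def clean(text):
    """Step 1: lowercase and keep only word tokens (no stemming in this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def base_clusters(docs, max_len=3, min_docs=2):
    """Step 2: map each phrase to the set of documents that contain it.

    Assumption: n-grams of up to max_len words replace the suffix tree.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = clean(text)
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase_docs[" ".join(words[i:i + n])].add(doc_id)
    # A base cluster is a phrase shared by at least min_docs documents.
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def merge(base, threshold=0.5):
    """Step 3: merge base clusters whose document sets overlap heavily."""
    parent = {p: p for p in base}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for a, b in combinations(base, 2):
        shared = len(base[a] & base[b])
        if shared / len(base[a]) > threshold and shared / len(base[b]) > threshold:
            parent[find(a)] = find(b)

    # Each connected group of base clusters becomes one final cluster.
    groups = defaultdict(lambda: (set(), set()))
    for p in base:
        phrases, doc_ids = groups[find(p)]
        phrases.add(p)
        doc_ids.update(base[p])
    return list(groups.values())


snippets = ["jaguar car dealer", "jaguar car price",
            "big cat habitat", "big cat zoo"]
for phrases, doc_ids in merge(base_clusters(snippets)):
    print(sorted(doc_ids), sorted(phrases))
```

Because a document can contain phrases that end up in different merged groups, the resulting clusters may overlap, which matches the "overlapping clusters" characteristic listed on the slide.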
Making the Clusters Easy to Browse • Three heuristics to identify redundant phrases: • Word Overlap • Sub- and Super-Strings • Most General Phrase with Low Coverage
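The slide names the three heuristics without giving their rules, so the Python sketch below is only one plausible reading. Assumed here: coverage is the fraction of a cluster's documents that contain a phrase; a phrase is dropped if most of its words appear in a phrase with higher coverage; a phrase that has both a sub-phrase and a super-phrase is dropped; and a most-general phrase is dropped if it covers only slightly more than its super-phrase. The function names and the `overlap` and `min_gain` thresholds are hypothetical.

```python
def contains(long_phrase, short_phrase):
    """True if short_phrase occurs as a contiguous word sequence in long_phrase."""
    l, s = long_phrase.split(), short_phrase.split()
    return len(s) < len(l) and any(
        l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1)
    )


def select_phrases(phrase_docs, n_docs, overlap=0.6, min_gain=0.2):
    """Return the phrases worth displaying for one cluster of n_docs documents."""
    cov = {p: len(d) / n_docs for p, d in phrase_docs.items()}
    kept = set(phrase_docs)

    # Heuristic 1 (word overlap): drop a phrase if more than `overlap` of its
    # words appear in another phrase that has higher coverage.
    for p in phrase_docs:
        p_words = set(p.split())
        for q in phrase_docs:
            shared = len(p_words & set(q.split())) / len(p_words)
            if q != p and cov[q] > cov[p] and shared > overlap:
                kept.discard(p)
                break

    # Heuristic 2 (sub- and super-strings): keep only phrases that are most
    # specific (no super-phrase) or most general (no sub-phrase).
    survivors = set(kept)
    for p in survivors:
        has_super = any(contains(q, p) for q in survivors)
        has_sub = any(contains(p, q) for q in survivors)
        if has_super and has_sub:
            kept.discard(p)

    # Heuristic 3 (most general phrase with low coverage): drop a general
    # phrase whose coverage exceeds its best super-phrase by less than min_gain.
    survivors = set(kept)
    for p in survivors:
        supers = [q for q in survivors if contains(q, p)]
        if supers and cov[p] - max(cov[q] for q in supers) < min_gain:
            kept.discard(p)

    return kept


cluster_phrases = {
    "president": set(range(10)),
    "vice president": set(range(9)),
    "the vice president of": set(range(6)),
    "president vice": {0, 1},
}
print(sorted(select_phrases(cluster_phrases, n_docs=10)))
```

In this toy cluster, "president vice" is removed by word overlap and "vice president" is removed because it sits between a shorter and a longer phrase, leaving the most general and the most specific phrase for display.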
Speed • Quality Search • [Figure: plots with axes Time and Quality] • the vice president of vice president
Comparison • Number of documents followed • Time Spent • Click Distance