This document explores the architecture and functionality of the SearchPoint prototype developed by Boštjan Pajntar and Marko Grobelnik at the Jožef Stefan Institute. It delves into user interactions, query processing, and the ranking of search results. The paper addresses the business models surrounding corporate search engines, integration challenges, and funding opportunities. Additionally, it examines techniques for query disambiguation, topic profiling, and the use of ontologies and clustering algorithms to enhance search relevance and effectiveness.
Working prototype ready for TT
http://searchpoint.ijs.si
Boštjan Pajntar, Marko Grobelnik
Jožef Stefan Institute
Architecture of Search (the usual search!)
[Diagram labels: The user, “Cookie”]
• The user inputs a precise query
• Search engine provides a list of results
• Results are returned to the user in a ranked list
Architecture of SearchPoint
[Diagram labels: The user, “Cookie”]
• The user is presented with hits and topics
• Search engine provides a list of results
• Results are processed by the SearchPoint web service
Where does SearchPoint help?
• Internet search: NOT REALLY!
• Specific search engines
  • Interest from a company producing corporate search engines: recommind.com
  • Integration into an intranet search engine over documents: Accenture (big consulting company)
  • Talks with an image-selling company: photo12.com
Open questions
• Business model
  • Licensing: how to do it?
  • Another model?
  • Prices?
• How to run a company
  • Involve a company to sell/license the product?
  • Find an executive partner?
• Capitalization
  • Slow growth?
  • Venture capital?
Thank You! Questions?
Scenarios of usage
• Disambiguation of the query:
  • Jaguar, Cookie, Amazon, A4, …
• Sub-topic profiling:
  • Password (recovery, protection, generator)
• Existing ontologies and taxonomies provide different contexts for the same data
• Study of the internet presence of a topic:
  • Cookies (more recipes than internet cookies)
Ranking Space
• SearchPoint visualizes several “nodes”, each relevant to some hits
• Nodes are used to create a ranking space
• The position of the red focus point determines the ranking
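The slides do not say how the focus point position is turned into a ranking, so the following is only a minimal sketch under stated assumptions: each node has a 2D position and a per-hit relevance, nodes closer to the focus point get a larger weight (inverse distance), and a hit's score is its original score plus the weighted node relevances. The names `node_weights` and `rerank`, the data layout, and the scoring formula are all hypothetical.

```python
import math

def node_weights(nodes, focus, eps=1e-6):
    """Assumed weighting: inverse distance from the focus point, normalized to sum to 1."""
    raw = [1.0 / (math.dist(node["pos"], focus) + eps) for node in nodes]
    total = sum(raw)
    return [w / total for w in raw]

def rerank(hits, nodes, focus):
    """Return hit titles ordered by base score plus focus-weighted node relevance."""
    weights = node_weights(nodes, focus)
    scores = []
    for i, hit in enumerate(hits):
        topical = sum(w * node["relevance"].get(i, 0.0)
                      for w, node in zip(weights, nodes))
        scores.append(hit["base"] + topical)
    order = sorted(range(len(hits)), key=lambda i: -scores[i])
    return [hits[i]["title"] for i in order]

# Illustrative data only: three hits for the ambiguous query "Jaguar".
hits = [
    {"title": "Jaguar XF review", "base": 0.3},
    {"title": "Jaguar habitat", "base": 0.2},
    {"title": "Jaguar dealership", "base": 0.1},
]
nodes = [
    {"label": "cars", "pos": (0.0, 0.0), "relevance": {0: 1.0, 2: 0.8}},
    {"label": "animals", "pos": (1.0, 0.0), "relevance": {1: 1.0}},
]

print(rerank(hits, nodes, focus=(0.1, 0.0)))  # focus near the "cars" node
print(rerank(hits, nodes, focus=(0.9, 0.0)))  # focus near the "animals" node
```

Moving the focus toward a node pulls the hits relevant to that node to the top of the list, which is the behaviour the slide describes.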
Topics and Concepts
• Nodes can come from different sources:
  • Clustering
  • Ontology
  • Simultaneous sources (WORK IN PROGRESS!)
K-Means Clustering
• Two hundred hits (title & snippet) are the documents
• Topics are the twelve clusters
Provided by: Wikipedia
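As a rough illustration of the clustering step, here is a small k-means sketch in plain Python. It assumes each hit is the concatenation of its title and snippet, represented as a unit-length term-frequency vector and compared by cosine similarity; the slides do not specify the text representation, similarity measure, or initialization, so those choices (and the names `vectorize`, `kmeans`, `top_terms`) are assumptions. The toy input uses 4 documents and 2 clusters rather than the 200 hits and 12 clusters mentioned above.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Unit-length term-frequency vector for one hit (title + snippet)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}

def dot(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def mean_vector(vectors):
    """Normalized sum of the member vectors, used as the cluster centroid."""
    total = Counter()
    for v in vectors:
        total.update(v)
    norm = math.sqrt(sum(x * x for x in total.values())) or 1.0
    return {t: x / norm for t, x in total.items()}

def kmeans(docs, k, iterations=20):
    """Cluster docs into k groups; requires len(docs) >= k."""
    vectors = [vectorize(d) for d in docs]
    step = max(1, len(vectors) // k)
    # Assumed deterministic initialization: evenly spaced documents.
    centroids = [vectors[i * step] for i in range(k)]
    assignment = None
    for _ in range(iterations):
        new_assignment = [max(range(k), key=lambda c: dot(v, centroids[c]))
                          for v in vectors]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment, centroids

def top_terms(centroid, n=3):
    """Highest-weighted terms of a centroid, usable as a topic label."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]

docs = [
    "jaguar car dealer price",
    "car price dealer offer",
    "jaguar cat jungle habitat",
    "jungle cat habitat wildlife",
]
assignment, centroids = kmeans(docs, k=2)
for c, centroid in enumerate(centroids):
    print(c, top_terms(centroid))
```

Each cluster's top terms can then serve as the label of a topic node.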
DMoz Classifier
• As input we take the DMoz RDF taxonomy data
• We build a classification model consisting of models for individual categories
• As output we get:
  • A set of the most relevant categories from DMoz
  • A set of the most relevant keywords calculated from the DMoz category
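The input/output contract above can be sketched as follows. This is not the authors' classifier: it assumes one term-frequency centroid per category as the "model for an individual category", cosine similarity for scoring, and keywords taken from the best-matching category's centroid. The tiny `category_texts` dictionary stands in for text extracted from the DMoz RDF data (parsing the RDF is omitted), and the function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def unit(counts):
    """Scale a term-count mapping to unit length."""
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}

def train(category_texts):
    """Build one centroid model per category from its example texts."""
    return {
        category: unit(Counter(t for text in texts for t in tokenize(text)))
        for category, texts in category_texts.items()
    }

def classify(models, text, top_categories=2, top_keywords=3):
    """Return (most relevant categories, keywords of the best category)."""
    query = unit(Counter(tokenize(text)))
    scored = sorted(
        ((sum(w * model.get(t, 0.0) for t, w in query.items()), category)
         for category, model in models.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    categories = [cat for score, cat in scored[:top_categories] if score > 0]
    keywords = []
    if categories:
        best = models[categories[0]]
        ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords = [t for t, _ in ranked[:top_keywords]]
    return categories, keywords

# Illustrative stand-in for category text taken from the taxonomy.
category_texts = {
    "Top/Recreation/Autos": ["jaguar car dealer sedan", "car engine dealer"],
    "Top/Science/Biology/Zoology": ["jaguar cat jungle predator", "cat habitat jungle"],
}
models = train(category_texts)
print(classify(models, "jaguar car price"))
```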
Search Engines
• Any search engine that returns textual results can be consumed by SearchPoint
• Web search engines:
  • Google, Yahoo, Microsoft Live Search, …
• Profiled Web searches:
  • New York Times, Watson, …
• Corporate searches:
  • Accenture
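One way to read "any search engine that returns textual results can be consumed" is as a thin adapter layer: every engine is wrapped in a function that maps a query to a list of hits with a title and a snippet. The sketch below illustrates that idea only; the `Hit` type, `collect_documents`, and the two fake backends are hypothetical and do not call any real search API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hit:
    title: str
    snippet: str
    url: str

# Assumed adapter shape: a backend is any callable from query to hits.
SearchBackend = Callable[[str], List[Hit]]

def fake_web_search(query: str) -> List[Hit]:
    """Stand-in for a web search engine adapter (returns canned results)."""
    return [
        Hit(f"{query} recipes", "Bake chocolate chip cookies at home", "http://example.com/1"),
        Hit(f"HTTP {query}", "How browsers store session data", "http://example.com/2"),
    ]

def fake_corporate_search(query: str) -> List[Hit]:
    """Stand-in for an intranet search adapter (returns canned results)."""
    return [Hit(f"{query} policy", "Internal guideline document", "http://example.com/3")]

def collect_documents(query: str, backend: SearchBackend, limit: int = 200) -> List[str]:
    """Fetch up to `limit` hits and turn each into one text document (title + snippet)."""
    return [f"{hit.title} {hit.snippet}" for hit in backend(query)[:limit]]

for backend in (fake_web_search, fake_corporate_search):
    print(collect_documents("cookie", backend))
```

With this shape, the clustering and ranking stages only ever see text documents and do not depend on which engine produced them.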