CHORUS - What is « Search »: A functional view (2008-04-21, Henri Gouraud)



  1. CHORUS - What is « Search »: A functional view (2008-04-21, Henri Gouraud, WP2)

  2. Overall goal
     • Break down search into essential (necessary) components
     • Identify issues associated with each component
     • Facilitate matching of use-cases with the functional overview
     • For a given use-case, identify “critical” components
        • Those for which there is no known solution
        • Those for which existing solutions do not perform well enough
     • Identify use-cases where the model breaks
        • Repair/extend the model
        • Identify potential « new models »
     • ----> Prepare gap analysis

  3. This analysis tries to be « media » independent
     • Functions are media independent
        • Document discovery
        • Meta-data extraction
        • User interface
        • ...
     • Techniques necessary to implement each function are media dependent ...
        • Text extraction
        • Speech to text
        • Image signatures
        • ...
     • ... and are at varying levels of maturity and performance

  4. Top level vision
     • Search engines come into play when « direct » search into the document repository fails (volume, performance, ...)
     [Diagram; labels: Documents, Querying, Indexing, Matching, Data-base]
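
The contrast between « direct » search and an indexed search engine can be sketched as below. This is a minimal illustration and not the architecture of the slides: the document set, the function names (`direct_search`, `build_index`, `indexed_search`) and the whitespace tokenization are all assumptions made for the example.

```python
# Hypothetical three-document repository, text only.
DOCUMENTS = {
    "d1": "speech to text",
    "d2": "image signatures",
    "d3": "text extraction",
}

def direct_search(documents, term):
    """Scan every document at query time; the cost grows with repository size."""
    return {doc_id for doc_id, text in documents.items() if term in text.split()}

def build_index(documents):
    """Indexing step: performed ahead of queries, fills the data-base."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.split():
            index.setdefault(word, set()).add(doc_id)
    return index

def indexed_search(index, term):
    """Matching step: a single lookup instead of a full scan."""
    return index.get(term, set())

index = build_index(DOCUMENTS)
direct_hits = direct_search(DOCUMENTS, "text")
indexed_hits = indexed_search(index, "text")
```

Both paths return the same documents; the index only moves the work from query time to indexing time, which is the point at which volume and performance stop being a per-query problem.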

  5. At the core: matching
     • Matching happens between two « computer based » chunks of data
        • Query-meta-data, derived from the user input (and his context)
        • Document-meta-data, derived from the documents being searched
     [Diagram; labels: Query-meta-data, Matching, Document-meta-data, Data-base]

  6. The matching process
     • Simple or boolean
        • AND, OR, NEAR, parentheses, regular expressions, ...
     • Accurate or fuzzy
        • Spelling, phonetic, « similar to », ...
     • Typed
        • Author:xx, Title:xx, ...
     • Centralized/distributed
        • Across a single LAN, across a WAN, peer-to-peer, ...
     • Issues
        • New media types: algorithms
        • Performance
           • Single query response time
           • Query throughput
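
As a rough illustration of the boolean, fuzzy and typed variants listed on this slide, the sketch below matches against a small typed index. Everything in it is assumed for the example: the sample documents, the field names, the helper names, and the use of Python's `difflib.SequenceMatcher` with a 0.8 threshold as a stand-in for spelling-tolerant matching.

```python
from difflib import SequenceMatcher

# Hypothetical D-meta-data: one dict of typed fields per document.
DOCS = {
    1: {"author": "gouraud", "title": "what is search"},
    2: {"author": "smith", "title": "image signatures for search"},
    3: {"author": "smith", "title": "speech to text"},
}

def build_index(docs):
    """Map (field, term) -> set of document ids."""
    index = {}
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in text.split():
                index.setdefault((field, term), set()).add(doc_id)
    return index

def match_exact(index, field, term):
    """Typed, accurate match: the term must appear in the given field."""
    return set(index.get((field, term), set()))

def match_fuzzy(index, field, term, threshold=0.8):
    """Typed, fuzzy match: accept indexed terms that are similar enough."""
    hits = set()
    for (f, t), ids in index.items():
        if f == field and SequenceMatcher(None, term, t).ratio() >= threshold:
            hits |= ids
    return hits

INDEX = build_index(DOCS)
# Boolean AND / OR map onto set intersection / union.
both = match_exact(INDEX, "author", "smith") & match_exact(INDEX, "title", "search")
either = match_exact(INDEX, "title", "speech") | match_exact(INDEX, "title", "image")
fuzzy = match_fuzzy(INDEX, "title", "serch")  # misspelled query term
```

The fuzzy matcher scans every indexed term, which makes the response-time issue named on the slide visible even at this scale.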

  7. The document side
     • The main issue: the « Transform » step
        • Extracting useful information from the documents
     [Diagram; labels: Pull, Crawl, Document, Push, Transform, Matching, D-meta-data, Build, Data-base, Content]

  8. The document side
     • Document discovery
        • Pull = crawling, push = OK
        • Completeness, freshness
     • Building the search engine (SE) data-base
        • Scalability, reliability
        • Incremental
        • Distributed
     • Transform: elaborating D-meta-data
        • Deal with existing meta-data, multi-pass process, ...
        • Dealing with the multiplicity of content types and formats
        • For each type, a specific meta-data elaboration process
     • Issues
        • Algorithm (for each media type)
        • Performance (relates to document repository size and churn rate)
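
The Transform and Build steps, including an incremental update when a document changes, might look like the sketch below. It covers text content only, and the `transform` function, the `SearchDatabase` class and its fields are hypothetical names introduced for illustration.

```python
import re

def transform(doc_id, content, existing_meta=None):
    """Derive D-meta-data from raw text, keeping any meta-data already supplied."""
    meta = dict(existing_meta or {})
    meta["id"] = doc_id
    meta["terms"] = sorted(set(re.findall(r"[a-z0-9]+", content.lower())))
    return meta

class SearchDatabase:
    """Tiny in-memory data-base that supports incremental updates."""

    def __init__(self):
        self.meta = {}      # doc_id -> D-meta-data
        self.postings = {}  # term -> set of doc_ids

    def remove(self, doc_id):
        old = self.meta.pop(doc_id, None)
        if old is None:
            return
        for term in old["terms"]:
            ids = self.postings.get(term)
            if ids is not None:
                ids.discard(doc_id)
                if not ids:
                    del self.postings[term]

    def upsert(self, doc_id, content, existing_meta=None):
        """Pushed or crawled document: drop the stale entry, then index the new one."""
        self.remove(doc_id)
        meta = transform(doc_id, content, existing_meta)
        self.meta[doc_id] = meta
        for term in meta["terms"]:
            self.postings.setdefault(term, set()).add(doc_id)

db = SearchDatabase()
db.upsert("a", "Speech to text", {"author": "x"})
db.upsert("b", "Text extraction")
db.upsert("a", "Image signatures")  # document "a" changed (churn)
```

Only the entries of the changed document are touched on update, which is one way to keep the cost tied to churn rate instead of repository size.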

  9. The user side
     • The two main issues
        • Transforming the user query into Q-meta-data
        • Organizing the results into manageable form
     [Diagram; labels: Query, UI, Transform, Q-meta-data, User, Navigation, Matching, UI, Organize, Results, Data-base]

  10. The user side
     • Capturing the « user intent »
        • The DWIM (« do what I mean ») dream
        • Providing useful hints (what is « searchable »?)
     • Organizing the results
        • Assume multiple results, i.e. choice or refinement
     • Issues
        • Algorithm (for each media type)
           • Clustering, structuring, summarizing, ...
        • User interface (for each terminal type)
        • Performance (under the ½ sec threshold)
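
The two user-side steps, turning raw input into Q-meta-data and organizing results for refinement, can be sketched as follows. The `field:value` query syntax, the `parse_query` and `organize` helpers, the context dictionary and the sample results are assumptions for the example; grouping by a single field stands in for the clustering and structuring mentioned above.

```python
from collections import defaultdict

def parse_query(text, context=None):
    """Turn raw user input into Q-meta-data: typed terms, free terms, context."""
    q = {"typed": {}, "free": [], "context": dict(context or {})}
    for token in text.lower().split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            q["typed"].setdefault(field, []).append(value)
        else:
            q["free"].append(token)
    return q

def organize(results, facet):
    """Group result titles by one meta-data field so the user can refine."""
    groups = defaultdict(list)
    for result in results:
        groups[result.get(facet, "unknown")].append(result["title"])
    return dict(groups)

q = parse_query("Author:Smith image search", {"lang": "en"})
results = [
    {"title": "Image signatures", "media": "image"},
    {"title": "Speech to text", "media": "audio"},
    {"title": "Image retrieval", "media": "image"},
]
groups = organize(results, "media")
```

Grouping on a field that already exists in the D-meta-data is the cheapest form of organization; it only works if that field was extracted on the document side in the first place.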

  11. The big picture
     [Diagram; labels: Pull, Push, UI, Query, Transform, Crawl, Librarian, Q-meta-data, Document, Navigation, User, Transform, Matching, D-meta-data, Build, Organize, UI, Results, Data-base, Content, Intra-doc navigation]

  12. The big picture issues
     • On the document side, acquiring D-meta-data that will speed up the matching process
        • Performance trade-off
     • On the document side, acquiring D-meta-data that will be relevant on the user side
        • That will fit « naturally » with the potential user queries
        • That will assist in organizing results into « manageable » form

  13. Context, personalization
     [Diagram; same labels as slide 11, plus: User context, Content context]

  14. A functional breakdown of a search engine (it is much more complex)
     [Diagram; same labels as slide 11, plus: User context, Content context, Corpora]

  15. Search vs Alerts
     [Diagram; same labels as slide 11, plus: Stored queries, User context, Content context]
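
The slide only names « Stored queries » in its diagram. One common reading, assumed here and not stated in the source, is that an alert inverts the search flow: queries are stored ahead of time and each newly arriving document is matched against them. The `AlertService` class and its all-terms-must-appear rule are hypothetical.

```python
def terms(text):
    """Lowercased whitespace tokens of a text, as a set."""
    return set(text.lower().split())

class AlertService:
    """Stored queries are matched against documents as they arrive."""

    def __init__(self):
        self.stored = {}  # query name -> set of required terms

    def subscribe(self, name, query):
        self.stored[name] = terms(query)

    def on_new_document(self, content):
        """Return the names of stored queries whose terms all appear in the document."""
        doc_terms = terms(content)
        return sorted(name for name, wanted in self.stored.items() if wanted <= doc_terms)

alerts = AlertService()
alerts.subscribe("speech", "speech text")
alerts.subscribe("images", "image signatures")
fired = alerts.on_new_document("New results on speech to text conversion")
```

In this reading the stored queries play the role of the indexed side, and the incoming document plays the role of the query.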

  16. Acting on results
     [Diagram; same labels as slide 11, plus: Stored queries, User context, Content context, Act, User as a “librarian”]

  17. Some global cross-functional issues
     • IP, access rights, usage rights
     • Security, privacy, …
     • Business model
     • Architecture, APIs, standards, …
     • Software engineering
     • Scalability

  18. The research triangle for search engines
     [Diagram; same labels as slide 11, plus: User context, Content context]

  19. Next steps
     • Quantify limits associated with each functional component
        • Main driving parameter (size/churn, user population, media type, ...)
        • Influence on other functional components
     • --> Identify main use-case typology terms
     • Compare/describe research and industry use-cases according to the proposed functional description
     • Prepare for gap analysis
        • Identify expected functional level progress
        • Identify « mismatch » cases, alternative/complementary models
