1 / 9

Discovering Query Context using Concept Hierarchy

Discovering Query Context using Concept Hierarchy. Mukesh Mohania IBM Research – India. Discovering Query Context using Concept Hierarchy for Information Integration. Motivation: We keep refining the search query keywords till we get the desired results

marny-morse
Télécharger la présentation

Discovering Query Context using Concept Hierarchy

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Discovering Query Context using Concept Hierarchy Mukesh Mohania IBM Research – India

  2. Discovering Query Context using Concept HierarchyforInformation Integration Motivation: • We keep refining the search query keywords till we get the desired results • That is, the onus of specifying appropriate set of keywords (called, query context) remains with the user, • limitation since the user might not be aware of the overall context at the point of submitting the query. • Inevitable when the search query is submitted through the mobile device, particularly when the data source is not available all the time for searching or the bandwidth is limited. • The problem is how to derive the full context (as a set of keywords) of a query without sending the search query to the back-end data source. • We propose to store and use the concept hierarchies at mobile device for discovering the query context. • The advantage of discovering the query context is to further get all documents from enterprise content repositories and/or from external web which are highly relevant to the query results.

  3. Proj | Name | Description | Group 1 | CORTEX | Policy Management | 6 Org | Org Name | Address 5 | IRL | Hauz Khas, New Delhi PROJECTS ORG Proj | Emp 1 | 7 1 | 4 Emp | Name | Group 4 | Mukesh | 6 7 | Manish | 6 Group | Name | Manager | Org 6 | Databases | 4 | 5 GROUPS EMPLOYEES PROJEMPS DB2 Report.doc: “… Report on Database Research at IRL … Policy Definition … “ New.txt: “Some other document” Another.pdf: “Expense Reimbursement Policy at IRL” DocID | Author | Year Report.doc | Mukesh | 2000 Another.pdf | Manish | 2002 New.txt | Prasan | 2003 Content Manager Select Name, Description From PROJECTS Where Name = “CORTEX” Query Name | Description CORTEX | Policy Management Report.doc: “… Report on Database Research at IRL … Policy Definition … “ Year < 2003 Directives Report.doc additionally retrieved based on the context. Note: No mention of CORTEX in document Result Motivating Example

  4. CRUX Main Idea • Specify information need in terms of SQL over the structured database • Additional information needs specified using “directives” (optional) • Automatically synthesize the “context” of the SQL query from its result as well as related information present elsewhere in the database (found using the known semantic dependencies in the structured data) • Considers the foreign-key dependencies in either direction (PK  FK, FK  PK) • Automatically determines the most promising subset of the related information to explore • Use this context and the directives to retrieve the relevant unstructured data

  5. FILLED IN BY THE USER Select GROUPS.Name, PROJECTS.Description From PROJECTS, PROJEMPS, EMPLOYEES, GROUPS Where PROJECTS.Proj# = PROJEMPS.Proj# And PROJEMPS.Emp# = EMPLOYEES.Emp# And EMPLOYEES.Grp# = GROUPS.Grp# And PROJECTS.Name = “CORTEX” Select ARTICLES.id From ARTICLES Where ARTICLES.contains(“Report AND SET-OR($GROUPS.Name)”) And ARTICLES.year > 2001 ORDER BY RELEVANCE USING THRESHOLD = 0.8 CORE QUERY SEARCH DIRECTIVES

  6. Core SQL Query Select GROUPS.Name, Projects.Description From PROJECTS, PROJEMPS, EMPLOYEES, GROUPS Where PROJECTS.Proj# = PROJEMPS.Proj# And PROJEMPS.Emp# = EMPLOYEES.Emp# And EMPLOYEES.Grp# = GROUPS.Grp# And PROJECTS.Name = “CORTEX” Restrict to documents containing “Report” and at least one group name Select ARTICLES.id From ARTICLES Where ARTICLES.contains(“Report AND SET- OR($GROUPS.Name)”) And ARTICLES.year > 2001 ORDER BY RELEVANCE USING THRESHOLD = 0.8 Order by relevance to the structured query with cutoff threshold as 0.8 Restrict to documents published after 2001 Search Directives (optional) Core SQL Query and Search Directives

  7. Concept Hierarchies (CH) • CH are defined over relational tables using FK, PK relationship (not all attributes are considered – depending on the application/scenarios) • Predefined hierarchical relationships that map a set of lower-level concepts to their higher level correspondences • Example, {tennis, rugby, hockey, football} can be generalized as “sports” at a high level concept. • CH can be constructed in many ways

  8. Scenarios • Health: Patient specific report and medical articles, • Manufacturing: Defect statistics and engineering specifications, • Marketing: Customer transaction history and marketing documents, • Travel: Traveler itinerary and promotional flyers, travel advisories, • Management: Employee records and status reports

  9. Research Issues • Limitation: Context is a set of keywords. • – Semantics based context? • Limitation: The context of a search query is determined by concept hierarchy. • -- Other avenues such as the previous results retrieved and the query workload, the user profile, if given • Limitation: Context of a search query is mapped to one or more categories • -- Precise context-based document retrieval may be helpful • Limitation: The documents and search query research are returned as unordered sets. • -- For usability reasons, efficient ranking algorithms to order the returned results with respect to the input query context would be needed. • Limitation: Entire query results returned in addition to the documents on response to a query. • -- (HCI issue) The query results need to be presented in a browse-able manner, or may even be presented as smart tags dynamically attached to the documents.

More Related