Processing XML Keyword Search by Constructing Effective Structured Queries

Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning, Swinburne University of Technology, Australia

Outline • Motivation of Keyword Search in XML • Brief Review of Related Work • Existing Problems • Construct Structured Query Templates • Ranking Function • Processing Algorithms • Conclusions

Motivation of XML Keyword Search • Keyword search is easy to use. • Users don’t need to know the structure of the XML data or a specific query language. • XML data with different structures can be searched uniformly by one keyword query, because the query does not specify the structure of the retrieved results.

Brief Review of Related Work • We focus on four references that use labels and terms as the keyword query format: • [YunyaoLi2004VLDB] Schema-Free XQuery. • [DanielaFlorescu2002ComputerNetworks] Integrating keyword search into XML query processing. • [SaraCohen2003VLDB] XSEarch: A semantic search engine for XML. • [WeidongYang2007CIT] Schema-aware keyword search over XML streams. • Other relevant work can be found in our paper.

Brief Review of Related Work • All four works use labels and terms as the keyword query format. • The difference: the first three share a similar basic strategy, which first retrieves the relevant keyword lists and then merges them into results; the last one first generates a big template that covers all kinds of results w.r.t. the XML schema and then caches the possible results over XML streams. • The template-based strategy can obtain better performance [WeidongYang2007CIT].

Existing Problems • [WeidongYang2007CIT] was designed to query over XML streams, which is not sufficient for an XML data repository because of the following challenges: • Different templates may exist in one XML data repository. • Users prefer to see only part of the results, e.g., the top k results. • Domain knowledge can help to process labels that have the same meaning. • Therefore, the problem of applying the template-based keyword search strategy to an XML data repository needs to be studied.

Construct Structured Query Templates • Example: there are two data sources that conform to schemas t1 and t2 respectively. • [Figures: Schema t1 and Schema t2] • Keyword query: (year:2006, title:xml, author:philip)

Construct Structured Query Templates • Identifying the context of keywords • Determine the master entities using the labels in the keyword query and the XML schema. • Generate a FOR clause for each master entity. • Count the occurrences of every label under each master entity. • Exactly once: generate the WHERE clauses directly. • More than once: first cluster, then generate the WHERE clauses.

Schema t1, keyword query: (year:2006, title:xml, author:philip) • Step 1: determine the master entities and their corresponding label sets. • Q1 = “For $b in bibliography/books/book” • Q2 = “For $a in bibliography/articles/article” • Step 2: each label occurs only once in each master entity, so the WHERE clauses are generated directly. • Q1 += “Where $b/year=‘2006’ and $b/title.contains(xml) and $b/author.contains(philip)” • Q2 += “Where $a/year=‘2006’ and $a/title.contains(xml) and $a/author.contains(philip)”
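The two steps for schema t1 can be sketched in Python as string construction. This is only an illustration of the slide's notation, not the authors' implementation: the function name `build_query`, the `exact_labels` parameter, and the choice to treat `year` as an equality test are assumptions made for the example.

```python
def build_query(var, path, keyword_query, exact_labels=("year",)):
    """Build a FOR/WHERE query string for one master entity.

    keyword_query is a list of (label, term) pairs. Labels listed in
    exact_labels (an assumption of this sketch) are compared with '=';
    all other labels use the slide's contains() notation.
    """
    for_clause = f"For ${var} in {path}"
    predicates = []
    for label, term in keyword_query:
        if label in exact_labels:
            predicates.append(f"${var}/{label}='{term}'")
        else:
            predicates.append(f"${var}/{label}.contains({term})")
    return for_clause + " Where " + " and ".join(predicates)


keyword_query = [("year", "2006"), ("title", "xml"), ("author", "philip")]
q1 = build_query("b", "bibliography/books/book", keyword_query)
q2 = build_query("a", "bibliography/articles/article", keyword_query)
print(q1)
print(q2)
```

Each master entity yields one query, so a repository with several schemas simply produces several such strings.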

Schema t2, keyword query: (year:2006, title:xml, author:philip) • Step 1: determine the master entity and its corresponding label set. • Q = “For $bi in bibliography/bib” • Step 2: the labels title and author each occur twice in the master entity, so cluster them using book and article respectively. • Q1 = Q + “For $bo in $bi/book” • Q2 = Q + “For $a in $bi/article” • Step 3: each label occurs only once in each cluster. • Q1 += “Where $bi/year=‘2006’ and $bo/title.contains(xml) and $bo/author.contains(philip)” • Q2 …
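The clustering step for schema t2 can be sketched as follows. The nested-dictionary encoding `SCHEMA_T2`, the helper names, and the `cluster_vars` mapping are hypothetical; the sketch also assumes that every repeated label sits below some child of the master entity and that labels occurring once belong to the master entity itself.

```python
from collections import defaultdict

# Hypothetical encoding of schema t2 below the master entity "bib":
# label -> nested children (an empty dict marks a leaf).
SCHEMA_T2 = {
    "year": {},
    "book": {"title": {}, "author": {}},
    "article": {"title": {}, "author": {}},
}


def label_occurrences(schema, parents=()):
    """Map each label to the ancestor paths (below the master entity) where it occurs."""
    found = defaultdict(list)
    for label, children in schema.items():
        found[label].append(parents)
        for child, paths in label_occurrences(children, parents + (label,)).items():
            found[child].extend(paths)
    return found


def build_cluster_queries(master_var, master_path, schema, cluster_vars, keyword_query):
    occurrences = label_occurrences(schema)
    # Labels that occur more than once under the master entity need clustering.
    repeated = {label for label, _ in keyword_query if len(occurrences[label]) > 1}
    cluster_roots = []
    for label, _ in keyword_query:
        if label in repeated:
            for parents in occurrences[label]:
                if parents[0] not in cluster_roots:
                    cluster_roots.append(parents[0])
    queries = []
    for root in cluster_roots:
        var = cluster_vars[root]
        predicates = []
        for label, term in keyword_query:
            owner = var if label in repeated else master_var
            if label == "year":  # assumption: year is an exact match
                predicates.append(f"${owner}/{label}='{term}'")
            else:
                predicates.append(f"${owner}/{label}.contains({term})")
        queries.append(
            f"For ${master_var} in {master_path} For ${var} in ${master_var}/{root}"
            + " Where " + " and ".join(predicates)
        )
    return queries


keyword_query = [("year", "2006"), ("title", "xml"), ("author", "philip")]
queries = build_cluster_queries(
    "bi", "bibliography/bib", SCHEMA_T2, {"book": "bo", "article": "a"}, keyword_query
)
for q in queries:
    print(q)
```

One query is produced per cluster root, which mirrors how Q1 and Q2 on the slide differ only in their second FOR clause and in the variable that owns title and author.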

Construct Structured Query Templates • Identifying returned nodes • Step 1: If the cardinality of a master entity is “*” and no cluster operation is activated, we take the master entity as the return node in the constructed queries. • Step 2: If the cardinality of a master entity is “*” and clusters are generated, we check the root node of each cluster in a recursive procedure (back to Step 1). • Step 3: If the cardinality of a master entity is not “*”, we probe its ancestor nodes one by one until a node with cardinality “*” is found or the root of the XML schema is reached.
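A minimal sketch of the three steps above, assuming a hypothetical `Node` class in which `starred` records whether the cardinality is “*” and `clusters` holds the cluster roots produced for that node. How the paper represents the schema is not shown on the slide, so this structure is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    label: str
    starred: bool = False  # True if the cardinality is "*"
    parent: Optional["Node"] = field(default=None, repr=False)
    clusters: List["Node"] = field(default_factory=list, repr=False)


def return_nodes(master: Node) -> List[Node]:
    if master.starred:
        if not master.clusters:
            return [master]  # Step 1: the master entity itself
        result = []
        for root in master.clusters:  # Step 2: recurse into each cluster root
            result.extend(return_nodes(root))
        return result
    # Step 3: probe ancestors until a "*" node or the schema root is reached
    node = master
    while node.parent is not None:
        node = node.parent
        if node.starred:
            break
    return [node]


# Schema t1: the master entity "book" has cardinality "*" and no clusters.
books = Node("books")
book_t1 = Node("book", starred=True, parent=books)

# Schema t2: the master entity "bib" is clustered by "book" and "article".
bib = Node("bib", starred=True)
book_t2 = Node("book", starred=True, parent=bib)
article_t2 = Node("article", starred=True, parent=bib)
bib.clusters = [book_t2, article_t2]

print([n.label for n in return_nodes(book_t1)])
print([n.label for n in return_nodes(bib)])
```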

Keyword query: (year:2006, title:xml, author:philip) • Schema t1: master entities are the returned nodes. • Q1 += “$b” • Q2 += “$a” • Schema t2: roots of clusters are the returned nodes. • Q1 += “$bo” • Q2 += “$a” • The complete constructed queries can be found in our paper.

Ranking Function • vm is the master entity node; • ω(vi, ti) is calculated using the tf*idf weight model. • Feature of the function: Score() consists of two parts, ContextScore() and the tf*idf weight; the former is an upper bound on the score of the results.
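The slide does not show which tf*idf variant is used for ω(vi, ti). As a rough illustration only, the sketch below uses one common textbook form (term frequency normalized by text length, times the logarithm of the inverse document frequency). The function name `tf_idf` and the whitespace tokenization are assumptions, and the combination with ContextScore() is deliberately left out because it is not given here.

```python
import math
from collections import Counter


def tf_idf(term, node_text, all_node_texts):
    """One common tf*idf form; not necessarily the variant used in the paper."""
    term = term.lower()
    tokens = node_text.lower().split()
    # Term frequency: share of the node's tokens that equal the term.
    tf = Counter(tokens)[term] / len(tokens) if tokens else 0.0
    # Document frequency: number of node texts that contain the term.
    df = sum(1 for text in all_node_texts if term in text.lower().split())
    idf = math.log(len(all_node_texts) / df) if df else 0.0
    return tf * idf


texts = ["xml keyword search", "xml query processing", "relational databases"]
print(tf_idf("keyword", texts[0], texts))
print(tf_idf("xml", texts[0], texts))
```

A term that appears in fewer node texts (here `keyword`) receives a higher weight than one that appears in many (here `xml`).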

Processing Strategy • Algorithm 1 generates the structured queries together with their corresponding context scores. • Algorithm 2 schedules the query plan according to the following conditions: • the users’ requirements, e.g., the number of results; • the context scores of all generated queries; • the intermediate results.
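One way such scheduling could work is sketched below; it is not the paper's Algorithm 2. The sketch assumes that no result of a query can score higher than that query's context score (the upper-bound property stated for the ranking function), evaluates queries in descending order of context score, and stops once the current k-th best result already matches or beats the bound of the next query. The function `top_k` and the `evaluate` callables are hypothetical.

```python
import heapq


def top_k(queries, k):
    """queries: list of (context_score, evaluate) pairs, where evaluate()
    returns a list of (score, result) pairs with score <= context_score.
    Returns the top-k (result, score) pairs and the number of queries evaluated.
    """
    ordered = sorted(queries, key=lambda q: q[0], reverse=True)
    heap = []  # min-heap of (score, arrival order, result)
    evaluated = 0
    counter = 0
    for context_score, evaluate in ordered:
        # Early termination: no remaining query can beat the k-th best score.
        if len(heap) == k and heap[0][0] >= context_score:
            break
        evaluated += 1
        for score, result in evaluate():
            counter += 1
            item = (score, counter, result)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif score > heap[0][0]:
                heapq.heapreplace(heap, item)
    ranked = sorted(heap, key=lambda t: (-t[0], t[1]))
    return [(result, score) for score, _, result in ranked], evaluated


# Made-up scores purely for illustration.
queries = [
    (0.9, lambda: [(0.8, "r1"), (0.7, "r2")]),
    (0.6, lambda: [(0.5, "r3")]),
    (0.95, lambda: [(0.9, "r0")]),
]
results, evaluated = top_k(queries, k=2)
print(results, evaluated)
```

In this toy run the query with context score 0.6 is never evaluated, because two results scoring at least 0.8 have already been found.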

Experiments • Datasets: • SIGMOD Record • three variants of DBLP • Keyword queries: • q1 (author:David, title:XML) • q2 (year:2002, title:XML)

Experimental Results • [Figures: results for q1 and q2; top-k results for q1 (k = 10) and q2 (k = 20)]

Conclusions • XBridge is proposed to process keyword queries over an XML data repository; it can efficiently find the top k results by evaluating the generated structured queries. • A precise ranking function is provided to evaluate the relevance of the results. • Limitations of this work: • We treat XML schemas as tree patterns; • We do not consider reference relationships in XML data.
