1 / 16

DBXplorer: A System for Keyword-Based Search over Relational Databases

DBXplorer: A System for Keyword-Based Search over Relational Databases. Sanjay Agrawal Surajit Chaudhuri Gautam Das Presented by Bhushan Pachpande. Contents. Introduction Overview of DBXplorer Symbol Table Design - Publish Keyword Search Support for Generalized Matches Conclusion.

inez
Télécharger la présentation

DBXplorer: A System for Keyword-Based Search over Relational Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DBXplorer: A System for Keyword-Based Search over Relational Databases Sanjay Agrawal Surajit Chaudhuri Gautam Das Presented by Bhushan Pachpande

  2. Contents • Introduction • Overview of DBXplorer • Symbol Table Design - Publish • Keyword Search • Support for Generalized Matches • Conclusion

  3. Introduction • Internet search engines have popularized keyword-based search. • Searching on traditional database management system is done through customized applications which are closely tied to the database schema. • Traditional database management systems do not support keyword-based search. • e.g. search the Microsoft intranet on ‘Jim Gray’ to obtain matched rows, i.e., rows in the database where ‘Jim Gray’ occur. • In this paper, DBXplorer, an efficient and scalable keyword search utility for relational databases, is described. • The goal is to enable such searches without necessarily requiring the users to know the schema of the respective databases.

  4. Example Searching for a book • Keywords “Programming” by “Ritchie” • Less probability of presence of both keywords in single row of table • For result, rows need to be generated by joining tables on the fly (all possible combinations) Authors AuthorsBooks Books BookStores Store

  5. Overview of DBXplorer • Objective - Given a set of query keywords, DBXplorer returns all rows (either from single tables, or by joining tables connected by foreign-key joins) such that the each row contains all keywords. • Applying IR techniques from the documents world to databases is difficult, because of • Database normalization – by which logical units of information may be fragmented and scattered across several tables • Matching row may be obtained by joining several tables on the fly • IR techniques use Inverted Lists = Symbol Table in databases • Symbol Table - Stores the information about keywords at different granularities (column/row), i.e. for each keyword stores the list of all rows

  6. Overview of DBXplorer - Publish • Enabling keyword search in DBXplorer requires 2 steps • Publish • enables database for keywords by building Symbol table and associated structures • Search • retrieves matching rows from published database • Steps in Publish • Step 1: A database is identified, along with the set of tables and columns within the database to be published. • Step 2: Auxiliary tables [like Symbol table] are created for supporting keyword searches.

  7. Overview of DBXplorer - Search • Steps in Search • Looks up symbol table to find tables / columns which contain keywords • Identify all possible joins (subsets of table if joined) whose rows might contain required keywords • Generate SQL statements for each join (gives rows which contain all keywords), rank rows and return

  8. Architecture of DBXplorer • The publish component provides interfaces to • select a database, • select tables/columns within the database to publish, and • modify/remove/maintain the publication. • For a given set of keywords, the search component provides interfaces to • retrieve matching databases from a set of published databases, and • selectively identify tables, columns/rows that need to be searched within each database identified in step (1). Architecture of DBXplorer

  9. Symbol Table Design • Exact match problem considered only • Symbol Table (S) - stores the information about keywords at different granularities (column/row), i.e. for each keyword stores the list of all rows • Column Level granularity (Pub-Col) • For every keyword S maintains list of all database columns (i.e. table.column) • Cell level granularity (Pub-cell) • For every keyword S maintains list of all database cells (i.e. table.column.rowid)

  10. Symbol Table - Design factors • Space and Time Requirements • Size: pub-col are smaller than pub-cell since repetition of keyword in a column does not increases entries in case of pub-col • Time: pub-col takes less time to build • Keyword search performance • Depends on efficient generation and execution of SQL statement (built from symbol table entries) • pub-cell returns more number of SQL statements than pub-col as for a keyword in column there are multiple entries • Ease of maintenance • Insert/Update: required for insertion of distinct new value in case of pub-col while pub-cell needs for every update/insert • Same for delete

  11. Storing Symbol Table • Store symbol tables (pub-col) in database as (keyword hash , column Id) • FK Compression (Foreign Key) • If there is foreign key relationship between c1 and c2, store only c1 • CP Compression • Partition H into a minimum number of bipartite cliques (a bipartite clique is any subgraph of H with a maximal number of edges). • Compress each clique. • Stores symbol table (pub-cell) in database as (keyword hash, list of all cellids) v2 v3 v4 c1 c2 x Uncompressed hash table ColumnsMap table Compressed hash table

  12. Search - Enumerating Join Trees • Step1 - Looks up symbol table to find tables / columns which contain keywords • Step2 - Enumerate join trees • Identify and enumerate all potential subsets of tables in the database that, if joined, might contain rows having all keywords. • The resulting relation will contain all potential rows having all keywords specified in the query. keywords If we view the schema graph G as an undirected graph, this step enumerates join trees, i.e., sub-trees of G such that • the leaves belong to MatchedTables • together, the leaves contain all keywords of the query Join Trees

  13. Search – Identify matching rows • The input to this final search step is the enumeratedjoin trees. • Each join tree is then mapped to a single SQLstatement that joins the tables as specified in the tree, andselects those rows that contain all keywords. • The retrieved rows are ranked before being output. • Rows ranked by number of joins involved (ties broken arbitrarily) (same as keywords occurring close to one another in documents are ranked higher) Join Trees

  14. Generalized Matches – Token Matches • Token matches - the keyword in the querymatches only a token or sub-string of an attribute value (e.g., retrieve rows of address by specifying only a street name). • Pub-Prefix method • B+ tree indexes can be used toretrieve rows whose cell matches a given prefix string • This clause is of the form WHERE T.C LIKE ‘P%K%’ • During publishing of a database, for every keyword K,the entry (hash(K), T.C, P) is kept in the symbol table ifthere exists a string in column T.C which • contains atoken K, and • has prefix P

  15. Generalized Matches - Token Matches Let the hash values of thesearchable tokens i.e., ‘string’, ‘ball’ and ‘round’ be 1, 2and 3 respectively Pub-Prefix table Database table T Consider searching keyword “string” Pub-Prefix table returns prefixes “th” and “no” and subsequent SQL will contain (T.C LIKE ‘th%string%’) OR (T.C LIKE ‘no%string%’)

  16. Conclusion • This paper discusses DBXplorer, a system that enables keyword-based search in relational databases. • DBXplorer uses symbol table alternatives to store the location of keywords in database. • DBXplorer support exact matches and generalized matches upto some extent.

More Related