1 / 32

Querying Relational Databases without Explicit Joins

Querying Relational Databases without Explicit Joins. Outline. Introduction, Motivation, and Background What is wrong with SQL? How can we replace SQL? Querying by Context using Semantic Names An example query Query Architecture Term dictionary, X-Specs, query processor

Télécharger la présentation

Querying Relational Databases without Explicit Joins

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Querying Relational Databaseswithout Explicit Joins

  2. Outline • Introduction, Motivation, and Background • What is wrong with SQL? • How can we replace SQL? • Querying by Context using Semantic Names • An example query • Query Architecture • Term dictionary, X-Specs, query processor • Query Processor Algorithms • Field/table mapping discovery, join selection • GUI extensions to simplify query construction • Future work and conclusions

  3. Introduction and Motivation • Despite improvements in “core” database technology, advances in database query languages have not kept pace. • SQL is still the fundamental basis for most access languages and tools. • SQL is often difficult to use for beginning users or when formulating queries on large and complex database schema. • Thus, the motivation for designing a high-level query language for users that is also backwards compatible with SQL.

  4. What is wrong with SQL? • There is nothing wrong with SQL. However, SQL is not a simple query language for many reasons: • Querying by structure does not hide complexities introduced due to database normalization. • Structures (fields and tables) may be assigned poor names that do not adequately describe their semantics. • Notion of a “join” is confusing for beginning users especially when multiple joins are present. • SQL forces structural access which does not provide logical query transparency and restricts logical schema evolution.

  5. SQL as a Standard • SQL is an universally-accepted standard supported by all major relational DB vendors. • Object-oriented query languages are also developed similar to SQL. • Most vendors provide graphical query tools that simplify SQL construction. • SQL is “relatively” standardized across database systems and platforms. • Conclusion: Any new query language should be backwards compatible with SQL to guarantee its usefulness. • However, it is desirable to totally hide SQL formulation at the user-level.

  6. Previous Work • There have been several research systems and prototypes in the area. They fall in 3 categories: • 1) Graphical Query Tools and Models • Query by example (QBE), database query tools • 2) Query by Word Phrases • SemQL (WordNet) • Information retrieval techniques, web searching • 3) User-directed Querying • Kaleidoscope (logic)

  7. Query Issues • A desirable query language should: • Allow the user to be able to systematically and deterministically produce results. • Provide a suitable formalism for representing and querying data. • Hide the naming and structures of the database by providing logical and physical access transparency. • Allow the user to browse the contents of the database to determine query concepts. • Provide a graphical query interface. • Be generally applicable to multiple database systems and/or data models.

  8. Querying by Context (QBC) • Querying by context (QBC) is a methodology for querying relational databases by semantics. • Querying is performed by selecting semantic names that represent query concepts. • Semantic names are assigned once by the DBA to describe database semantics. • Users query the database by selecting semantic names from the context view. • The context view contains all concepts present in the database with appropriate semantic names. • A query processor maps the user’s selections and criteria to an actual SQL query.

  9. Querying by Context Example

  10. Querying by Context Example (2) • User query: Retrieve all orders and customer names where an order contains a product from category ‘Produce’. • SQL: SELECT O.OrderID, CU.CompanyName FROM [Categories] AS C, [Customers] AS CU, [Orders] AS O, [Products] AS P, [Order Details] AS OD WHERE C.CategoryName = 'PRODUCE’ AND (C.CategoryID = P.CategoryID) AND (P.ProductID = OD.ProductID) AND (OD.OrderID = O.OrderID) AND (O.CustomerID = CU.CustomerID)

  11. Query by Context Processes • The query architecture consists of three separate processes: • Capture process: independently extracts database schema information and metadata into a XML document called a X-Spec. • Integration process: combines X-Spec(s) into a structurally-neutral hierarchy of database concepts called an integrated context view. • Query process: allows the user to formulate queries on the integrated view that are mapped by the query processor to structural queries (SQL) and the results are integrated and formatted.

  12. The Integrated Context View • Before semantic querying can begin, the DBA must assign semanticnames to each field and table in the database. • Semantic names and schema mappings are stored in an XML document called an X-Spec. • Integration of one or more X-Specs produces a structurally-neutral hierarchy of concepts called an integrated context view. • We will only look at querying one database. • A context view (CV) is a valid Universal Relation. • Each field is assigned a semantic name which uniquely identifies its semantic connotation.

  13. Integrated Context View X-Spec Editor Standard Dictionary Integration Algorithm Query Processor and ODBC Manager Query Architecture Client Client • Architecture Components: • 1) Integrated Context View • user’s view of integration • 2) X-Spec Editor • stores schema & metadata • uses XML • 3) Standard Dictionary • terms to express semantics • 4) Integration Algorithm • combines X-Specs into integrated context view • 5) Query Processor • accepts query on view • determines data source mappings and joins • executes queries and formats results Global Query Layer Subtransactions X-Spec X-Spec Database Database Local Transactions

  14. What is a semantic name? • A semantic name is a universal, semantic identifier in a domain. • Similar to a field name in the Universal Relation. • Semantics are guaranteed unique by construction. • Users query based on the semantic names provided by the DBA. • Systematic construction of semantic names allows: • The system to insert join conditions into query. • The context view to be organized hierarchically by semantic concept to reduce burden on user.

  15. User Query Formulation • Users query the context view by: • browsing and selecting semantic names from the context view for selection/projection • specifying ordering and selection criteria • Users do not have to specify joins between concepts: • Some joins are implicit by virtue of hierarchical semantic names. (e.g. [Order;Product] of [Order]). • Joins are automatically inserted when required if not directly specified by user. • User can explicitly direct joins by browsing CV which automatically connects concepts.

  16. Query Processor • Query processor translates from semantic query to SQL which requires: • Mapping from semantic names to field and table names using supplied X-Spec mappings. • One challenge is determining a particular field mapping to use if multiple are present. (e.g. [Order] Id) • Insertion of joins to preserve user query semantics. • If user specifies no relationship between concepts, chose shortest semantic join path, otherwise select shortest physical join path. • If user specifies (some) join semantics, then join determination is simplified. • System does not prevent user from specifying complex, non-standard joins like outer joins.

  17. Selecting Joins Using Join Graphs • Determining joins to use is simplified by defining a join graph. • A join graph is an undirected graph where: • Each node Ni is a table in the database. • There is a link from node Ni to node Nj if there is a join between the two tables. • A join path is a sequence of joins connecting two nodes in the graph. • A join tree is a set of joins connecting two or more nodes. • A join matrix M stores the shortest join paths between any two nodes (tables).

  18. Join Graph for Order Database

  19. Join Discovery Results • Join discovery in a database with a connected, acyclic join graph is simple as there exists only one join tree for any set of tables. • For a cyclic join graph, there may exist more than one join tree for a set of tables and each tree may have different semantics. • Users may eliminate possibilities when browsing the context view or using direct specification. • Otherwise, system selects shortest join path first considering semantic names, and then physical join paths.

  20. Reducing Join Ambiguity • Join ambiguity can be reduced during query formulation without the user’s knowledge. • Example: Retrieve all orders and customer names where an order contains a product in category ‘Produce’. • Semantic names selected: • [Order] Id, [Customer] Name, [Category] Name • no join ambiguity if acyclic graph, however… • Semantic names with no ambiguity: • [Order] Id, [Order;Customer] Name, [Order;Product;Category] Name • Names indicate path from starting context “Order”.

  21. Reducing Join Ambiguity (2) • From a user interface perspective, the second set of semantic names can be determined if: • When the user browses the [Order] context, the [Customer] name information is merged into the [Order] context using a hidden connection on [Order] Id. • Similarly, connect [Product] and [Category] information through [Order] Id (Order to Product) and [Product] Id (Product to Category). • As the user browses the view and uses these connections, this information can be exploited to determine appropriate join paths.

  22. Query by Context Discussion • Important benefits of querying by context: • System table and field names are not presented to the user who queries based on semantic names. • Database structure is not shown to the user. • Field and table mappings are automatically determined based on X-Spec information. • Join conditions are inserted as needed when available to join tables and will preserve user’s query semantics as specified. • Structural neutrality of context view allows QBC to be extended to non-relational databases and be used as a query language for integrated databases.

  23. Conclusions • User querying can be simplified by semantic naming of schema constructs (by the DBA) that hierarchically organizes concepts into a view. • Query by context provides logical query transparency that is suited for databases with schema evolution or integrated systems. • Users are able to transparently query integrated systems by concept instead of structure. • Handling join ambiguity is an important component in mapping to SQL.

  24. Future Work • Continuing to refine a prototype of the system called Unity. • A comparison study of query by context versus traditional SQL, database query tools, and query by example. • The query processor is being extended to resolve more complex queries. • Joins on non-standard keys, joins across databases, etc.

  25. References • Publications: • Unity - A Database Integration Tool, R. Lawrence and K. Barker, TRLabs Emerging Technology Bulletin, January 2000. • Multidatabase Querying by Context, R. Lawrence and K. Barker, DataSem2000, pg 127-136, Oct. 2000. • Using Unity to Semi-Automatically Integrate Relational Schema, Demonstration to appear at ICDE’2002. • Integrating Relational Database Schemas using a Standardized Dictionary, SAC’2001 - ACM Symposium on Applied Computing, March, 2001. • Further Information: • http://www.cs.uiowa.edu/~rlawrenc/

More Related