1 / 41

Database Systems I Relational Algebra

Database Systems I Relational Algebra. Relational Query Languages. Query languages (QL): Allow manipulation and retrieval of data from a database. Relational model supports simple, powerful query languages: Strong formal foundation based on logic.

hadar
Télécharger la présentation

Database Systems I Relational Algebra

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Systems I Relational Algebra

  2. Relational Query Languages • Query languages (QL): Allow manipulation and retrieval of data from a database. • Relational model supports simple, powerful query languages: • Strong formal foundation based on logic. • High level, abstract formulation of queries. • Easy to program. • Allows the DBS to do much optimization. • DBS can choose, e.g., most efficient sorting algorithm or the order of basic operations.

  3. Relational Query Languages • Query Languages != programming languages! • QLs not expected to be “Turing complete”. • Able to answer any computable problem given enough storage space and time • QLs not intended to be used for complex calculations. • Applications on top of database can do that • QLs support easy, efficient access to large data sets. • E.g., in a QL cannot • determine whether the number of tuples of a table is even or odd, • create a visualization of the results of a query, • ask the user for additional input.

  4. Formal Query Languages • Two mathematical query languages form the basis for “real” languages (e.g. SQL), and for implementation: • Relational Algebra (RA) • More procedural, very useful for representing execution plans, relatively close to SQL. • Composed of a collection of operators • A step-by-step procedure for computing the answer • Relational Calculus (RC) • Lets users describe what they want, rather than how to compute it. (Non-procedural, declarative.) • Describes the answer, not the steps. • Understanding these formal query languages is important for understanding SQL and query processing.

  5. Preliminaries • A query is applied to relation instances, and the result of a query is also a relation instance. • Inputs and Outputs of Queries are relations • Query evaluated on instances of input relations • Different instance (DB?) as input = different answer • Schemasof input relations for a query are fixed (but query will run regardless of instance!) • The schema for the resultof a given query is also fixed! Determined by definition of input relations and query language constructs.

  6. Preliminaries • Positional vs. named-attribute notation: • Positional notation • Ex: function(1,2,3) • easier for formal definitions • Named-attribute notation • Ex: function(var1=1, var2=2, var3=3) • more readable • Advantages/disadvantages of one over the other?

  7. R1 S1 S2 Example Instances • “Sailors” and “Reserves” relations for our examples. • 2 instances of “Sailors” S1 & S2 • We’ll use positional or named attribute notation • Assume that names of attributes in query results are `inherited’ from names of attributes in query input relations

  8. Relational Algebra

  9. Relational Algebra • An algebra consists of operators and operands. Operands can be either variables or constants. • In the algebra of arithmetic: • atomic operands are variables such as x or y and constants such as 15. • Operators are the usual arithmetic operators such as +, -, *. • Expressions are formed by applying operators to atomic operands or other expressions. • For example, 15 x + 15 (x + 15) * y

  10. Relational Algebra • Algebraic expressions can be re-ordered according to commutativity or associativity laws without changing their resulting value. • E.g., 15 + 20 = 20 + 15 (x * y) * z = x * (y * z) • Parentheses group operators and define precedence of operators, e.g.(x + 15) * y x + (15 *y) • Describes step-by-step procedure for computing the query • Just like algebra in math.

  11. Relational Algebra • In relational algebra, operands are relations / tables, and an expression evaluates to a relation / set of tuples. • The relational algebra operators are • set operations, • operations removing rows (selection) or columns (projection) from a relation, • operations combining two relations into a new one (Cartesian product, join), • a renaming operation, which changes the name of the relation or of its attributes.

  12. Relational Algebra • Every operator takes one or two relation instances • A relational algebra expression is recursively defined to be a relation • A combination of relations is a relation • Result is also a relation • Can apply operator to • Relation from database • Relation as a result of another operator

  13. Relational Algebra Operations • Basic operations • Selection ( ) Selects a subset of rows from relation. • Projection ( ) Deletes unwanted columns from relation. • Cartesian product( ) Combinetwo relations. • Set-difference ( ) Tuples in relation 1, but not in relation 2. • Union( ) Tuples in relation 1 or in relation 2.

  14. Relational Algebra Operations • Additional operations: • Intersection, join, division. • Renaming of relations / attributes. • Not essential, can be implemented using the five basic operations. • But (very!) useful. • Since each operation returns a relation, operationscan be composed, i.e. output of one operation can be input of the next operation.

  15. Renames relations / attributes, without changing the relation instance. relation R is renamed to S, attributes are renamed A1, . . ., An Rename only some attributes using the positional notation to reference attributes No renaming of attributes, just the relation Renaming

  16. Projection • One input relation. • Deletes attributes that are not in projection list. • Schema of result contains exactly the attributes in the projection list, with the same names that they had in the (only) input relation. • Projection operation has to eliminate duplicates, since relations are sets. • Duplicate elimination is expensive. • Commercial DBMS typically don’t do duplicate elimination unless the user explicitly asks for it.

  17. S2 Projection • Similar in concept to VIEWs • Other fields are projected out

  18. Selection • One input relation. • Selects all tuples that satisfy selection condition. • No duplicates in result! • Schema of result identical to schema of (only) input relation. • Selection conditions: • simple conditions comparing attribute values (variables) and / or constants or • complex conditions that combine simple conditions using logical connectives AND and OR.

  19. S2 Selection

  20. S2 Selection • But the result of an expression is another relation, we can substitute an expression wherever a relation is expected

  21. Union, Intersection, Set-Difference • All of these set operations take two input relations, which must be union-compatible: • Same sets of attributes. • Corresponding attributes have same type. • What is the schema of result?

  22. Union S2 S1 • Concatenates S1 and S2 • Result contains ALL tuples that occur in either S1 or S2 • Schemas must be identical • If they have the same number of fields • Fields have same domains SELECT * FROM S1 UNION SELECT * FROM S2

  23. Intersection S2 S1 • Result contains ALL tuples that occur in both S1 or S2 • Schemas must be identical SELECT * FROM S1 INTERSECT SELECT * FROM S2

  24. Set-Difference S2 S1 • Result contains ALL tuples that occur in S1 but not in S2 • Schemas must be identical SELECT * FROM S1 MINUS SELECT * FROM S2

  25. Cartesian Product • Also referred to as cross-product or product. • Two input relations. • Each tuple of the one relation is paired with each tuple of the other relation. • Result schema has one attribute per attribute of both input relations, with attribute names `inherited’ if possible. • In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid. • What to do??

  26. Cartesian Product S2 S1 • Result contains all fields of S1 followed by fields of S2 • Contains one tuple <s1,s2> for each pair of tuples in S1 and S2. • Schemas can be different • If S1 and S2 contains fields of the same name, there is a naming conflict. Rename one or both fields. SELECT * FROM S1, S2

  27. Cartesian Product • Field names in conflict become unnamed R1 S1

  28. Renaming • In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid. • Then, apply the renaming operation, e.g. • The field names in the output relation are the same fields, but some are renamed • Field 1 is renamed to sid1, field 5 is renamed to sid5 • For an expression to be well defined, no 2+ fields must have same name

  29. Join • Similar to Cartesian product with same result schema. • Cartesian Product has no conditions • Join has conditions • Join can be defined as a Cartesian Product, followed by selections and projections • Join however is much more efficient • Why? S1 R1

  30. Join • Condition Join • Each tuple of the one relation is paired with each tuple of the other relation if the two tuples satisfy the join condition. • Condition c refers to attributes of both R and S.

  31. Equi-Join: A special case of Conditional Join where the condition c contains only equalities. Result schema similar to Cartesian product, but only one copy of attributes for which equality is specified. R1.sid is redundant R1.sid is dropped by a applying a projection Natural Join: Equi-join on all common attributes. Join

  32. Division • Not supported as a primitive operation, but useful for expressing queries like: Find sailors who have reserved all boats. • Let A have 2 attributes, x and y; B have only attribute y: • A/B = • i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple (reservation) in A. • In general, x and y can be any lists of attributes; y is the list of attributes in B, and x y is the list of attributes of A.

  33. Division B1 B2 B3 A/B1 A/B2 A/B3 A

  34. all disqualified x values Division • Division is not an essential operation; can be implemented using the five basic operations. • Also true of joins, but joins are so common that systems implement joins specially. • Idea:For A/B, compute all x values in A that are not ‘disqualified’ by some y value in B. • x value in A is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A. • Disqualified x values: A/B:

  35. Division • Division is not an essential operation; can be implemented using the five basic operations. SELECT name FROM Student WHERE NOT EXISTS ( SELECT CID FROM Course WHERE CID NOT IN ( SELECT CID FROM Register WHERE Student.SID = Register.SID)) If such courses don’t exist (NOT EXISTS), then student must’ve taken all courses returns those courses this student didn’t take

  36. Example Queries • Find names of sailors who’ve reservedboat #103. • Solution 1: • Solution 2: • Solution 3: • Which is most efficient? Why?

  37. Example Queries • Find names of sailors who’ve reserved a red boat. • Information about boat color only available in Boats; so need an extra join: • A more efficient solution: • A query optimizer can find the second solution given the first one.

  38. Example Queries • Find sailors who’ve reserved a red or a green boat. • Can identify all red or green boats, then find sailors who’ve reserved one of these boats: • Can also define Tempboats using union! (How?) • What happens if OR is replaced by AND in this query?

  39. Example Queries • Find sailors who’ve reserved a red and a green boat. • Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):

  40. Example Queries • Find the names of sailors who’ve reserved all boats. • Uses division; schemas of the input relations must be carefully chosen: • To find sailors who’ve reserved all ‘Interlake’ boats: • Book has lots of examples.

  41. Summary • The relational model has formal query languages that are easy to use and allow efficient optimization by the DBS. • Relational algebra (RA) is more procedural; used as internal representation for SQL query evaluation plans. • Five basic RA operations: selection, projection, Cartesian product, union, set-difference. • Additional operations as shorthand for important cases: intersection, join, division. • These operations can be implemented using the basic operations.

More Related