1 / 49

CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS. Introduction. - We discuss here two mathematical formalisms which can be used as the basis for stating and obeying queries in some specific user-oriented language.  SQL is based on relational algebra.

abra-weber
Télécharger la présentation

CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CHAPTER 4RELATIONAL ALGEBRA AND CALCULUS

  2. Introduction - We discuss here two mathematical formalisms which can be used as the basis for stating and obeying queries in some specific user-oriented language.  SQL is based on relational algebra. - QBE (not included in the new edition of the text) and Datalog (see chapter 24) are based on relational calculus. - The mathematical formalisms introduce various mathematical operators whose application yields answers to the queries.

  3. PRELIMINARIES - The inputs and outputs of a query are relational instances. - We shall present the mathematical operators as operating on relations (sets of lists). However, the cost of eliminating duplicate rows has also led to the practical usage of “multisets” (also called bags) rather than sets. Therefore we shall also present the necessary mathematical operations as operating on relations which consist of multisets of rows. - We shall always illustrate the mathematical operations through examples. To that effect we shall use a sample database defined by the schema (shown on p. 101) and the instances shown in figures 4.1, 4.2, 4.3. In some cases we shall use larger instances under the same schema. -It will be noted that queries can always be expressed by several different expressions. We shall often look at these various approaches. -We shall give unique identifying numbers to our illustrative queries.

  4. SAMPLE SCHEMAS AND INSTANCES The Schemas: Sailors(sid: integer, sname: string, rating: integer, age: real) Boats(bid: integer, bname: string, color: string) Reserves(sid: integer, bid: integer, day: date) The Instances:

  5. SEMANTICS OF THE SAMPLE RELATIONS • Sailors: Entity set; lists the relevant properties of sailors. • Boats: Entity set; lists the relevant properties of boats. • Reserves: Relationship set: links sailors and boats by describing the boat number and date for which a sailor made a reservation. Example of the declarative sentences for which rows stand: Row 1: “Sailor ’22’ reserved boat number ‘101’ on 10/10/98”. N.B. The declarative sentence is obvious in this case, but this may not always be the case, especially for relations which express complex relationship sets with complicated constraints. For example we might add the constraint that a sailor can only reserve one boat on any given date; or, worse yet that a sailor can only reserve up to two boats on any given date.

  6. Relational Algebra

  7. Selection and Projection condition • Selection Operator: σrating>8 (S2) Retrieves from the current instance of relation named S2 those rows where the value of the attribute ‘rating’ is greater than 8. Applying the above selection operator to the sample instance of S2 shown in figure 4.2 yields the relational instance on figure 4.4 as shown below: • π

  8. Projection Operatorπsname,rating(S2) Retrieves from the current instance of the relation named S2 those columns whose names are ‘sname’ and ‘rating’. Applying the above operator to the sample instance of S2 shown in figure 4.2 yields the relational instance on figure 4.5 as shown below: N. B.:Note that the projection operator can produce duplicate rows in the resulting instance.

  9. Remark Concerning Duplicate Rows • Duplicate rows are not permitted in relational algebra and calculus. • Duplicate rows can occur in SQL, though they may be controlled by explicit keywords.

  10. - Projection Operator (cont’d) Similarly πage(S2) yields the following relational instance

  11. FUNCTIONAL APPLICATION OF SELECTION AND PROJECTION OPERATORS The selection and projection operators can be applied successively as many times as desired in the usual functional denotation as illustrated below. Thus the expression πsname,rating(σrating>8(S2)) yields the following relational instance

  12. SET OPERATIONS The following set operations are also available in relational algebra: • Union* • Intersection* • Set-difference* • Cross-product N.B. (1) The asterisks indicate operations whose operand relations must be union-compatible. Two relation instances are said to be union-compatible if: - they have the same number of fields, - corresponding fields have the same domains. - they have the same semantics. (2) The results of set operations on sets and multisets are different, therefore we shall examine both of these separately.

  13. EXAMPLES OF SET OPERATIONS ON RELATIONS • Union:Given the sample instances S1 and S2 The union of S1 and S2, i.e. S1 ∪S2 is shown below

  14. Given the two sample relational instances We can form: Intersection: S1∩S2 Set-Difference: S1 – S2

  15. Given the two relational samples S1 and S2 We can form the Cross-product S1 R1:

  16. MULTISET OPERATIONS Examples: Given two multisets M1 = {a, a, b, b, c, d, f} M2 = {a, b, b, d, e, e} We can form MU = M1 ∪ M2 = {a, a, a, b, b, b, b, c, d, d, d, f, e, e} MI = M1 ∩ M2 = {a, b, b, d} MD = M1 – M2 = {a, c, f} MC= M1  M2 = {<a, a>, <a, b>, <a, b>, <a, d>, <a, e>, <a, e>, <a, a>, <a, b>, <a, b>, <a, d>, <a, e>, <a, e>, <b, a>, <b, b>, <b, b>, <b, d>, <b, e>,<b, e>, <b, a>, <b, b>, <b, b>, <b, d>, <b, e>,<b, e>, <c, a>, <c, b>, <c, b>, <c, d>, <c, e>, <c, e>, <d, a>, <d, b>, <d, b>, <e, d>, <d, e>, <d, e>, <f, a>, <f, b>, <f, b>, <f, d>, <f, e>, <f, e>}

  17. RENAMING OPERATOR,  Instance being renamed (can be a name or an operation) Purpose: to avoid name conflicts and to permit naming anonymous relational instances (like query answers). Example: the expression (C(1  sid1, 5  sid2), S1  R1) Returns a relational instance that contains the tuples shown in figure 4.11 and has the schema C(sid1: integer, sname: string, rating: integer, age: real, sid2: integer, bid: integer, day: date) N.B. The second operand may be a simple relational instance name, thereby renaming that relational instance. Name of new relation Attributes being renamed (1st and 5th)

  18. SPECIAL RELATIONAL OPERATORS The following operators are peculiar to relations: - Join operators There are several kind of join operators. We only consider three of these here (others will be considered when we discuss null values – see section 5.6.4): - (1) Condition Joins - (2) Equijoins - (3) Natural Joins - Division

  19. JOIN OPERATORS Condition Joins: - Defined as a cross-product followed by a selection: R ⋈cS = σc(R  S) (⋈ is called the bow-tie) where c is the condition. - Example: Given the sample relational instances S1 and R1 The condition join S ⋈S1.sid<R1.sid R1 yields

  20. Equijoin: Special case of the condition join where the join condition consists solely of equalities between two fields in R and S connected by the logical AND operator (∧). Example: Given the two sample relational instances S1 and R1 The operator S1 R.sid=Ssid R1 yields

  21. Natural Join - Special case of equijoin where equalities are implicitly specified on all fields having the same name in R and S. - The condition c is now left out, so that the “bow tie” operator by itself signifies a natural join. - N. B. If the two relations have no attributes in common, the natural join is simply the cross-product.

  22. DIVISION - The division operator is used for queries which involve the ‘all’ qualifier such as “Find the names of sailors who have reserved all boats”. - The division operator is a bit tricky to explain, and perhaps best approached through examples as will be done here.

  23. EXAMPLES OF DIVISION Consider the relational instances A, B1, B2, and B3. The division operation A/Bi yields the three relational instances on the right, which are constructed as follows: search among A for all those rows whose A.pno are the same as all the B.pno, and whose A.sno is the same. The corres- ponding A.sno then belong in the answer relation. Thus, B1 contains only p2, so we find in A rows 2, 6, 7, 8 for which A.pno = p2. Therefore we must include in the answer (A/B1) the A.sno values in rows 2, 6, 7, 8. Next, B2 contains p2, p4. So we locate in A rows 2 and 4 which contain respectively p2 and p4 and which have the same value of A.sno (namely s1); then we locate in A rows 8, 9 which again contain respectively p2 and p4 with the same value of A.sno (namely s4); so we place in the answer s1 and s4. Next, B3 contains p1, p2, p4; so we search in A for three rows with the same sno, containing respectively p1, p2, p4. We find only rows 1, 2, 4 for which A.sno is s1. So s1 goes in the answer. Figure 4.14 Examples Illustrating Division

  24. DIVISION Interpretation of the division operation A/B: - Divide the attributes of A into 2 sets: A1 and A2. - Divide the attributes of B into 2 sets: B2 and B3. - Where the sets A2 and B2 have the same attributes. - For each set of values in B2: - Search in A2 for the sets of rows (having the same A1 values) whose A2 values (taken together) form a set which is the same as the set of B2’s. - For all the set of rows in A which satisfy the above search, pick out their A1 values and put them in the answer.

  25. DIVISION Example: Find the names of sailors who have reserved all boats: (1) A = sid,bid(Reserves). A1 = sid(Reserves) A2 = bid(Reserves) (2) B2 = bid(Boats) B3 is the rest of B. Thus, B2 ={101, 102, 103, 104} (3) Find the rows of A such that their A.sid is the same and their combined A.bid is the set B2. Thus we find A1 = {22} (4) Get the set of A2 corresponding to A1: A2 = {Dustin}

  26. FORMAL DEFINITION OF DIVISION The formal definition of division is as follows: A/B = x(A) - x((x(A)  B) – A)

  27. EXAMPLES OF ALGEBRA QUERIES In the rest of this chapter we shall illustrate queries using the following new instances S3 of sailors, R2 of Reserves and B1 of boats.

  28. QUERY Q1 Given the relational instances: (Q1) Find the names of sailors who have reserved boat 103 sname((σbid=103Reserves) ⋈ Sailors) The answer is thus the following relational instance {<Dustin>, <Lubber>, <Horatio>}

  29. QUERY Q1 (cont’d) Sailors)) There are of course several ways to express Q1 in relational algebra. Here is another: sname(σbid=103(Reserves⋈ Sailors)) Which of these expressions should we use? That is a question of optimization. Indeed, when we describe how to state queries in SQL, we can leave it to the optimizer in the DBMS to select the nest approach.

  30. QUERY Q2 (Q2) Find the names of sailors who have reserved a red boat. sname((σcolor=‘red’Boats) ⋈ Reserves ⋈ Sailors

  31. QUERY Q3 (Q3) Find the colors of boats reserved by Lubber. color((σsname=‘Lubber’Sailors) ⋈ Reserves ⋈ Boats

  32. QUERY Q4 (Q4) Find the names of Sailors who have reserved at least one boat sname(Sailors ⋈ Reserves)

  33. QUERY Q5 (Q5) Find the names of sailors who have reserved a red or a green boat. (Tempboats, (σcolor=‘red’) ∪ (σcolor=‘green’Boats)) sname(Tempboats⋈Reserves⋈Sailors)

  34. QUERY Q6 (Q6) Find the names of Sailors who have reserved a red and a green boat. It seems tempting to use the expression used in Q5, replacing simply  by . However, this won’t work, for such an expression is requesting the names of sailors who have requested a boat that is both red and green! The correct expression is as follows: (Tempred, sid((σcolor=‘red’Boats) ⋈Reserves)) (Tempgreen, sid((σcolor=‘green’Boats) ⋈Reserves)) sname ((Tempred∩Tempgreen) ⋈ Sailors)

  35. QUERY Q7 (Q7) Find the names of sailors who have reserved at least two boats. (Reservations, sid,sname,bid(Sailors ⋈Reserves)) (Reservationpairs(1sid1, 2sname, 3bid1, 4sid2, 5sname, 6bid2), ReservationsReservations) sname1σ(sid1=sid2)(bid1bid2)Reservationpairs)

  36. QUERY 8 (Q8) Find the sids of sailors with age over 20 who have not reserved a red boat. sid(σage>20Sailors) - sid((σcolor=‘red’Boats)⋈Reserves⋈Sailors)

  37. QUERY 9 (Q) Find the names of sailors who have reserved all boats. (Tempsids, (sid,bidReserves)/(bidBoats)) sname(Tempsids⋈Sailors

  38. QUERY Q10 (Q10) Find the names of sailors who have reserved all boats called Interlake. (Tempsids, (sid,bidReserves)/(bid(σbname=‘Interlake’Boats))) sname(Tempsids⋈Sailors)

  39. RELATIONAL CALCULUS

  40. Introduction - Relational algebra is procedural: it specifies the procedure to be followed in order to get the answer to the query. - Relational calculus is declarative: it describes (declares) the answer to the query without specifying how to get it. - Relational calculus strongly resembles First Order Predicate Logic, or simply first order logic. - There are two variants of relational calculus: - Tuple relational calculus (TRC) - Domain relational calculus (DRC)

  41. TUPLE RELATIONAL CALCULUS - Aquery statement in TRC is a set declaration having the form: { P ‌ first-order logic formula} - This is ‌‌to be read as ‘the set of tuple variables, P, for which the specified first order logic formula is true’. - Thus a TRC query is a request (to the DBMS) to produce a set of tuples corresponding to the tuples of the relational answer in SQL. - Example Given the following query: (Q11) Find all sailors with a rating above 7. The TRC statement of this query is {S ‌ S  Sailors  S.rating > 7}.

  42. SYNTAX AND SEMANTICS OF TRC • The syntax and semantics of TRC is that of first-order logic. It is stated quite precisely in the text and there is no need to repeat it here. Instead we shall examine a few query applications.

  43. QUERY Q12 (Q12) Find the names and ages of sailors with a rating above 7. {P ‌ ∃S  Sailors (S.rating > 7  P.name = S.name  P.age = S.age)} Remarks 1. The fact that the tuple variable P occurs with two attributes (using the dot notation) means that solely these two attributes are required in the answer relation. 2. The symbols used are the usual first-order logic symbols: ∀: for all ∃: there exists ⋀ : and ⋁: or ¬ : not ⇒ : implies

  44. QUERIES 1,2,7,9,14 The TRC statements for these queries are pretty well self explanatory, especially with the added English statements of how to read them.

  45. DOMAIN RELATIONAL CALCULUS (1) - The form of a DRC query is as follows: {<X1, X2, … , Xn> ‌ logical DRC formula} signifying that the system must construct (and output) a set of all the tuples which satisfy the stated logical DRC formula in terms of the n attributes X1, X2, … ,Xn. Thus, the answer is a relational instance with attributes X1, X2, … , Xn, these attributes corresponding to those of some of the relations in the database. - Again, the approach used by the system is left unspecified. - The Syntax and the semantics of the DRC are explicitly and precisely described in the text.

  46. DOMAIN RELATIONAL CALCULUS (2) Example: (Q11) Find all sailors with a rating above 7. { < I, N, T, A > ‌‌ <I, N, T, A > ∈ Sailors ⋀ T > 7 } Other queries are illustrated and described in the text with all necessary explanation.

  47. EXPRESSIVE POWER OF ALGEBRA AND CALCULUS( 1) Safety - Certain queries stated in the relational calculus may lead to answers which contain an infinite number of tuples (or at least as many as the system can handle). Example: Consider the TRC query {S ‌ ¬(S ∈ Sailors)}. Since there is a quasi-infinite number of tuples that can be created with the attributes of sailors, the answer is (quasi)-infinite. - A query which yields a (quasi)-infinite answer is said to be unsafe, and, of course, should not be allowed by the system. - It is possible to define a safe formula in TRC (see text, section 4.4).

  48. EXPRESSIVE POWER OF ALGEBRA AND CALCULUS (2) • A query language is said to be relationally complete if it can express all the queries that can be expressed in relational algebra. • SQL is relationally complete. • Every query that can be expressed using a safe relational calculus query can be also be expressed as a relational algebra query. • SQL provides additional expressive power beyond relational algebra.

  49. QUERY Q1

More Related