SQL: Queries, Constraints, Triggers Chapter 5
SQL • Data Manipulation Language (DML):subset of SQL which allows users to create queries and to insert, delete and modifyrows. • Data Definition Language (DDL):subset of SQL which supports the creation, deletion and modifications for table and views. • Triggers and Advanced Integrity Constraints:actions that executed by the DBMSwhenever changes to the database meet conditions specified in the trigger. • Embedded and Dynamic SQL:allow SQLcode to be calledfrom a hostlanguage such as C or COBOL. • Client-Server Execution and Remote Database Access:controls how client application program can connect to an SQL database server, or access data from a database over a network. • Transaction management:commands allow a user to explicitly control aspects ofhow a transaction is tobeexecuted. • Security: Provide mechanisms to controluser’saccess to data objects such as tables and views. • Advanced features: such as object-oriented features, recursivequeries, decisionsupportqueries, datamining, spatialdata, text and XMLdatamanagement
The form of Basic SQL query: • Select [DISTINCT] select-list • From from-list • Where qualification Select should specify columns to be retained in the results From should specify a cross-product of tables Where specify selection conditionson the tablesmentioned in the from clause. Example: Sailor(sid: integer, sname: string, rating: integer, age: real) Boats(bid: integer, bname: string, color: string) Reservation(sid: integer, bid: integer, day:date)
R1 Example Instances S1 • We will use these instances of the Sailors and Reserves relations in our examples. • If the key for the Reserves relation contained only the attributes sid and bid, how would the semantics differ? S2
Basic SQL Query SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification • relation-list A list of relation names (possibly with a range-variable after each name). • target-list A list of attributes of relations in relation-list • qualification Comparisons (Attrop const or Attr1 op Attr2, where op is one of ) combined using AND, OR and NOT. • DISTINCT is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated!
Basic SQL Query SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification • Find the names and ages of all sailors • Select DISTINCTS.sname, S.age • From Sailor.S • The result without using DISTINCT is:
Conceptual Evaluation Strategy • Semantics of an SQL query defined in terms of the following conceptual evaluation strategy: • Compute the cross-product of relation-list. • Discard resulting tuples if they fail qualifications. • Delete attributes that are not in target-list. • If DISTINCT is specified, eliminate duplicate rows. • This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers.
Example of Conceptual Evaluation Find the names of sailors who have reserved boat number 103. SELECTS.sname FROM Sailors S, Reserves R WHERES.sid=R.sidAND R.bid=103 Instance R3 of Reserve Instance S4 of Sailors The first step is to construct S4xR3 which shows the following results:
Example of Conceptual Evaluation The second step is to apply the qualification S.sid R.sid AND R.bid=103. This step eliminates all but the last row from the instance shown in the following figure. The third step is to eliminate unwanted columns, so only sname appears in the select clause
A Note on Range Variables • Really needed only if the same attribute (sid) appears twice in the FROM clause. The previous query can also be written as: SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND bid=103 It is good style, however, to use range variables always! OR SELECT sname FROM Sailors, Reserves WHERE Sailors.sid=Reserves.sid AND bid=103
Find sailors who’ve reserved at least one boat SELECTS.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid • The join of Sailors and Reserves ensures that for each selected sname, the sailor has made some reservation.
Find the sids of sailors who have reserved a red boat. • This query contains a join of two tables, followed by a selection on the color of boats. SELECTR.sid FROMBoats B, Reserves R WHEREB.bid = R.bid AND B.color = ‘red’
Find the names of sailors who have reserved a red boat. • This query contains a join of three tables, followed by a selection on the color of boats. • The join with Sailors allow us to find the name of the sailor who according to reserves tuple R, has reserved a red boat scribed by tuple B. SELECTS.sname FROMSailors S, Reserves R, Boats B WHERES.sid=R.sid AND B.bid = R.bid AND B.color = ‘red’;
Expressions and Strings SELECTS.age, age1= S.age-5, 2*S.ageAS age2 FROM Sailors S WHERES.snameLIKE ‘B_%B’ • Illustrates use of arithmetic expressions and string pattern matching: Find triples (of ages of sailors and two fields defined by expressions) for sailors whose names begin and end with B and contain at least three characters. • AS and = are two ways to name fields in result. • LIKE is used for string matching. `_’ stands for exactly one character and `%’ stands for 0 or more arbitrary characters. • Thus, ‘-AB%’ denote a pattern marching every string that contains at least three characters with the second and third characters being A and B respectively. • ‘B-%B’ names begins and ends with B and has at least three characters.
Find the names of sailors who’ve reserved a red or a green boat SELECT S.snameFROM Sailors S, Boats B, Reserves RWHERE S.sid=R.sid AND R.bid=B.bid AND (B.color=‘red’ ORB.color=‘green’) • The query is easily expressed using the OR connective in the WHERE. However, the following query which is identical except for the use of ‘and’ rather ‘or’ turns out to be much more difficult: • Find the names of sailors who’ve reserved a red and a green boat • Why? • If we replace the use of OR in the previous query by AND, we would retrieve the names of sailors who have reserved a boat that is both red and green. • The integrity constraint that bid is a key for Boats tells us that the same boat cannot have two colors. • So what will be the result of this query? • Empty
Find the names of sailors who’ve reserved a red ANDa green boat A correct statement of the above query using AND will be as follow: SELECTS.sname FROM Sailors S, Reserves R1, Boats B1, Reserves R2, Boats B2, WHERE S.sid=R1.sid ANDR1.bid=B1.bid AND S.sid=R2.sid AND R2.bid=B2.bid AND(B1.color=‘red’ AND B2.color=‘green’) • We can think of R1 and B1 as rows that prove that sailor S.sid has reserved a red boat. R2 and B2 similarly prove that the same sailor has reserved a green boat. • S.sname is not included in the results unless five such rows S, R1, B1, R2, and B2 are found. But, the previous query is difficult to understand, thus, a better solution foe these two queries is to use UNION and INTERSECT.
Find the names of sailors who’ve reserved a red or a green boat SELECTS.sname FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND (B.color=‘red’ ORB.color=‘green’) • UNION: Can be used to compute the union of any two union-compatible sets of tuples (which are themselves the result of SQL queries). • Also available: EXCEPT(What do we get if we replace UNIONby EXCEPT?) SELECTS.sname FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid ANDB.color=‘red’ UNION SELECTS2.sname FROM Sailors S, Boats B2, Reserves R2 WHERES2.sid=R2.sid ANDR2.bid=B2.bid ANDB2.color=‘green’
Find name of sailors who’ve reserved a red and a green boat SELECTS.sname FROM Sailors S, Boats B1, Reserves R1, Boats B2, Reserves R2 WHERE S.sid=R1.sid AND R1.bid=B1.bid AND S.sid=R2.sid AND R2.bid=B2.bid AND (B1.color=‘red’ AND B2.color=‘green’) • INTERSECT: Can be used to compute the intersection of any two union-compatible sets of tuples. • Included in the SQL/92 standard, but some systems don’t support it. • Now try the same query but for sid not sname. SELECTS.sname FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid ANDB.color=‘red’ INTERSECT SELECTS2.sname FROM Sailors S2, Boats B2, Reserves R2 WHERES2.sid=R2.sid ANDR2.bid=B2.bid ANDB2.color=‘green’
Find the sids of all sailors who have reserved red boats but not green boats. SELECTS.sid FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid ANDB.color=‘red’ EXCEPT SELECTS2.sid FROM Sailors S2, Boats B2, Reserves R2 WHERES2.sid=R2.sid ANDR2.bid=B2.bid ANDB2.color=‘green’ • Sailors 22, 64 and 31 have reserved red boats. Sailors 22, 74 and 31 have reserved green boats. Hence the answer contains just the 64. But do we need all relations in this query???? So do it. SELECT S.sid FROM Boats B, Reserves R WHERE R.bid=B.bid ANDB.color=‘red’ EXCEPT SELECT S2.sid FROM Boats B2, Reserves R2 WHERE R2.bid=B2.bid AND B2.color=‘green’
Find the sids of all sailors who have a rating of 10 or reserved boat 104. SELECTS.sid FROM Sailors S, WHERES.rating = 10 UNION SELECTR.sid FROM Reserves R WHERER.bid = 104; • The first part of the union returns the sids 58 and 71. The second part returns 22 and 31. The answer is, therefore, the set of sids 22, 31, 58 and 71
Nested Queries • Is a query that has another query embedded within it; the embedded query is called subquery. • A very powerful feature of SQL: a WHERE clause can itself contain an SQL query! (Actually, so can FROM and HAVING clauses.) • To understand semantics of nested queries, think of a nested loops evaluation: For each Sailors tuple, check the qualification by computing the subquery.
Nested Queries Find names of sailors who’ve reserved boat #103: SELECT S.sname FROM Sailors S WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid=103) • The nested subquery computes the (multi) set of sidsfor sailors who have reserved boat 103 (22,31 and 74 on instances R2 and S3) and the top query retrieve the names of sailors whose sid is in this set. • To find sailors who’ve not reserved #103, use NOT IN. • The best way to understand a nested query is to think of it in terms of conceptual strategy. In our example, the strategy consists of examining rows of Sailors and for each row, evaluating the subquery over Reserves. • Construct the cross-product of the tables in the FROM clause of the top level query as before. For each row in the cross product, recompute the subquery.
Nested Queries Find names of sailors who’ve reserved a red boat: • The subquery might itself contain another nested subquery, in which case we apply the same idea one more time, leading to an evaluation strategy with several levels of nested loop. • The innermost subquery find the set of bids of red boats (102 and 104). The subquery one level above finds the set of sids of sailors who have reserved one of these boats (22,31 and 64). • The top level query finds the names of sailors whose sid is in this set of sids. So we get Dustin, Lubber and Horatio. • Q: SELECT S.sname FROM Sailors S WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid IN (Select B.bid FROM Boats B WHERE B.color = ‘red’)
Nested Queries with Correlation • In the nested query, the inner subquery has been completely independent of the outer query. The inner sub query could depend on the rows currently being examined in the outer query. • EXISTSis another set comparison operator, like IN. It test whether a set is nonempty. • Thus for each Sailor row S we test whether the set of Reserves rows R such that R.bid =103 AND S.sid =R.sid is non empty. If so, Sailor S has reserved boat 103, and we retrieve the name. • The occurrence of S in the subquery is called correlation and such queries are called correlated queries. Find names of sailors who’ve reserved boat #103: SELECT S.sname FROMSailorsS WHERE EXISTS (SELECT * FROMReservesR WHERE R.bid=103 ANDS.sid=R.sid)
More on Set-Comparison Operators • We’ve already seen IN, EXISTS and UNIQUE. Can also use NOT IN, NOT EXISTS and NOT UNIQUE. • Also available: opANY, opALL, where op is one of the arithmetic comparison operator • Find sailors whose rating is greater than that of some sailor called Horatio: • What this query will compute ? • On instance Sailors, this computes the sids 31, 32, 58, 71 and 74. • What if the query was: Find sailors whose rating is better than every sailor called Horatio? • Just replace ANY with ALL. What is the result? • Sids 58 and 71. SELECT * FROM Sailors S WHERE S.rating> ANY (SELECT S2.rating FROM Sailors S2 WHERE S2.sname=‘Horatio’)
More on Set-Comparison Operators SELECTS.sid FROM Sailors S WHERES.rating>= ALL(SELECT S2.rating FROM Sailors S2) • Find the sailors with the highest rating • The outer WHERE condition is satisfied only when S.rating is greater than or equal to each of these rating values, that is when it is the largest rating value. • In the instance S3, the condition is satisfied only for rating 10. and the answer is sids: 58 and 71
More on Nested query • Find the names of sailors who have reserved both a red and a green boats. • The query can be understood as follow: Find all sailors who have reserved a red boat and further have sids that are included in the set of sids of Sailors who have reserved a green boats. This formulation of the query illustrates how queries involving INTERSECT can be rewritten using IN • Queries using EXCEPT can be similarly rewritten by using NOT IN. • Find the sids of Sailors who have reserved red boats but not green? SELECT S.sname FROM Sailors S, Reserves R, Boats B WHERE S.sid=R.sid AND R.bid=B.bid AND B.color=‘red’ AND S.sidIN (SELECT S2.sid FROM Sailors S2, Boats B2, Reserves R2 WHERE S2.sid=R2.sid AND R2.bid=B2.bid AND B2.color=‘green’
COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A) Aggregate Operators Count the number of Sailors: SELECT COUNT (*) FROM Sailors S; What is the answer? 10 • It can be used to perform some computation and summarization. single column Find the name and age of oldest Sailor SELECT S.sname, MAX (S.age) FROM Sailors S This query is illegal, we must use GROUP BY Find the average age of sailors with a rating of 10. SELECT AVG (S.age) FROM Sailors S WHERES.rating=10 Find the name and age of oldest Sailor SELECTS.sname, S.age FROM Sailors S WHERES.age=(SELECT MAX(S2.age) FROM Sailors S2)
Find name and age of the oldest sailor(s) SELECT S.sname, MAX (S.age) FROM Sailors S • The first query is illegal! (We’ll look into the reason a bit later, when we discuss GROUP BY.) • The third query is equivalent to the second query, and is allowed in the SQL/92 standard, but is not supported in some systems. SELECTS.sname, S.age FROM Sailors S WHERES.age = (SELECT MAX (S2.age) FROM Sailors S2) SELECTS.sname, S.age FROM Sailors S WHERE (SELECT MAX (S2.age) FROM Sailors S2) = S.age
Motivation for Grouping • So far, we’ve applied aggregate operators to all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples. • Consider:Find the age of the youngest sailor for each rating level. • In general, we don’t know how many rating levels exist, and what the rating values for these levels are! • Suppose we know that rating values go from 1 to 10; we can write 10 queries that look like this (!): SELECT MIN (S.age) FROM Sailors S WHERE S.rating = i For i = 1, 2, ... , 10:
Queries With GROUP BY and HAVING SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification GROUP BYgrouping-list HAVING group-qualification • The target-list contains (i) attribute names(ii) terms with aggregate operations (e.g., MIN (S.age)). • The attribute list (i)must be a subset of grouping-list. Intuitively, each answer tuple corresponds to a group, andthese attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.)
Queries With GROUP BY and HAVING • Find the age of the youngest sailor for each rating level: Select S.rating, Min (S.age) From Sailor S Group By S.rating
Conceptual Evaluation • The cross-product of relation-list is computed • Tuplesthat fail qualification are discarded, `unnecessary’ fields are deleted, and • The remaining tuples are partitioned into groups by the value of attributes in grouping-list. • The group-qualification is then applied to eliminate some groups. Expressions in group-qualification must have a single value per group! • In effect, an attribute in group-qualificationthat is not an argument of an aggregate op also appears in grouping-list. (SQL does not exploit primary key semantics here!) • One answer tuple is generated per qualifying group.
Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors Sailors instance: SELECTS.rating, MIN (S.age) ASminage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVINGCOUNT (*) > 1 Answer relation:
Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors.
Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors and with every sailor under 60. HAVING COUNT (*) > 1 AND EVERY (S.age <=60)
Find age of the youngest sailor with age 18, for each rating with at least 2 sailors between 18 and 60. Sailors instance: SELECT S.rating, MIN (S.age) AS minage FROM Sailors S WHERE S.age >= 18 AND S.age <= 60 GROUP BY S.rating HAVINGCOUNT (*) > 1 Answer relation:
For each red boat, find the number of reservations for this boat • Grouping over a join of three relations. • What do we get if we remove B.color=‘red’ from the WHERE clause and add a HAVING clause with this condition? • What if we drop Sailors and the condition involving S.sid? SELECT B.bid, COUNT (*) AS scount FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid ANDB.color=‘red’ GROUP BY B.bid
Summary • SQL was an important factor in the early acceptance of the relational model; more natural than earlier, procedural query languages. • Relationally complete; in fact, significantly more expressive power than relational algebra. • Even queries that can be expressed in RA can often be expressed more naturally in SQL. • Many alternative ways to write a query; optimizer should look for most efficient evaluation plan. • In practice, users need to be aware of how queries are optimized and evaluated for best results.
End chapter exercises Consider the following schema: Flights(flno: integer, from: string, to: string, distance: integer, departs: time, arrives: time, price: real) Aircraft(aid: integer, aname: string, cruisingrange: integer) Certified(eid: integer, aid: integer) Employees(eid: integer, ename: string, salary: integer) Note that the Employees relation describes pilots and other kinds of employees as well; every pilot is certified for some aircraft, and only pilots are certified to fly. Write each of the following queries in SQ: 1. Find the names of aircraft such that all pilots certified to operate them have salaries more than $80,000. 2. For each pilot who is certified for more than three aircraft, find the eid and the maximum cruisingrange of the aircraft for which she or he is certified. 3. Find the names of pilots whose salary is less than the price of the cheapest route from Los Angeles to Honolulu. 4. For all aircraft with cruisingrange over 1000 miles, find the name of the aircraft and the average salary of all pilots certified for this aircraft. 6. Find the aids of all aircraft that can be used on routes from Los Angeles to Chicago
End chapter excersises An Instance of Sailors Write SQL queries to compute the average rating, using AVG; the sum of the ratings, using SUM; and the number of ratings, using COUNT.