1 / 28

CS 405G: Introduction to Database Systems

Learn about set and bag operations in SQL, including union, intersect, except, and operational semantics of SFW.

trujillow
Télécharger la présentation

CS 405G: Introduction to Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2017

  2. Review • SELECTa list of attributes FROM a list of relations WHERE condition; • Condition may have logical operators AND, OR, NOT • Condition may have comparison operators: <. <=, <>, >=, > • String comparison may use “=” (exactly match) or “LIKE” (matching with regular expressions) • %, _, \ • (Arithmetic) expressions of attributes are allowed Jinze Liu @ University of Kentucky

  3. Exercise • SELECTsid, 2007 – age, FROM STUDENT WHERE name LIKE ‘%John%’ OR GPA > 3.6; asdfsadf Jinze Liu @ University of Kentucky

  4. Relational Database Student (SID integer, name varchar(30), age integer, GPA float); Course (CID char(10), title varchar(100)); Enroll (SID integer, CID char(10)); Create table Student (SID integer, name varchar(30), age integer, GPA float); Create table Course (CID char(10), title varchar(100)); Create table Enroll (SID integer, CID char(10)); Jinze Liu @ University of Kentucky

  5. Today’s Topic • Set and Bag operation • UNION, • INTERSECTION, • EXCEPT • DISTINCT • Aggregation • HAVING • Nested queries SELECT age, AVG(GPA) FROM Student S ENROLL E WHERE S.SID = E.SID AND E.CID = ‘EECS108’ GROUP BY age HAVING age > 20; --Compute the average GPA for Students who are at least 20 years old and are enrolled in 108 with the same age Jinze Liu @ University of Kentucky

  6. SQL set and bag operations • UNION, EXCEPT, INTERSECT • Set semantics • Duplicates in input tables, if any, are first eliminated • Exactly like set union, - and intersect in relational algebra • UNION ALL, EXCEPT ALL, INTERSECT ALL • Bag semantics • Think of each row as having an implicit count (the number of times it appears in the table) • Bag union: sum up the counts from two tables • Bag difference: subtract the two counts (a row with negative count vanishes) • Bag intersection: take the minimum of the two counts Jinze Liu @ University of Kentucky

  7. Examples of bag operations Bag1 Bag2 Bag1 UNION ALL Bag2 Bag1 INTERSECT ALL Bag2 Bag1 EXCEPT ALL Bag2 Jinze Liu @ University of Kentucky

  8. Exercise • Enroll(SID, CID), ClubMember(club, SID) • (SELECT SID FROM ClubMember)EXCEPT(SELECT SID FROM Enroll); • SID’s of students who are in clubs but not taking any classes • (SELECT SID FROM ClubMember)EXCEPT ALL(SELECT SID FROM Enroll); • SID’s of students who are in more clubs than classes Jinze Liu @ University of Kentucky

  9. Forcing set semantics • SQL provides the option of set semantics with DISTINCT keyword • SID’s of all enrolled students SELECT SID, FROM Enroll, • Say Bart takes EECS700 and EECS647 SELECT DISTINCT SID, FROM Enroll, • With DISTINCT, all duplicate SIDs are removed from the output Jinze Liu @ University of Kentucky

  10. Operational Semantics of SFW • SELECT [DISTINCT]E1, E2, …, EnFROM R1, R2, …, RmWHERE condition; • For each t1 in R1: For each t2 in R2: … … For each tm in Rm: If condition is true over t1, t2, …, tm: Compute and output E1, E2, …, En as a rowIf DISTINCT is present Eliminate duplicate rows in output • t1, t2, …, tm are often called tuple variables Jinze Liu @ University of Kentucky

  11. Summary of SQL features • SELECT-FROM-WHERE statements (select-project-join queries) • Renaming operation • Set and bag operations • UNION, DIFFERENCE, INTERSECTION • Next: aggregation and grouping Jinze Liu @ University of Kentucky

  12. Aggregates • Standard SQL aggregate functions: COUNT, SUM, AVG, MIN, MAX • Example: number of students under 18, and their average GPA • SELECT COUNT(*), AVG(GPA)FROM StudentWHERE age < 18; • COUNT(*) counts the number of rows Jinze Liu @ University of Kentucky

  13. Aggregates with DISTINCT • Example: How many students are taking classes? • SELECT COUNT (SID)FROM Enroll; • SELECT COUNT(DISTINCT SID)FROM Enroll; Jinze Liu @ University of Kentucky

  14. GROUPBY • SELECT … FROM … WHERE …GROUP BY list_of_columns; • Example: find the average GPA for each age group • SELECT age, AVG(GPA)FROM StudentGROUP BY age; Jinze Liu @ University of Kentucky

  15. Operational semantics of GROUPBY SELECT … FROM … WHERE … GROUP BY …; • Compute FROM • Compute WHERE • Compute GROUPBY: group rows according to the values of GROUPBY columns • Compute SELECT for each group • For aggregation functions with DISTINCT inputs, first eliminate duplicates within the group • Number of groups = number of rows in the final output Jinze Liu @ University of Kentucky

  16. Compute GROUPBY: group rows according to the values of GROUPBY columns Compute SELECT for each group Example of computing GROUPBY SELECT age, AVG(GPA) FROM Student GROUP BY age; Jinze Liu @ University of Kentucky

  17. Compute aggregateover the group Group all rowsinto one group Aggregates with no GROUPBY • An aggregate query with no GROUPBY clause represent a special case where all rows go into one group SELECT AVG(GPA) FROM Student; Jinze Liu @ University of Kentucky

  18. Restriction on SELECT • If a query uses aggregation/group by, then every column referenced in SELECT must be either • Aggregated, or • A GROUPBY column • This restriction ensures that any SELECT expression produces only one value for each group Jinze Liu @ University of Kentucky

  19. Examples of invalid queries • SELECT SID, age FROM Student GROUP BY age; • Recall there is one output row per group • There can be multiple SID values per group • SELECT SID, MAX(GPA) FROM Student; • Recall there is only one group for an aggregate query with no GROUPBY clause • There can be multiple SID values • Wishful thinking (that the output SID value is the one associated with the highest GPA) does NOT work Jinze Liu @ University of Kentucky

  20. HAVING • Used to filter groups based on the group properties (e.g., aggregate values, GROUPBY column values) • SELECT … FROM … WHERE … GROUP BY …HAVING condition; • Compute FROM • Compute WHERE • Compute GROUPBY: group rows according to the values of GROUPBY columns • Compute HAVING (another selection over the groups) • Compute SELECT for each group that passes HAVING Jinze Liu @ University of Kentucky

  21. HAVING examples • Find the average GPA for each age group over 10 • SELECT age, AVG(GPA)FROM StudentGROUP BY ageHAVING age > 10; • Can be written using WHERE without table expressions • List the average GPA for each age group with more than a hundred students • SELECT age, AVG(GPA)FROM StudentGROUP BY ageHAVING COUNT(*) > 100; • Can be written using WHERE and table expressions Jinze Liu @ University of Kentucky

  22. Summary of SQL features covered so far • SELECT-FROM-WHERE statements • Renaming operation • Set and bag operations • Aggregation and grouping • Next: Table expressions, subqueries • Table • Scalar • In • Exist Jinze Liu @ University of Kentucky

  23. Table expression • Use query result as a table • In set and bag operations, FROM clauses, etc. • A way to “nest” queries • Example: names of students who are in more clubs than classes SELECT DISTINCT nameFROM Student, ( ) AS SWHERE Student.SID = S.SID; (SELECT SID FROM ClubMember)EXCEPT ALL(SELECT SID FROM Enroll) Jinze Liu @ University of Kentucky

  24. Scalar subqueries • A query that returns a single row can be used as a value in WHERE, SELECT, etc. • Example: students at the same age as Bart SELECT *FROM StudentWHERE age = ( ); What’s Bart’s age? SELECT ageFROM StudentWHERE name = ’Bart’ • Runtime error if subquery returns more than one row • Under what condition will this runtime error never occur? • name is a key of Student • What if subquery returns no rows? • The value returned is a special NULL value, and the comparison fails Jinze Liu @ University of Kentucky

  25. IN subqueries • x IN (subquery) checks if x is in the result of subquery • Example: students at the same age as (some) Bart SELECT *FROM StudentWHERE age IN ( ); What’s Bart’s age? SELECT ageFROM StudentWHERE name = ’Bart’ Jinze Liu @ University of Kentucky

  26. EXISTS subqueries • EXISTS (subquery) checks if the result of subquery is non-empty • Example: students at the same age as (some) Bart • SELECT *FROM Student AS sWHERE EXISTS (SELECT * FROM Student WHERE name = ’Bart’ AND age = s.age); • This happens to be a correlated subquery—a subquery that references tuple variables in surrounding queries Jinze Liu @ University of Kentucky

  27. Operational semantics of subqueries • SELECT *FROM Student AS sWHERE EXISTS (SELECT * FROM Student WHERE name = ’Bart’ AND age = s.age); • For each row s in Student • Evaluate the subquery with the appropriate value of s.age • If the result of the subquery is not empty, output s.* • The DBMS query optimizer may choose to process the query in an equivalent, but more efficient way (example?) Jinze Liu @ University of Kentucky

  28. Summary of SQL features covered so far • SELECT-FROM-WHERE statements • Ordering • Set and bag operations • Aggregation and grouping • Table expressions, subqueries • Next: NULL’s, outerjoins, data modification, constraints, … Jinze Liu @ University of Kentucky

More Related