510 likes | 603 Vues
This detailed guide covers advanced SQL concepts such as cartesian product, theta join, natural join, duplicate elimination, aggregation, and grouping. Learn how to manipulate data effectively in SQL queries.
E N D
SQL: Structured Query Language Part II Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu
Cartesian Product in SQL • In Relation Algebra: R x S • In SQL, add R and S to FROM clause • No WHERE condition that links R and S SELECT* FROMStudent, Professor; SELECTsName, pNumber FROMStudent, Professor;
Cross Product - Example Student Professor SELECT* FROMStudent, Professor;
Theta Join in SQL • In Relation Algebra: R ⋈C S • In SQL, add R and S to FROM clause • WHERE condition that links R and S with the join condition C SELECT* FROMStudent, Professor WHEREStudent.pNum = Professor.Number; Join condition
Theta Join Example Student Professor SELECTsNumber, sName, pName FROMStudent, Professor WHEREprofNum = pNumber; sNumber,sName,pName(Student ⋈(profNum=pNumber) Professor)
Theta Join Example Student Professor If column names are the same use relationName.attrName SELECTsName, pName, S.address FROMStudent S, Professor P WHERES.address = P.address; sName,pName,S.address(ρS(Student) ⋈(S.address=P.address)ρP(Professor))
Natural Join Reminder: Join columns must have same names in both relations (R ⋈ S) SELECT * FROM Student , Professor WHERE Student.pnumber = Professor.pnumber ; Student ⋈ Professor SELECT * FROM Student NATURAL JOIN Professor; Explicitly add the equality join condition
Difference between the two Queries below SELECT * FROM Student , Professor WHERE Student.pnumber = Professor.pnumber ; Common columns will appear once Student ⋈ Professor SELECT * FROM Student NATURAL JOIN Professor; Common columns will appear twice Explicitly add the equality join condition
Natural Join - Example Must be the same name Professor Student SELECT * FROM Student natural join Professor; Student ⋈ Professor
Example Queries SELECT * FROMloan WHEREamount > 1200 ; SELECTL.loan_number FROMloan L WHEREL.amount > 1200 ;
Example Queries SELECTcustomer_name FROMdepositor Union SELECTcustomer_name FROMborrower;
Example Queries DBMS is smart enough !!! (Select first, then joins) SELECT customer_name FROMborrower B, loan L WHERE B.loan_number = L.loan_number ANDL.branch_name = “Perryridge”;
Sorting: ORDER BY clause • New optionalclause that you can add to the SELECT statement called “ORDER BY” • Allows sorting the returned records according to one or more fields SELECT* FROMStudent WHEREsNumber >= 1 ORDER BY pNumber, sName; Default is ascending order -Order first based on the pNumber (ascending) -If many records exist with the same pNumber - order them based on sName (descending) SELECT* FROMStudent WHEREsNumber >= 1 ORDER BY pNumberASC, sNameDESC;
Sorting: ORDER BY clause Student SELECT* FROMStudent WHEREsNumber >= 1 ORDER BY pNumber, sName DESC; (pNumber, sName DESC) ( (sNumber >= 1) (Student))
Duplicate Elimination in SQL • New optionalkeyword “DISTINCT” • Added in the SELECT clause SELECTDISTINCT… FROM… … Eliminate any duplicates from the answer
Duplicate Elimination: Example Student SELECTDISTINCTsName, address FROMStudent; • (sName,address(Student)) • ( (address) ( (sNumber > 1) (Student))) SELECTDISTINCTaddress FROMStudent WHEREsNumber > 1;
Always Remember…. • Only SELECT and FROM clauses are mandatory • All the others are optional • You can mix and match the optional ones • But if you add a clause, then keep it in its order SELECTDISTINCTaddress FROMStudent WHEREsNumber > 1; SELECTaddress FROMStudent ORDER BY sNumber; SELECTaddress FROMStudent WHEREsNumber > 1 ORDER BY sNumber; SELECTaddress FROMStudent ORDER BY sNumber WHEREsNumber > 1; X
Possible Aggregations in SQL SELECT COUNT(*) FROM Student; SELECT COUNT(sNumber) FROM Student; SELECT MIN(sNumber) FROM Student; SELECT MAX(sNumber) FROM Student; SELECT SUM(sNumber) FROM Student; SELECT AVG(sNumber) FROM Student;
Grouping & Aggregation in SQL • New optionalclause called “GROUP BY” • If the SELECT statement has “WHERE” • Then WHERE conditions are evaluated first, then records are grouped And get the minimum gpa for each group SELECTpNumber, COUNT(sName), Min(gpa) FROMStudent GROUP BY pNumber; Then count the records in each group First form groups for each pNumber
GROUP BY: Example I Student cnt count(*) (Student) pNumber,cntcount(*) ( (sNumber > 1) (Student)) SELECTpNumber, count(*) AS CNT FROMStudent WHEREsNumber > 1 GROUP BYpNumber; SELECTcount(*) AS CNT FROMStudent;
GROUP BY: Example II Student pNumber,address, CNT count(sName), SUM sum(sNumber) ( (sNumber > 1) (Student)) SELECTpNumber,address, count(sName) AS CNT, sum(sNumber) ASSUM FROMStudent WHEREsNumber >= 1 GROUP BYpNumber, address;
Restrictions of GROUP BY • If you group by A1, A2, …An, then any other column projected in SELECT clause must be inside an aggregation function SELECTpNumber, address, count(sName) AS CNT, sum(sNumber) ASSUM FROMStudent WHEREsNumber > 1 GROUP BYpNumber, address; SELECTpNumber, address, sName, sum(sNumber) ASSUM FROMStudent WHEREsNumber > 1 GROUP BYpNumber, address; X SELECTpNumber, count(sName) AS CNT, sum(sNumber) ASSUM FROMStudent WHEREsNumber > 1 GROUP BYpNumber, address;
HAVING Clause: Putting Condition on Groups • How to add conditions on each group? • Select only the groups where the COUNT > 5 • These conditions are after you build the groups (not before) • Remember: WHERE conditions are executed before the groups are formed • New optionalclause called “HAVING”, added after the GROUP BY clause SELECTpNumber, COUNT(sName) FROMStudent GROUP BY pNumber HAVING SUM(sNumber) > 2; Can reference aggregation inside HAVING
HAVING Clause: Example Student (SUM> 3) (pNumber,address, CNT count(sName), SUM sum(sNumber) ( (sNumber > 1) (Student))) SELECTpNumber,address, count(sName) AS CNT, sum(sNumber) ASSUM FROMStudent WHEREsNumber > 1 GROUP BYpNumber, address HAVING sum(sNumber) > 3;
SELECT Statement Clauses SELECT<projection list> FROM<relation names> WHERE <conditions> GROUP BY <grouping columns> HAVING <grouping conditions> ORDER BY <order columns>; • Optional clauses if added must be in the order above • Order of execution • FROM Check which relations are used • WHERE Filter records based on conditions • GROUP BY Form groups • HAVING Filter groups based on conditions • ORDER BY Sort the data • SELECT Form the projection list (output columns) optional
Questions SELECT<projection list> FROM<relation names> WHERE <conditions> GROUP BY <grouping columns> HAVING <grouping conditions> ORDER BY <order columns>; optional
More in SELECT Statement • Special handling for NULL values • Nested subqueries
Null Values • Null means ‘unknown’ value • Any expression containing Null returns Null • 5 + null null • ‘ABC’ || null null • Null in predicates returns UNKNOWN • Predicates usually return TRUE or FALSE
Example Student SELECTsNumber FROMStudent WHERE address = ‘320FL’; May or may not appear
Use of “IS NULL” or “IS NOT NULL” • Check if a value is null or not SELECTsNumber FROMStudent WHERE address is null; Select student numbers where the address is null SELECTsNumber FROMStudent WHERE address is not null AND address ‘320FL’; Remember: SELECTsNumber FROMStudent WHERE address = null; X The returned value here is unknown
Use of “NVL” Function • NVL( exp1, exp2) • If exp1 is null return exp2, otherwise return expr1 • Can be used in projection list or in predicates SELECTsNumber FROMStudent WHERE nvl(address, ‘n/a’) <> ‘n/a’ AND address ‘320FL’; SELECTsNumber, nvl(address, ‘N/A’) FROMStudent;
Null with Grouping & Aggregation • Aggregation • Null is ignored with all aggregates, e.g., SUM, AVG, MIN, MAX except COUNT(*) • Grouping • Null is considered as a separate group
Example Student SELECTaddress, sum(pNumber) as sum, count(*) as cnt FROMStudent GROUP BY address;
More in SELECT Statement • Special handling for NULL values • Nested subqueries
Nested Subquery • SQL provides a mechanism for the nesting of subqueries. • A subqueryis a SELECT statement expression that is nested within another query • Subquery can appear inFROM or WHEREclauses
Nested Subquery in WHERE Clause • Since the predicates has = : • The inner statement must return one record with one column • In this case, DBMS will automatically convert the relation to a single scalar value • Otherwise an error is generated 2- Then, execute this statement once pNumber from the first step is known (outer SELECT) SELECT* FROM Student WHERE pNumber = (SELECT pNumber FROM Professor WHERE pName = ‘Mike’); 1- Execute this statement first to get the pNumber (inner SELECT)
Example: Subqueries Retuning Scalar Value Student Professor Select students of professor ‘MM’ SELECT sNumber, sName FROM Student WHERE pNum = (SELECT pNumber FROM Professor WHERE pName=‘MM’); CS3431
SubQuery Returning a Relation (General Case) • Predicates may include any of (OP above) : • Exists R True if R is not empty • s in R True if tuple s appears in R • s not in R True if tuple s does not appear in R Outer Select (S) SELECT sNumber, sName FROM Student WHERE pNum OP (SELECT pNumber FROM Professor WHERE pName=‘MM’); Inner Select (R)
Example 1: Subqueries Returning Relations Student Professor Select students of professors with address like ‘%FL’ SELECT sNumber, sName FROM Student WHERE pNum IN (SELECT pNumber FROM Professor WHERE address Like ‘%FL’); CS3431
Example 2: Subqueries Returning Relations Student Professor SELECT sNumber, sName FROM Student WHEREExists (SELECT pNumber FROM Professor WHERE address Like ‘%FL’); Always true because it is not empty CS3431
Example 3: Subqueries Returning Relations Multi-column tuples
Comparison Using ALL and ANY • We took: Exists, IN, NOT IN • s > ALL R True if s > all values in R • s > ANY R True if s > any value in R • ‘>’ can be any of the other comparison operators, e.g., <, <=, >=, =, <> • R must be relation with single column Outer Select (S) SELECT sNumber, sName FROM Student WHERE pNum OP (SELECT pNumber FROM Professor WHERE pName=‘MM’); Inner Select (R)
Example Student Professor SELECT sNumber, sName FROM Student WHERE pNum >= ALL (SELECT pNumber FROM Professor WHERE address Like ‘%FL’); This inner select returns 1 , 2
Correlated Selects • If the “inner” select references the “outer” select correlated • In this case, the inner select is executed with each record from the outer select Reference to the outer select Meaning: For each supplier, execute the inner select, and then evaluate the WHERE clause Returns: suppliers who do not have orders
Use of Inner Select with DML Commands • Inner select can be used with Insert, Update, Delete commands • INSERT INTO suppliers (supplier_id, supplier_name) • SELECTaccount_no, name • FROMexternals • Wherecode = 1; Notice that there is no keyword “values” in this case
Nested Subquery in FROM Clause • Use the inner SELECT like any other table • It is just built on the fly • Inner SELECT can be a full statement with all clauses • ORDER BY clause does not make sense in the inner select SELECT* FROM Student, (inner SELECT) AS q WHERE … Table built on the fly
Example • Subquery 1 is computed on the fly • It is treated as a normal table after that