280 likes | 389 Vues
Steven Emory Department of Computer Science California State University, Los Angeles. Using Relational Databases and SQL. Chapter 6 Set Functions. Topics for Today. Set (Aggregate) Functions GROUP BY Clause HAVING Clause. Set Functions. Definition
E N D
Steven Emory Department of Computer Science California State University, Los Angeles Using Relational Databases and SQL Chapter 6 Set Functions
Topics for Today • Set (Aggregate) Functions • GROUP BY Clause • HAVING Clause
Set Functions • Definition • A set function, or group aggregate function, is a function that operates on groups
Aggregate Functions • Aggregate/Non-aggregate similarities • Both take some kind of input • Both perform operations using the input • Both have an single output. • Aggregate/Non-aggregate differences • Input to an aggregate function is a group of data • Input to a non-aggregate function is a single item • Aggregate functions may not be nested • Aggregate functions do not alter any table data
Examples • Function Example: • SELECT LEFT(Title, 1)FROM Movies; • Set Function Example: • SELECT MPAA, COUNT(MPAA)FROM MoviesGROUP BY MPAA;
Aggregate Functions • There are only 5 general aggregate functions • COUNT(*), COUNT(fieldname) • AVG(fieldname) • MIN(fieldname) • MAX(fieldname) • SUM(fieldname)
COUNT • COUNT(*) • Counts the number of rows in a table • Excludes NULLs (doesn't count them) • -- This query returns 6.SELECTCOUNT(*) AS 'Number of Movies'FROM Movies; • COUNT(fieldname) • Same as above • -- This query also returns 6.SELECTCOUNT(ArtistID) AS 'Number of Movies'FROM Movies;
AVG • AVG(fieldname) • Averages all the data under fieldname • Excludes NULLs (doesn't count NULL as 0). • -- Averages all movie runtimes.SELECTAVG(Runtime) AS 'Average Runtime'FROM Movies;
MIN and MAX • MIN(fieldname) • Returns the minimum value under fieldname • -- Returns the minimum movie runtime.SELECTMIN(Runtime) AS 'Shortest Runtime'FROM Movies; • MAX(fieldname) • Returns the maximum value under fieldname • -- Returns the maximum movie runtime.SELECTMAX(Runtime) AS 'Longest Runtime'FROM Movies;
SUM • SUM(fieldname) • Sums all the data under fieldname • Excludes NULLs (doesn't count NULL as 0). • -- Sums all of the movie runtimes.SELECTSUM(Runtime) AS 'Total Runtime'FROM Movies;
Filtering Aggregate Calculations • To exclude items from being aggregated, you may use the WHERE clause. • Example: Count the number of PG-13 movies.SELECTCOUNT(*)FROM MoviesWHERE MPAA = 'PG-13'; • Example: Count the number of rated R movies.SELECTCOUNT(*)FROM MoviesWHERE MPAA = 'R';
Mixing Field Types • Can we calculate both with a single query? • +-------+----------+| MPAA | COUNT(*) |+-------+----------+| PG-13 | 5 || R | 1 |+-------+----------+2 rows in set (0.01 sec) • Well, we would need to mix non-aggregated fieldnames with aggregated ones • -- Example: What does this do? Does it work? No!SELECT MPAA, COUNT(MPAA)FROM Movies;
Grouping Tables • Solution: You can divide the table into groups. • -- Groups the movies table by MPAA rating.SELECT MPAAFROM MoviesGROUPBY MPAA; • -- Groups and counts movies by MPAA rating.SELECT MPAA, COUNT(MPAA)FROM MoviesGROUPBY MPAA;
How GROUP BY Works • GROUP BY begins by sorting the table based on the grouping attribute (in our case, Gender) • If any aggregates are present, GROUP BY causes each aggregate to be applied per-group rather than per-table • GROUP BY then condenses the table so that each group only appears once in the table (if listed) and displays any aggregated group values along with it
Grouping on Multiple Fields • GROUP BY can use multiple fieldnames (similar to how you can sort using multiple fieldnames) • -- Example: Report the number of movies by MPAA rating and year of release.SELECT MPAA, YEAR(ReleaseDate), COUNT(*)FROM MoviesGROUPBY MPAA, YEAR(ReleaseDate); • In the SELECT clause that contains one or more aggregates, you should only list table attributes that are als
Filtering Based on Aggregates • Can we use aggregate functions in the WHERE clause? • -- List all genres that have an average movie runtime of over 2 hours.SELECT Genre, COUNT(*), AVG(Runtime)FROM Movies JOIN XRefGenresMovies USING(MovieID)WHEREAVG(Runtime) > 120GROUP BY Genre; • The answer is no because WHERE filters during aggregation! We need something that filters after!
The HAVING Clause • Solution is to use the HAVING clause • Example: • -- List all genres that have an average movie runtime of over 2 hours.SELECT Genre, COUNT(*), AVG(Runtime)FROM Movies JOIN XRefGenresMovies USING(MovieID)GROUP BY GenreHAVINGAVG(Runtime) > 120;
How HAVING Works • In previous example: • This is calculated first...SELECT Genre, COUNT(*), AVG(Runtime)FROM Movies JOIN XRefGenresMovies USING(MovieID)GROUP BY Genre; • Then the result is filtered using the HAVING clause...SELECT Genre, COUNT(*), AVG(Runtime)FROM Movies JOIN XRefGenresMovies USING(MovieID)GROUP BY GenreHAVINGAVG(Runtime) > 120;
How HAVING Works • So in other words: • WHERE filters per row (DURING aggregation) • HAVING filters per group (AFTER aggregation) • Since HAVING filters on groups: • You cannot use just any fieldname you want to in the SELECT or HAVING clause with an aggregate query; you can only the use ones you choose to group by • Example on next page...
Having Examples • This works: • SELECT Genre, COUNT(*), AVG(Runtime)FROM Movies JOIN XRefGenresMovies USING(MovieID) GROUP BY GenreHAVINGAVG(Runtime) > 120; • This doesn't work: • SELECT Genre, COUNT(*), AVG(Runtime)FROM Movies JOIN XRefGenresMovies USING(MovieID) GROUP BY GenreHAVINGAVG(Runtime) > Runtime; • HAVING only sees group attributes and aggregates.
Having Examples • Why doesn't it work? • Because Runtime is an attribute of a movie and not an attribute of a group. You can only use group attributes and aggregate functions in a HAVING clause. • Since Genre is an attribute of the aggregated group (Genre is listed in the GROUP BY clause), we can use it in the HAVING clause. • SELECT Genre, COUNT(*), AVG(Runtime)FROM Movies JOIN XRefGenresMovies USING(MovieID) GROUP BY GenreHAVING (AVG(Runtime) > 120AND Genre <> ‘Horror’);
HAVING Summary • So in a HAVING clause: • You can use aggregate functions • You can use constant values • You can use grouping attributes • Anything else and... • Happy error time! • Usually “ERROR 1111 (HY000): Invalid use of group function” or “ERROR 1054 (42S22): Unknown column 'column name' in having clause” are the most common errors.
An Advanced HAVING Problem • List the country and average age of all (movie-related) people born in that country, for only those countries that have an average person age greater than 50. Remember that nobody every says “I'm 52.3948279 years old!” Always truncate ages to zero decimal places.
Solution • SELECTBirthCountry, TRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), BirthDate)/365, 0)), 0) AS 'Average Age'FROM PeopleGROUPBYBirthCountryHAVINGTRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), BirthDate)/365, 0)), 0) > 50 ANDBirthCountryIS NOT NULL;
Solution • Note that you may also define an alias for the aggregate function in MySQL and use it in the HAVING clause • SELECTBirthCountry, TRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), BirthDate)/365, 0)), 0) ASAverageAgeFROM PeopleGROUPBYBirthCountryHAVINGAverageAge > 50 ANDBirthCountryIS NOT NULL;
Aggregating Distinct Values • A normal SELECT DISTINCT query filters out duplicates in a second pass • Aggregates are computed in the first pass, so if a field contains duplicate values, and you aggregate on that field, SELECT DISTINCT WILL NOT filter out duplicate values from being aggregated. • The solution is to use the DISTINCT keyword in the aggregate function: • SELECTCOUNT(DISTINCT MPAA)FROM Movies;
Aggregating Distinct Values • Example: • -- Returns 6 since there are 6 movies.SELECTCOUNT(MPAA)FROM Movies; • -- Returns 6 since there are 6 movies and 6 is unique.SELECT DISTINCTCOUNT(MPAA)FROM Movies; • -- Returns 2 since only PG-13 and R rated movies are currently in the database.SELECT COUNT(DISTINCT MPAA)FROM Movies;