400 likes | 534 Vues
Explore the essentials of Top-N querying patterns in SQL as presented by Maxym Kharchenko. This guide walks you through practical examples including queries to retrieve the top 10 salaries in the Sales department, best-selling books, and the latest orders. You will learn about naive versus optimized approaches, indexing strategies, and how to effectively use pagination to enhance performance. Discover the structure of SQL queries like SELECT, ORDER BY, and how to filter with conditions to get the desired results while managing large datasets.
E N D
SQL Top-Nand Pagination Pattern Maxym Kharchenko
What is top-N • Give me the top 10 salaries in the “Sales” dept • Give me the top 10 best selling books • Give me the 10 latest orders
Setup SQL> @desc cities Name Null? Type -------------------------- -------- --------------- NAME NOT NULL VARCHAR2(100) STATE NOT NULL VARCHAR2(100) POPULATION NOT NULL NUMBER PCTFREE 99 PCTUSED 1 http://www.census.gov
Naïve Top-N Give me the top 5 cities by population SELECT name, population FROM cities WHERE rownum <= 5 ORDER BY population DESC; NAME Pop ---------------------- ------ Robertsdale city 5,276 Glen Allen town (pt.) 458 Boligee town 328 Riverview town 184 Altoona town (pt.) 30 Statistics 7 consistent gets
Naïve Top-N explained ----------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Time | ----------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 110 | 00:00:01 | | 1 | SORT ORDER BY | | 5 | 110 | 00:00:01 | |* 2 | COUNT STOPKEY | | | | | | 3 | TABLE ACCESS FULL| CITIES | 10 | 220 | 00:00:01 | -----------------------------------------------------------------
Correct top-N query SELECT * FROM ( SELECT name, population FROM cities ORDER BY population DESC ) WHERE rownum <= 5; SELECT name, population FROM cities ORDER BY population DESC FETCH FIRST 5 ROWS ONLY; <= 11g >= 12c
Correct top-N query: Execution SELECT * FROM ( SELECT name, population FROM cities ORDER BY population DESC ) WHERE rownum <= 5; NAME Pop -------------------- ---------- Los Angeles city 3,792,621 Chicago city (pt.) 2,695,598 Chicago city (pt.) 2,695,598 Chicago city 2,695,598 New York city (pt.) 2,504,700 Statistics 56024 consistent gets
Reading, filtering and sorting --------------------------------------------------------------------- | Id | Operation | Name | Rows |TempSpc| Time | --------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | | 00:01:58 | |* 1 | COUNT STOPKEY | | | | | | 2 | VIEW | | 56072 | | 00:01:58 | |* 3 | SORT ORDER BY STOPKEY| | 56072 | 1768K| 00:01:58 | | 4 | TABLE ACCESS FULL | CITIES | 56072 | | 00:01:54 | ---------------------------------------------------------------------
Proper data structure Ordered By: Population CREATE INDEX i_pop ON cities(population); -------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 56072 | 00:00:01 | | 4 | INDEX RANGE SCAN DESCENDING| I_POP | 10 | 00:00:01 | -------------------------------------------------------------------- Statistics 12 consistent gets
Why index works Ordered By: Population • Colocation • Can stop after reading N rows • No Sort CREATE INDEX i_pop ON cities(population);
More elaborate top-N Give me the top 5 cities by population in Florida SELECT * FROM ( SELECT name, population FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 5; NAME Pop -------------------- ---------- Jacksonville city 821,784 Miami city 399,457 Tampa city 335,709 St. Petersburg city 244,769 Orlando city 238,300 Statistics 264 consistent gets
Uncertain nature of filtering Ordered By: Population WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 5; WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 200; Statistics 264consistent gets Statistics 19747consistent gets
Multi column indexes CREATE INDEX i_state_pop ON cities(state, population); where state=‘FL’ AL AK AZ CO FL MA WA State Population • Ordered By: • State • State+Population • Not Ordered by: Population WHERE state=‘FL’ NOW: Ordered By: Population
Multicolumn indexes ------------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 11 | 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 1099 | 00:00:01 | |* 4 | INDEX RANGE SCAN DESCENDING| I_STATE_POP | 11 | 00:00:01 | ------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter(ROWNUM<=5) 4 - access("STATE"='Florida') Statistics 12 consistent gets
Trips to the table ------------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 11 | 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 1099 | 00:00:01 | |* 4 | INDEX RANGE SCAN DESCENDING| I_STATE_POP | 11 | 00:00:01 | ------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter(ROWNUM<=5) 4 - access("STATE"='Florida') Statistics 12 consistent gets
Index range scan: cost math Window: 500 records 4-5 logical reads ~ 5-10 logical reads ~ 10-500 logical reads
Covering index CREATE INDEX i_state_pop ON cities(state, population); CREATE INDEX i_state_pop_c ON cities(state, population, name); Statistics 12 consistent gets Statistics 7 consistent gets -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I_STATE_POP_C | 506 | 00:00:01 | --------------------------------------------------------------------------
Ideal top-N • Use the index • Make the best index • And read only from the index
Less than ideal top-N • Effect of query conditions • Effect of deletes and updates • Technicalities
Condition better! WHERE active != 'N' ORDER BY order_date DESC ) WHERE rownum <= 10; CREATE TABLE orders ( … active char(1) NOT NULL CHECK (active IN ('Y', 'N')) WHERE active = 'Y' ORDER BY order_date DESC ) WHERE rownum <= 10; Statistics 12345 consistent gets Statistics 10 consistent gets
Trade WHERE for ORDER BY CREATE INDEX t_idx ON t(a, b, c); SELECT * FROM (SELECT * FROM t WHERE a=12 ORDER BY c) ) WHERE rownum <= 10; Statistics 1200 consistent gets WHERE a=12 ORDER BY c Statistics 12 consistent gets WHERE a=12 ORDER BY b, c Statistics 12 consistent gets WHERE a=12 AND b=0 ORDER BY c
Tolerate filtering SELECT * FROM ( SELECT name, population FROM cities WHERE state != 'Florida' ORDER BY population DESC ) WHERE rownum <= 10; Statistics 28 consistent gets
Tolerate filtering -------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 11 | 00:00:01 | |* 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 55566 | 00:00:01 | | 4 | INDEX RANGE SCAN DESCENDING| I_POP | 12 | 00:00:01 | ------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=10) 3 - filter("STATE"<>'Florida')
Updates and Deletes SQL> @desc cities2 Name Null? Type ---------------------- -------- ---------------- NAME NOT NULL VARCHAR2(100) STATE NOT NULL VARCHAR2(100) POPULATION NOT NULL NUMBER BUDGET_SURPLUS NOT NULL VARCHAR2(1) CREATE INDEX i2_pop ON cities2(budget_surplus, population, name);
Updates and Deletes SELECT * FROM ( SELECT name, population FROM cities2 WHERE budget_surplus='Y' ORDER BY population DESC ) WHERE rownum <= 5; ------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 12 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I2_POP | 56067 | 00:00:01 | ------------------------------------------------------------------- Statistics 7 consistent gets
Updates and Deletes UPDATE cities2 SET budget_surplus='N' WHERE rowid IN ( SELECT * FROM ( SELECT rowid FROM cities2 ORDER BY population DESC ) WHERE rownum <= 200); ------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 12 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I2_POP | 56067 | 00:00:01 | ------------------------------------------------------------------- Statistics 207 consistent gets
Updates and Deletes ALTER TABLE cities2 ADD (version number default 0 NOT NULL); CREATE INDEX i2_vpop ON cities2(budget_surplus, version, population); UPDATE cities2 SET version=1WHERE budget_surplus='Y' AND version=0; Budget_surplus Y Y Budget_surplus Version 0 Y 1 Population
Updates and Deletes SELECT * FROM ( SELECT name, population FROM cities2 WHERE budget_surplus='Y' AND version=1 ORDER BY population DESC ) WHERE rownum <= 5; -------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 1 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I2_VPOP | 1 | 00:00:01 | -------------------------------------------------------------------- Statistics 9 consistent gets
Pagination SELECT * FROM ( SELECT * FROM ( SELECT name, population, rownum AS rn FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 20 ) WHERE rn > 10; SELECT * FROM ( SELECT name, population FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 10;
Dumb Pagination ) WHERE rownum <= 20 ) WHERE rn > 10; Statistics 22 consistent gets ) WHERE rownum <= 30 ) WHERE rn > 20; Statistics 32 consistent gets
Smart pagination SELECT * FROM ( SELECT name, population FROM cities WHERE state='Florida' AND population < 154750 ORDER BY population DESC ) WHERE rownum <= 10; SELECT * FROM ( SELECT * FROM ( SELECT name, population, rownum AS rn FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 20 ) WHERE rn > 10; Statistics 22 consistent gets Statistics 12 consistent gets
Top-N with joins SELECT * FROM ( SELECT c.name as city, c.population, s.capital FROM cities c, states s WHERE c.state_id = s.id AND c.state='Florida' ORDER BY c.population DESC ) WHERE rownum <= 5 / Driving table: Filter state Order By population Join state_id Select name Joined to table: Join id Select capital Use Nested Loops! Build indexes like this!
Top-N with joins: Good ------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:13 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:13 | | 3 | NESTED LOOPS | | 10 | 00:00:13 | |* 4 | INDEX RANGE SCAN| I_C | 506 | 00:00:07 | |* 5 | INDEX RANGE SCAN| I_S | 1 | 00:00:01 | -------------------------------------------------------
Top-N with joins: Bad ----------------------------------------------------------- | Id | Operation | Name | Rows | Time | ----------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:07 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:07 | |* 3 | SORT ORDER BY STOPKEY| | 10 | 00:00:07 | |* 4 | HASH JOIN | | 10 | 00:00:07 | |* 5 | INDEX RANGE SCAN | I_C | 506 | 00:00:07 | |* 6 | INDEX RANGE SCAN | I_S | 1 | 00:00:01 | -----------------------------------------------------------
Gotchas? TMI “Too many indexes”
Query conditions WHERE state = 'Florida' WHERE state != 'Florida' where state != ‘FL’ where state=‘FL’ AL AK AZ CO FL MA WA State Population
Watch out for DESC/ASC CREATE INDEX i_s_pop ON cities(state, population); WHERE state >= 'Florida' ORDER BY state, population DESC ) WHERE rownum <= 10 WHERE state >= 'Florida' ORDER BY state, population ) WHERE rownum <= 10 Statistics 107408 consistent gets Statistics 12consistent gets AL AK FL GA HI MA WA … … + SORT NO SORT