400 likes | 526 Vues
SQL Top-N and Pagination Pattern. Maxym Kharchenko. What is top-N. Give me the top 10 salaries in the “Sales” dept Give me the top 10 best selling books Give me the 10 latest orders. What is top-N. Setup. SQL> @ desc cities Name Null? Type
E N D
SQL Top-Nand Pagination Pattern Maxym Kharchenko
What is top-N • Give me the top 10 salaries in the “Sales” dept • Give me the top 10 best selling books • Give me the 10 latest orders
Setup SQL> @desc cities Name Null? Type -------------------------- -------- --------------- NAME NOT NULL VARCHAR2(100) STATE NOT NULL VARCHAR2(100) POPULATION NOT NULL NUMBER PCTFREE 99 PCTUSED 1 http://www.census.gov
Naïve Top-N Give me the top 5 cities by population SELECT name, population FROM cities WHERE rownum <= 5 ORDER BY population DESC; NAME Pop ---------------------- ------ Robertsdale city 5,276 Glen Allen town (pt.) 458 Boligee town 328 Riverview town 184 Altoona town (pt.) 30 Statistics 7 consistent gets
Naïve Top-N explained ----------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Time | ----------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 110 | 00:00:01 | | 1 | SORT ORDER BY | | 5 | 110 | 00:00:01 | |* 2 | COUNT STOPKEY | | | | | | 3 | TABLE ACCESS FULL| CITIES | 10 | 220 | 00:00:01 | -----------------------------------------------------------------
Correct top-N query SELECT * FROM ( SELECT name, population FROM cities ORDER BY population DESC ) WHERE rownum <= 5; SELECT name, population FROM cities ORDER BY population DESC FETCH FIRST 5 ROWS ONLY; <= 11g >= 12c
Correct top-N query: Execution SELECT * FROM ( SELECT name, population FROM cities ORDER BY population DESC ) WHERE rownum <= 5; NAME Pop -------------------- ---------- Los Angeles city 3,792,621 Chicago city (pt.) 2,695,598 Chicago city (pt.) 2,695,598 Chicago city 2,695,598 New York city (pt.) 2,504,700 Statistics 56024 consistent gets
Reading, filtering and sorting --------------------------------------------------------------------- | Id | Operation | Name | Rows |TempSpc| Time | --------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | | 00:01:58 | |* 1 | COUNT STOPKEY | | | | | | 2 | VIEW | | 56072 | | 00:01:58 | |* 3 | SORT ORDER BY STOPKEY| | 56072 | 1768K| 00:01:58 | | 4 | TABLE ACCESS FULL | CITIES | 56072 | | 00:01:54 | ---------------------------------------------------------------------
Proper data structure Ordered By: Population CREATE INDEX i_pop ON cities(population); -------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 56072 | 00:00:01 | | 4 | INDEX RANGE SCAN DESCENDING| I_POP | 10 | 00:00:01 | -------------------------------------------------------------------- Statistics 12 consistent gets
Why index works Ordered By: Population • Colocation • Can stop after reading N rows • No Sort CREATE INDEX i_pop ON cities(population);
More elaborate top-N Give me the top 5 cities by population in Florida SELECT * FROM ( SELECT name, population FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 5; NAME Pop -------------------- ---------- Jacksonville city 821,784 Miami city 399,457 Tampa city 335,709 St. Petersburg city 244,769 Orlando city 238,300 Statistics 264 consistent gets
Uncertain nature of filtering Ordered By: Population WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 5; WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 200; Statistics 264consistent gets Statistics 19747consistent gets
Multi column indexes CREATE INDEX i_state_pop ON cities(state, population); where state=‘FL’ AL AK AZ CO FL MA WA State Population • Ordered By: • State • State+Population • Not Ordered by: Population WHERE state=‘FL’ NOW: Ordered By: Population
Multicolumn indexes ------------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 11 | 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 1099 | 00:00:01 | |* 4 | INDEX RANGE SCAN DESCENDING| I_STATE_POP | 11 | 00:00:01 | ------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter(ROWNUM<=5) 4 - access("STATE"='Florida') Statistics 12 consistent gets
Trips to the table ------------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 11 | 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 1099 | 00:00:01 | |* 4 | INDEX RANGE SCAN DESCENDING| I_STATE_POP | 11 | 00:00:01 | ------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter(ROWNUM<=5) 4 - access("STATE"='Florida') Statistics 12 consistent gets
Index range scan: cost math Window: 500 records 4-5 logical reads ~ 5-10 logical reads ~ 10-500 logical reads
Covering index CREATE INDEX i_state_pop ON cities(state, population); CREATE INDEX i_state_pop_c ON cities(state, population, name); Statistics 12 consistent gets Statistics 7 consistent gets -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I_STATE_POP_C | 506 | 00:00:01 | --------------------------------------------------------------------------
Ideal top-N • Use the index • Make the best index • And read only from the index
Less than ideal top-N • Effect of query conditions • Effect of deletes and updates • Technicalities
Condition better! WHERE active != 'N' ORDER BY order_date DESC ) WHERE rownum <= 10; CREATE TABLE orders ( … active char(1) NOT NULL CHECK (active IN ('Y', 'N')) WHERE active = 'Y' ORDER BY order_date DESC ) WHERE rownum <= 10; Statistics 12345 consistent gets Statistics 10 consistent gets
Trade WHERE for ORDER BY CREATE INDEX t_idx ON t(a, b, c); SELECT * FROM (SELECT * FROM t WHERE a=12 ORDER BY c) ) WHERE rownum <= 10; Statistics 1200 consistent gets WHERE a=12 ORDER BY c Statistics 12 consistent gets WHERE a=12 ORDER BY b, c Statistics 12 consistent gets WHERE a=12 AND b=0 ORDER BY c
Tolerate filtering SELECT * FROM ( SELECT name, population FROM cities WHERE state != 'Florida' ORDER BY population DESC ) WHERE rownum <= 10; Statistics 28 consistent gets
Tolerate filtering -------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 11 | 00:00:01 | |* 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 55566 | 00:00:01 | | 4 | INDEX RANGE SCAN DESCENDING| I_POP | 12 | 00:00:01 | ------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=10) 3 - filter("STATE"<>'Florida')
Updates and Deletes SQL> @desc cities2 Name Null? Type ---------------------- -------- ---------------- NAME NOT NULL VARCHAR2(100) STATE NOT NULL VARCHAR2(100) POPULATION NOT NULL NUMBER BUDGET_SURPLUS NOT NULL VARCHAR2(1) CREATE INDEX i2_pop ON cities2(budget_surplus, population, name);
Updates and Deletes SELECT * FROM ( SELECT name, population FROM cities2 WHERE budget_surplus='Y' ORDER BY population DESC ) WHERE rownum <= 5; ------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 12 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I2_POP | 56067 | 00:00:01 | ------------------------------------------------------------------- Statistics 7 consistent gets
Updates and Deletes UPDATE cities2 SET budget_surplus='N' WHERE rowid IN ( SELECT * FROM ( SELECT rowid FROM cities2 ORDER BY population DESC ) WHERE rownum <= 200); ------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 12 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I2_POP | 56067 | 00:00:01 | ------------------------------------------------------------------- Statistics 207 consistent gets
Updates and Deletes ALTER TABLE cities2 ADD (version number default 0 NOT NULL); CREATE INDEX i2_vpop ON cities2(budget_surplus, version, population); UPDATE cities2 SET version=1WHERE budget_surplus='Y' AND version=0; Budget_surplus Y Y Budget_surplus Version 0 Y 1 Population
Updates and Deletes SELECT * FROM ( SELECT name, population FROM cities2 WHERE budget_surplus='Y' AND version=1 ORDER BY population DESC ) WHERE rownum <= 5; -------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 1 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I2_VPOP | 1 | 00:00:01 | -------------------------------------------------------------------- Statistics 9 consistent gets
Pagination SELECT * FROM ( SELECT * FROM ( SELECT name, population, rownum AS rn FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 20 ) WHERE rn > 10; SELECT * FROM ( SELECT name, population FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 10;
Dumb Pagination ) WHERE rownum <= 20 ) WHERE rn > 10; Statistics 22 consistent gets ) WHERE rownum <= 30 ) WHERE rn > 20; Statistics 32 consistent gets
Smart pagination SELECT * FROM ( SELECT name, population FROM cities WHERE state='Florida' AND population < 154750 ORDER BY population DESC ) WHERE rownum <= 10; SELECT * FROM ( SELECT * FROM ( SELECT name, population, rownum AS rn FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 20 ) WHERE rn > 10; Statistics 22 consistent gets Statistics 12 consistent gets
Top-N with joins SELECT * FROM ( SELECT c.name as city, c.population, s.capital FROM cities c, states s WHERE c.state_id = s.id AND c.state='Florida' ORDER BY c.population DESC ) WHERE rownum <= 5 / Driving table: Filter state Order By population Join state_id Select name Joined to table: Join id Select capital Use Nested Loops! Build indexes like this!
Top-N with joins: Good ------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:13 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:13 | | 3 | NESTED LOOPS | | 10 | 00:00:13 | |* 4 | INDEX RANGE SCAN| I_C | 506 | 00:00:07 | |* 5 | INDEX RANGE SCAN| I_S | 1 | 00:00:01 | -------------------------------------------------------
Top-N with joins: Bad ----------------------------------------------------------- | Id | Operation | Name | Rows | Time | ----------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:07 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:07 | |* 3 | SORT ORDER BY STOPKEY| | 10 | 00:00:07 | |* 4 | HASH JOIN | | 10 | 00:00:07 | |* 5 | INDEX RANGE SCAN | I_C | 506 | 00:00:07 | |* 6 | INDEX RANGE SCAN | I_S | 1 | 00:00:01 | -----------------------------------------------------------
Gotchas? TMI “Too many indexes”
Query conditions WHERE state = 'Florida' WHERE state != 'Florida' where state != ‘FL’ where state=‘FL’ AL AK AZ CO FL MA WA State Population
Watch out for DESC/ASC CREATE INDEX i_s_pop ON cities(state, population); WHERE state >= 'Florida' ORDER BY state, population DESC ) WHERE rownum <= 10 WHERE state >= 'Florida' ORDER BY state, population ) WHERE rownum <= 10 Statistics 107408 consistent gets Statistics 12consistent gets AL AK FL GA HI MA WA … … + SORT NO SORT