520 likes | 539 Vues
Learn how SQL queries are processed in DBMSs through steps like scanning, parsing, validation, conversion to query trees, optimization, and code generation. Discover an example query's journey through these steps and see the benefits of query optimization in reducing intermediate file sizes.
E N D
9. Query Processing SQL Queries in a high level language such as SQL are processed by Horizontal DBMSs in the following steps: 1. SCANand PARSE (SCANNER-PARSER): The Scanner identifies the tokens or language elements. The Parser check for syntax or grammar validity. 2. VALIDATED: The Validator checks for valid names and semantic correctness. 3. CONVERTER converts to an internal representation (usually a QUERY TREE) | 4. QUERY OPTIMIZED: Query Optimzier devises a stategy for executing query (chooses among alternative Query trees). 5. CODE GENERATION: generates code to implement each operator in the selected query plan (the optimizer-selected the query tree). 6. RUNTIME DATABASE PROCESSORING: run plan code Chapter 19 of Fundamentals of Database Systems by Elmasri and Navathe provide an alternative coverage of this material.
_S______________ _C___________ _E______ |S#|SNAME |LCODE | |C#|CNAME|SITE| |S#|C#|GR| |25|CLAY |NJ5101| |8 |DSDE |ND | |32|8 |89| |32|THAISZ|NJ5102| |7 |CUS |ND | |32|7 |91| |38|GOOD |FL6321| |6 |3UA |NJ | |25|7 |68| |17|BAID |NY2091| |5 |3UA |ND .| |25|6 |76| |57|BROWN |NY2092| |32|6 |62| The CONVERTER converts to an internal representation (usually a QUERY TREE). E.g., given the database: The SQL request: SELECT S.SNAME, C.CNAME, E.GR FROM S,C,E WHERE E.GR=68 and C.SITE="ND" and S.LCODE=NJ5101 and C.C#=E.C# and S.S#=E.S#; gets SCANNED, PARSED, VALIDATED, then may get CONVERTED to query tree following the sequencing of the WHERE-clause.
_S______________ _C___________ _E______ |S#|SNAME |LCODE | |C#|CNAME|SITE| |S#|C#|GR| |25|CLAY |NJ5101| |8 |DSDE |ND | |32|8 |89| |32|THAISZ|NJ5102| |7 |CUS |ND | |32|7 |91| |38|GOOD |FL6321| |6 |3UA |NJ | |25|7 |68| |17|BAID |NY2091| |5 |3UA |ND .| |25|6 |76| |57|BROWN |NY2092| |32|6 |62| SELECT S.SNAME, C.CNAME, E.GR FROM S,C,E WHERE E.GR=68 and C.SITE="ND" and S.LCODE=NJ5101 and C.C#=E.C# and S.S#=E.S#; M=PROJ(L)[SNAME,CNAME,GR] | L=SELECT(K.GR=68) | K=SELECT(H.SITE="ND") | H=SELECT(G.LCODE="NJ5101") | G=JOIN(F.C#=C.C#) /\ / \ JOIN(S.S#=E.S#)=F C /\ / \ S E This is simplest CONVERTER (uses the ordering in WHERE clause) CONVERTER
M=PROJ(L)[SNAME,CNAME,GR] | | L=SELECT(K.GR=68) | | K=SELECT(H.SITE="ND") | | H=SELECT(G.LCODE="NJ5101") | | G=JOIN(F.C#=C.C#) /\ / \ JOIN(S.S#=E.S#)=F /\ / \ S E SNAME |CNAME|GR CLAY |CUS |68 S#|SNAME |LCODE |C#|GR|CNAME|SITE 25|CLAY |NJ5101|7 |68|CUS |ND S#|SNAME |LCODE |C#|GR|CNAME|SITE 25|CLAY |NJ5101|7 |68|CUS |ND CONVERTER S#|SNAME |LCODE |C#|GR|CNAME|SITE 25|CLAY |NJ5101|7 |68|CUS |ND 25|CLAY |NJ5101|6 |76|3AU |NJ S#|SNAME |LCODE |C#|GR|CNAME|SITE 25|CLAY |NJ5101|7 |68|CUS |ND 25|CLAY |NJ5101|6 |76|3AU |NJ 32|THAISZ|NJ5102|8 |89|DSDE |ND 32|THAISZ|NJ5102|7 |91|CUS |ND 32|THAISZ|NJ5102|6 |62|3UA |NJ C S#|SNAME |LCODE |C#|GR 25|CLAY |NJ5101|7 |68 25|CLAY |NJ5101|6 |76 32|THAISZ|NJ5102|8 |89 32|THAISZ|NJ5102|7 |91 32|THAISZ|NJ5102|6 |62 C#|CNAME|SITE 8 |DSDE | ND 7 |CUS | ND 6 |3UA | NJ 5 |3UA | ND S#|SNAME |LCODE 25|CLAY |NJ5101 32|THAISZ|NJ5102 38|GOOD |FL6321 17|BAID |NY2091 57|BROWN |NY2092 S#|C#|GR 32|8 |89 32|7 |91 25|7 |68 25|6 |76 32|6 |62 Let's see the results at each step.
M=PROJ(L)[SNAME,CNAME,GR] | | G=JOIN(F.C#=K.C#) /\ / \ / \ JOIN(H.S#=L.S#)=F \ SNAME |CNAME|GR CLAY |CUS |68 Is the query tree optimal? Is this tree better? The OPTIMIZERdevises a stategy for executing the query (chooses among alternative Query trees). S#|SNAME |LCODE |C#|GR|CNAME|SITE 25|CLAY |NJ5101|7 |68|CUS |ND YES! This tree is better since the intermediate files created are much smaller!! S#|SNAME |LCODE |C#|GR 25|CLAY |NJ5101|7 |68 /\ \ / \ \ / \ \ SEL(S.LCODE=NJ5101)=H L=SEL(E.GR=68) K=SEL(C.SITE=ND) S#|SNAME |LCODE 25|CLAY |NJ5101 S#|C#|GR 25|7 |68 C#|CNAME|SITE 8 |DSDE | ND 7 |CUS | ND 5 |3UA | ND S E C S#|SNAME |LCODE 25|CLAY |NJ5101 32|THAISZ|NJ5102 38|GOOD |FL6321 17|BAID |NY2091 57|BROWN |NY2092 S#|C#|GR 32|8 |89 32|7 |91 25|7 |68 25|6 |76 32|6 |62 C#|CNAME|SITE 8 |DSDE | ND 7 |CUS | ND 6 |3UA | NJ 5 |3UA | ND
M=PROJ(L)[SNAME,CNAME,GR] | | G=JOIN(F.C#=K.C#) /\ / \ / \ JOIN(H.S#=L.S#)=F \ SNAME |CNAME|GR CLAY |CUS |68 Note that the following could be done: • SITE attribute can be projected from K (doesn't require elimination of duplicates because it is not part of the key). • The LCODE attrib can be projected off of H (doesn't require elimination of duplicates because it is not part of the key). • S# could be projected off of F (it is part of the key but duplicate elimination could be deferred until M since it will have to be done again there anyway - thus this projection can be a "non duplicate-eliminating" projection also (which we will denote by [[ ]]). [[ ]]-projections take no time, whereas duplicate eliminating projections take a lot of time). • C# can be (non-duplicate-eliminating) projected off G (just reordering attrs and eliminating duplicates, if any). SNAME |GR|CNAME CLAY |68|CUS Even better! The intermediate files created are even smaller!! /\ \ / \ \ / \ \ H=SEL(S.LCODE=NJ5101)[[S#,SNAME]] L=SEL(E.GR=68) K=SEL(C.SITE=ND)[[C#,CNAME]] S#|SNAME |C#|GR 25|CLAY |7 |68 S#|SNAME 25|CLAY S#|C#|GR 25|7 |68 C#|CNAME 8 |DSDE 7 |CUS 5 |3UA S E S#|C#|GR 32|8 |89 32|7 |91 25|7 |68 25|6 |76 32|6 |62 C S#|SNAME |LCODE 25|CLAY |NJ5101 32|THAISZ|NJ5102 38|GOOD |FL6321 17|BAID |NY2091 57|BROWN |NY2092 C#|CNAME|SITE 8 |DSDE | ND 7 |CUS | ND 6 |3UA | NJ 5 |3UA | ND
What have we learned about QP? GOOD RULES? a. Do SELECTS first (push to the bottom of the tree). b. Do attribute elimination part of PROJECT as soon as possible (push down). c. Only do duplicate elimination once (at top-most PROJECT only or in conjunction with a latter join step). QUERY OPTIMIZATION, then, is finding an efficient strategy to implement query requests (Automatically, Heuristically, not necessarily optimally) Note: In lower level languages, the user does the query optimization by writing the procedural code to specify all steps and order those steps. (of course there are optimizing compilers that will automatically alter your "procedures", but still you are mostly responsible for ordering). Relational queries are issued at a high level (SQL or ODBC), so that system has maximal oportunity to optimize them. HEURISTIC RULES are used to re-order query tree. (e.g., RULES a. b. c. above) . Some rules depend upon size and complexity estimates. ESTIMATION estimates the cost of different strategies and chooses the best. Challenge: Get acceptable performance (took 10 years to optimize join process acceptably so that the first viable Relational DBMSs could be successfully sold!).
CODE GENERATION implements the operators above (e.g., SELECT, PROJECT, JOIN...) Some SELECT implementations: (Each of S2 - S6 requires a special access path.) S1. Linear search: sequentially search every record. S2. Binary search: (for selections on a clustered or ordered attribute) S3. Using indexes (or hash structures) for an equality comparison S4. Using primary index for an inequality comparison on a key (clustered). S5. Using aclustering index for "=" comparison S6. Using asecondary B+-tree index for "=", use the index set. SELECTION methods with a WHERE conjunction (AND): S7. Of the many conjunctive attributes, select 1 attribute (usually involving an "=") S8. Intersection of Rrecord Pointers: Intersect RRN-sets then retrieve records S9. If there areBitmapped Indexes, AND bitmaps CASE-1: SELECT is on an attribute with few distinct values. CASE-2: SELECT is on an attribute with uniqueness (key) or near uniqueness. S10. If there is acomposite index on the attributes involved in condition, use it. S11. If there is a composite hash function, use it. SELECTION methods when there is a WHERE disjuntion (OR): S12. If there isno access path (indexes or hash functions), use S1 (brute force). S13. If there areaccess paths, use them and UNION the results. S14. If there areBitMaps, take the OR of the bitmaps.
ENROLL ENROLL 32|89 32|91 32|91 S#|C#|GR 32|8 |89 32|7 |91 25|7 |68 25|6 |76 32|6 |62 38|6 |98 17|5 |96 RRN|S#|C#|GR 0 |17|5 |96 1 |25|7 |68 2 |25|6 |76 3 |32|8 |89 4 |32|7 |91 5 |34|6 |62 6 |38|6 |98 32|91 Required for selections from an unordered relation with no index or access path. SELECT C#, GR FROM ENROLL WHERE S# = 32; S1. Linear search: sequentially search every record. S2. Binary search: For selections on a clustered (ordered) attribute (in this case, S#): SELECT C#, GR FROM ENROLL WHERE S# = 38; Go half way (to RRN=3), since S# < 38, go half way down what's left (to RRN= 5). Since S# < 38, go half way down what's left (to RRN= 6). Match! Output. Scan aheadand output until no match or EoF.
STUDENT STUDENT Index on S# Index always clustered on the key (here S#) for binary key search. RRN|S#|SNAME | LCODE 0 |25|CLAY |NJ5101 1 |32|THAISZ|NJ5102 2 |38|GOOD |FL6321 3 |17|BAID |NY2091 4 |57|BROWN |NY2092 RID|S#|SNAME | LCODE 1,0|17|BAID |NY2091 1,1|25|CLAY |NJ5101 2,0|32|THAISZ|NJ5102 2,1|38|GOOD |FL6321 3,0|57|BROWN |NY2092 RRN| S# 3 | 17 57| BROWN 32| THAISZ 38| GOOD 32| THAISZ 0 | 25 1 | 32 2 | 38 4 | 57 SELECT C#, NAME FROM STUDENT WHERE S# = 32 S3. Using Indexes: (or hash structures) for an equality comparison. S4. Using primary index for an inequality comparison on a key (clustered). (Find starting point with "=", then retrieve all records beyond that point). SELECT S#,NAME FROM STUDENT WHERE S# 32 nondense Primary Index on S# RID| S# Find starting point (first S# 32) then scan ahead taking all until End 1,0| 17 2,0| 32 3,0| 57
Clustering Index on S# ENROLL=E Find first S#=32, then scan E ahead for others. RRN| S# 0 | 17 RRN|S#|C#|GR 0 |17|5 |96 1 |25|7 |68 2 |25|6 |76 3 |32|8 |89 4 |32|7 |91 5 |32|6 |62 6 |38|6 |98 32| 89 32| 91 CLAY|OUTBK 32| 62 1 | 25 3 | 32 6 | 38 S6. Using asecondary B+-tree index: For "=", use the index set (assuming a B+tree index) SELECT NAME,CITY FROM STUDENT WHERE S# = 25 *32*38* *20* n n32* n *56* n |17 20|25 32|35 38| 56|57 | 2 5|4 1|7 3| 6|0 STUDENT RRN|S#|SNAME |CITY |ST 0 |57|BROWN |NY |NY 1 |32|THAISZ|KNOB |NJ 2 |17|BAID |NY |NY 3 |38|GOOD |GATER|FL 4 |25|CLAY |OUTBK|NJ 5 |20|JOB |MRHD |MN 6 |56|BURGUM|FARGO|ND 7 |35|BOYD |FLAX |NE SELECT C#, GR FROM ENROLL WHERE S# = 32 S5. Using a Clustered Index: for = comparison.
GOOD |GATER BURGUM|FARGO BROWN |NY S6. Using asecondary B+-tree index: For use the index set, then use sequence set (of B+) SELECT NAME,CITY FROM STUDENT WHERE S# 38 *32*38* *20* n n32* n *56* n |17 20|25 32|35 38| 56|57 | 2 5|4 1|7 3| 6|0 STUDENT RRN|S#|SNAME |CITY |ST 0 |57|BROWN |NY |NY 1 |32|THAISZ|KNOB |NJ 2 |17|BAID |NY |NY 3 |38|GOOD |GATER|FL 4 |25|CLAY |OUTBK|NJ 5 |20|JOB |MRHD |MN 6 |56|BURGUM|FARGO|ND 7 |35|BOYD |FLAX |NE
Secondary Index on ST RRN| ST 3 | FL BOYD |FLAX 5 | MN 7 | NE 6 | ND 1,4| NJ 0,2| NY *32*38* *20* n n32* n *56* n |17 20|25 32|35 38| 56|57 | 2 5|4 1|7 3| 6|0 STUDENT RRN|S#|SNAME |CITY |ST 0 |57|BROWN |NY |NY 1 |32|THAISZ|KNOB |NJ 2 |17|BAID |NY |NY 3 |38|GOOD |GATER|FL 4 |25|CLAY |OUTBK|NJ 5 |20|JOB |MRHD |MN 6 |56|BURGUM|FARGO|ND 7 |35|BOYD |FLAX |NE SELECT NAME, CITY FROM STUDENT WHERE S#>25 and ST=NE S7. Of the many conjunctive attributes,select on1 attribute (usually 1 involving an "=") then check the other condition(s) for each retrieved record.
Secondary Index on ST RRN| ST 3 | FL BROWN |NY GOOD |GATER BURGUM|FARGO 5 | MN 7 | NE 6 | ND 1,4| NJ 0,2| NY *32*38* *20* n n32* n *56* n |17 20|25 32|35 38| 56|57 | 2 5|4 1|7 3| 6|0 STUDENT RRN|S#|SNAME |CITY |ST 0 |57|BROWN |NY |NY 1 |32|THAISZ|KNOB |NJ 2 |17|BAID |NY |NY 3 |38|GOOD |GATER|FL 4 |25|CLAY |OUTBK|NJ 5 |20|JOB |MRHD |MN 6 |56|BURGUM|FARGO|ND 7 |35|BOYD |FLAX |NE SELECT NAME, CITY FROM STUDENT WHERE S#>38 and STNE true true true S7. Of the many conjunctive attributes,select on1 attribute (neither involve =! taking S#) then check the other condition(s) for each retrieved record.
S#-RRN-list ST-RRN-list intersection 1,7,3,6,0 0,2,7 0,7 S9. If Bitmap Indexes BMI on ST ST|bit-filter FL| 00010000 MN| 00001000 NE| 00000001 ND| 00000010 NJ| 01001000 NY| 10100000 S#|bit-filter 17| 00100000 20| 00000100 25| 00001000 32| 01000000 OR here to end (S#>25) result: 11010011 35| 00000001 OR NE, NY bitfilters: 10100001 38| 00010000 AND two for result: 10000001 56| 00000010 57| 10000000 STUDENT RRN|S#|SNAME |CITY |ST 0 |57|BROWN |NY |NY 1 |32|THAISZ|KNOB |NJ 2 |17|BAID |NY |NY 3 |38|GOOD |GATER|FL 4 |25|CLAY |OUTBK|NJ 5 |20|JOB |MRHD |MN 6 |56|BURGUM|FARGO|ND 7 |35|BOYD |FLAX |NE S8. INTERSECTION OF RECORD POINTERS: Intersect RRN-sets then retrieve records. SELECT NAME,CITY FROM STUDENT WHERE S#>25 and (ST=NE or ST=NY); (This can be done in conjunction with any of the above methods. If the RRN-sets are stored ahead of time for particular selection criterial, then they can greatly speed up the execution. The question is, which should be generated and stored?).
S8. INTERSECTION OF RECORD POINTERS: ANDing bitmaps, then retrieve records. SELECT NAME,CITY FROM STUDENT WHERE S#>25 and (ST=NE or ST=NY); BitMapped Indexes (BMIs) are used only for "low cardinality" attributes in DataWarehouses. (those with a small domain - ie, only a few possible values. The reason is that for low-cordinality domains (eg, MONTH, STATE, GENDER, etc.), BMI has few entries (rows) and each bitmap is quite dense (many 1-bits To see why this is so, consider two extremes. CASE-1: For a GENDER attribute in a relation with 80,000 tuples. The BMI looks like: GENDER| bit-filter Female| 0111001010100...1 Male | 1000110101011...0 Eaach bitfilter is 80,000 bits or 10KB so the index is ~20KB with only two distinct values (Note the Male entry is unnecessary since it can be calculated from Female bitfilter as the bit-compliment. Thus, the index is only ~10KB in size altogether. If a regular index were used: GENDER|RID-list Female|RID-F1, RID-F2, ..., RID-Fn Male |RID-M1, RID-M2, ..., RID-Mn Each RID takes 8 bytes (maybe more?) The size is ~640KB. Thus BMI size could be as low as ~10KB and the regular index size ~640KB.
S8. INTERSECTION OF RECORD POINTERS: ANDing bitmaps, then retrieve records. SELECT NAME,CITY FROM STUDENT WHERE S#>25 and (ST=NE or ST=NY); BitMapped Indexes (BMIs) CASE-2: SSN attr of employee file for large company (say, with 80,000 employees) BMI: SSN |bit-filter 324-66-9870 |1000000000000...0 ... 687-99-2536 |0000000000000...1 Extant Domain (only those SSN's of existing employees) Each bitfilter 80Kb (10KB) so the index is 80,000 * ~10KB or ~800MB in size. If a regular index were used: SSN |RID 324-66-9870 |RID1 ... 687-99-2536 |RID80000 If RIDs take 8 bytes and SSN+separators take another 12 bytes,the size is ~20*80,000 bits = ~200KB Thus the BMI size could be as low as ~800,000KB and the regular index size would be ~200KB
S10. If there is acomposite index on the attrs involved in condition, use it. If there is a composite hash function, use it Selection implementation is matter of choosing among these alternatives (possibly others?). SELECTION methods when there is a WHERE disjuntion (OR) in the condition If there is no access path (indexes or hash fctns), use S1 (brute force). If there are access paths , use them and UNION the results, or UNION the RID-sets, then get the records (rather than interesection as in the case of AND condition). If there are BitMaps, take the OR of BitMaps, then get records
S E S#|SNAME |LCODE 25|CLAY |NJ5101 32|THAISZ|NJ5102 38|GOOD |FL6321 17|BAID |NY2091 57|BROWN |NY2092 SNAME |C# |GRADE S#|C#|GR 32|8 |89 32|7 |91 25|7 |68 25|6 |76 32|6 |62 Result S.S#=E.S# FALSE S.S#=E.S# FALSE S.S#=E.S# TRUE S.S#=E.S# TRUE S.S#=E.S# FALSE CODE GENERATION implements the operator, JOIN (equi-join, we will use for it.) J1. NESTED LOOP JOIN S on S.S#=E.S# For each record in S, (the outer loop over records from the outer or driver relation), retrieve every record from E, (the inner loops over the inner relation), test join condition, if it's true, concatenate the tuples (project off unwanted columns) and output, else go to next E (inner-relation) record. SELECT SNAME,C#,GRADE FROM S=STUDENT,E=ENROLL WHERE S.S#=E.S#; CLAY | 7 | 68 CLAY | 6 | 76
S E res S#|SNAME |LCODE 25|CLAY |NJ5101 32|THAISZ|NJ5102 38|GOOD |FL6321 17|BAID |NY2091 57|BROWN |NY2092 SNAME |C# |GRADE S#|C#|GR 32|8 |89 32|7 |91 25|7 |68 25|6 |76 32|6 |62 S.S#=E.S# TRUE S.S#=E.S TRUE S.S#=E.S# FALSE S.S#=E.S TRUE S.S#=E.S# FALSE CODE GENERATION implements the operator, JOIN (equi-join, we will use for it.) J1. NESTED LOOP Second inner loop pass: SELECT SNAME,C#,GRADE FROM S=STUDENT,E=ENROLL WHERE S.S#=E.S#; CLAY | 7 | 68 CLAY | 6 | 76 THAISZ| 8 | 89 THAISZ| 7 | 91 THAISZ| 6 | 62
S E res S#|SNAME |LCODE 25|CLAY |NJ5101 32|THAISZ|NJ5102 38|GOOD |FL6321 17|BAID |NY2091 57|BROWN |NY2092 SNAME |C# |GRADE S#|C#|GR 32|8 |89 32|7 |91 25|7 |68 25|6 |76 32|6 |62 S.S#=E.S# FALSE S.S#=E.S FALSE S.S#=E.S FALSE S.S#=E.S FALSE S.S#=E.S# FALSE CODE GENERATION implements the operator, JOIN (equi-join, we will use for it.) J1. NESTED LOOP Third inner loop pass: SELECT SNAME,C#,GRADE FROM S=STUDENT,E=ENROLL WHERE S.S#=E.S#; CLAY | 7 | 68 CLAY | 6 | 76 THAISZ| 8 | 89 THAISZ| 7 | 91 THAISZ| 6 | 62
S E res S#|SNAME |LCODE 25|CLAY |NJ5101 32|THAISZ|NJ5102 38|GOOD |FL6321 17|BAID |NY2091 57|BROWN |NY2092 SNAME |C# |GRADE S#|C#|GR 32|8 |89 32|7 |91 25|7 |68 25|6 |76 32|6 |62 S.S#=E.S FALSE S.S#=E.S# FALSE S.S#=E.S FALSE S.S#=E.S FALSE S.S#=E.S FALSE CODE GENERATION implements the operator, JOIN (equi-join, we will use for it.) J1. NESTED LOOP 4th inner loop pass: SELECT SNAME,C#,GRADE FROM S=STUDENT,E=ENROLL WHERE S.S#=E.S#; CLAY | 7 | 68 CLAY | 6 | 76 THAISZ| 8 | 89 THAISZ| 7 | 91 THAISZ| 6 | 62
S E res S#|SNAME |LCODE 25|CLAY |NJ5101 32|THAISZ|NJ5102 38|GOOD |FL6321 17|BAID |NY2091 57|BROWN |NY2092 SNAME |C# |GRADE S#|C#|GR 32|8 |89 32|7 |91 25|7 |68 25|6 |76 32|6 |62 S.S#=E.S FALSE S.S#=E.S# FALSE S.S#=E.S FALSE S.S#=E.S FALSE S.S#=E.S FALSE CODE GENERATION implements the operator, JOIN (equi-join, we will use for it.) J1. NESTED LOOP 5th and last inner loop pass: SELECT SNAME,C#,GRADE FROM S=STUDENT,E=ENROLL WHERE S.S#=E.S#; CLAY | 7 | 68 CLAY | 6 | 76 THAISZ| 8 | 89 THAISZ| 7 | 91 THAISZ| 6 | 62
R SNAME |C# |GRADE RRN |S# 2,3 |25 0,1,4|32 Dense Index on E.S# CODE GENERATION implements the operator, JOIN (equi-join, we will use for it.) J2. When there is an Index on one join attribute the join can be done in one pass (called Indexed Nested Loop. If there is an index on E.S#, get r in S, get matching E-tuples using the index (need not scan entire inner relation, E, each time as was necessary with J1) . R=SELECT SNAME,C#,GRADE FROM S=STUDENT,E=ENROLL WHERE S.S#=E.S#; S E S#|SNAME |LCODE 25|CLAY |NJ5101 32|THAISZ|NJ5102 38|GOOD |FL6321 17|BAID |NY2091 57|BROWN |NY2092 RRN|S#|C#|GR 0 |32|8 |89 1 |32|7 |91 2 |25|7 |68 3 |25|6 |76 4 |32|6 |62 CLAY | 7 | 68 CLAY | 6 | 76 THAISZ| 8 | 89 THAISZ| 7 | 91 THAISZ| 6 | 62
R SNAME |C# |GRADE S.S#=E.S# TRUE S.S#=E.S# FALSE S.S#=E.S# TRUE S.S#=E.S# TRUE S.S#=E.S# TRUE S.S#=E.S# FALSE S.S#=E.S# TRUE S.S#=E.S# FALSE S.S#=E.S# FALSE CODE GENERATION implements the operator, JOIN (equi-join, we will use for it.) J3. MERGE JOIN: If both S.S# and E.S# are clustered, then scan both S and E once in order, keeping in mind that S.S# is the primary key (uniqueness property), but E.S# is not. R=SELECT SNAME,C#,GRADE FROM S=STUDENT,E=ENROLL WHERE S.S#=E.S#; S E S#|SNAME |LCODE 17|BAID |NY2091 25|CLAY |NJ5101 32|THAISZ|NJ5102 38|GOOD |FL6321 57|BROWN |NY2092 S#|C#|GR 25|7 |68 25|6 |76 32|6 |62 32|8 |89 32|7 |91 CLAY | 7 | 68 CLAY | 6 | 76 THAISZ| 6 | 62 THAISZ| 8 | 89 THAISZ| 7 | 91 J3'. SORT-MERGE JOIN: If R.A and S.B are not ordered, sort them first (into R' clustered on A and S' clustered on B), then apply MERGE (J2 above).
CODE GENERATION implements the operator, JOIN J4. HASH-JOIN: RIDs hashed to buckets (pages). Corresponding buckets retrieved and scanned GRACE JOIN: (first example of a hash-join technique): Allocate M pages of memory to the join process. Partition the M page frames as follows: One page frame for new pages as they are read from disk (called that page frame, IN), then the remaining B+1 become hash buckets R0,..,RB ( Note: B = M-2 ). Use hash function, h(S#)=MODB(S#) Partial Sort (Build) Phase: Partially-Sort-S: Read each S-page to IN, hash each record using h(S#) to R0,...,RB. If the bucket, say Ri, is full (collision), flush it to temporary disk file named Si. Partially-Sort-E: Read each E-page into IN, hash each record with h(S#). Upon collision in a bucket, say Rj, flush its' contents to a temporary disk file, named Ej. Probe Phase: With each pair of temporary files, S0 & E0, S1 & E2, S2 & E2, ... in turn, do as follows: Re-partition memory with two page frames called IN and OUT. With the rest, form one large hash area. For Si, construct an internal a hash table in the hash area using another hash function, k(S#) FOR Ei, PROBE the hash table using k(S#) for matches, output the join of all matches to OUT.
RID|S#|SNAME | LCODE R0 R1 R2 IN CODE GENERATION implements the operator, JOIN J4. HASH-JOIN: RIDs hashed to buckets (pages). Matching buckets are retrieved and scanned. R=SELECT SNAME,C#,GRADE FROM S=STUDENT,E=ENROLL WHERE S.S#=E.S#; GRACE JOIN:(first example of a hash-join technique): Allocate M=4 pages of memory to the join process. Partition the M pages as follows: - One page for IN (putting new pages as they are read from disk), M-1=3 pages for buckets R0,..,R2. Build Phase. Use hash function, h(S#)=MOD3(S#) S 1,0|17|BAID |NY2091 S0 57|BROWN |NY2092 1,1|25|CLAY |NJ5101 2,0|32|THAISZ|NJ5102 2,1|57|BROWN |NY2092 S1 3,0|38|GOOD |FL6321 25|CLAY |NJ5101 2,0|32|THAISZ|NJ5102 3,0|38|GOOD |FL6321 1,0|17|BAID |NY2091 S2 2,1|57|BROWN |NY2092 1,1|25|CLAY |NJ5101 17|BAID |NY2091 E Collision! Dump R2 Then flush all. 32|THAISZ|NJ5102 38|GOOD |FL6321 RID|S#|C#|GR| LCODE Do the same with E. 1,0|17|5 |96|NJ5101 E0 Partial-Sort-S: Read each S-page to IN, hash each record using h(S#) to R0,R1,R2. Upon collision in any of the buckets, flush to temporary disk file, called S0,S1,S2. 1,1|25|7 |68|ND4456 2,0|25|6 |76|NY2091 E1 25|7 |68|ND4456 2,1|32|8 |89|NY2091 3,0|32|7 |91|FL6320 25|6 |76|NY2091 3,1|34|6 |62|ND4456 34|6 |62|ND4456 E2 17|5 |96|NJ5101 32|8 |89|NY2091 32|7 |91|FL6320
OUT 0 Hash 1 2 3 IN CODE GENERATION implements the operator, GRACE JOIN Hash Phase: With each pair of temporary files, S0 & E0, S1 & E2, S2 & E2,... in turn, do as follows: Re-partition memory into IN, OUT and one large hash area. Probe Phase: For Si, BUILD internal a hash table in the hash area using another hash function, k(S#)=MOD4 (open addr for collisions) FOR Ei, PROBE hash table using k for matches, output join of matches to OUT. Start with S0 and E0. But E0 empty (no output will be produced) so skip. PROBE S1 and E1:1.Read S1 page-1 to IN. 2. Hash IN to HASH. S0 57|BROWN |NY2092 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 E1 25|7 |68|ND4456 25|CLAY |NJ5101 25|6 |76|NY2091 34|6 |62|ND4456 E2 17|5 |96|NJ5101 32|8 |89|NY2091 25|CLAY |NJ5101 32|7 |91|FL6320
OUT 0 Hash 1 2 3 IN CODE GENERATION implements GRACE JOIN PROBE S1, E1: 1. Read S1 page-1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. S0 57|BROWN |NY2092 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 CLAY|6 |76 E1 25|7 |68|ND4456 25|CLAY |NJ5101 25|6 |76|NY2091 34|6 |62|ND4456 E2 17|5 |96|NJ5101 32|8 |89|NY2091 25|7 |68|ND4456 32|7 |91|FL6320 25|6 |76|NY2091
OUT 0 Hash 1 2 3 IN CODE GENERATION implements GRACE JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT (Since OUT is full, flush OUT first.). But no match! 8. Flush HASH and IN when done with S1, E1 Probe (before starting Probe S2, E2). S0 57|BROWN |NY2092 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 CLAY|6 |76 E1 25|7 |68|ND4456 25|CLAY |NJ5101 25|6 |76|NY2091 34|6 |62|ND4456 E2 17|5 |96|NJ5101 32|8 |89|NY2091 34|6 |62|ND4456 32|7 |91|FL6320
OUT 0 Hash 1 2 3 IN CODE GENERATION implements GRACE JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT... 8. Flush HASH and IN when done Probing E1. PROBE S2, E2:1. Read S2 pg1 to IN 2. MOD3 hash IN to HASH (open addressing for collisions). S0 57|BROWN |NY2092 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 CLAY|6 |76 E1 25|7 |68|ND4456 25|6 |76|NY2091 34|6 |62|ND4456 E2 17|5 |96|NJ5101 17|BAID |NY2091 32|8 |89|NY2091 32|7 |91|FL6320 32|THAISZ|NJ5102
OUT 0 Hash 1 2 3 IN CODE GENERATION implements GRACE JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT... 8. Flush HASH and IN when done Probing E1. PROBE S2, E2:1. Read S2 pg1 to IN 2. MOD3 hash IN to HASH (open addressing for collisions). 3. Read S2 pg2 to IN 4. MOD3 hash IN to HASH (open addressing for collisions). S0 57|BROWN |NY2092 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 CLAY|6 |76 E1 25|7 |68|ND4456 25|6 |76|NY2091 17|BAID |NY2091 34|6 |62|ND4456 E2 32|THAISZ|NJ5102 17|5 |96|NJ5101 32|8 |89|NY2091 38|GOOD |FL6321 32|7 |91|FL6320
OUT 0 Hash 1 2 3 IN CODE GENERATION implements GRACE JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT... 8. Flush HASH and IN when done Probing E1. PROBE S2, E2:1. Read S2 pg1 to IN 2. MOD3 hash IN to HASH (open addressing for collisions). 3. Read S2 pg2 to IN 4. MOD3 hash IN to HASH (open addressing for collisions). 5.Read E2 to IN. 6.Hash to HASH. 7. If match, Concatenate to OUT... S0 57|BROWN |NY2092 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 BAID |5|96 CLAY|6 |76 38|GOOD |FL6321 E1 25|7 |68|ND4456 25|6 |76|NY2091 17|BAID |NY2091 34|6 |62|ND4456 E2 32|THAISZ|NJ5102 17|5 |96|NJ5101 32|8 |89|NY2091 17|5 |96|NJ5101 32|7 |91|FL6320 32|8 |89|NY2091
OUT 0 Hash 1 2 3 IN CODE GENERATION implements GRACE JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT... 8. Flush HASH and IN when done Probing E1. PROBE S2, E2:1. Read S2 pg1 to IN 2. MOD3 hash IN to HASH (open addressing for collisions). 3. Read S2 pg2 to IN 4. MOD3 hash IN to HASH (open addressing for collisions). 5.Read E2 to IN. 6.Hash to HASH. 7. If match, Concatenate to OUT... S0 57|BROWN |NY2092 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 BAID |5|96 THAISZ|8|89 CLAY|6 |76 38|GOOD |FL6321 E1 25|7 |68|ND4456 25|6 |76|NY2091 17|BAID |NY2091 34|6 |62|ND4456 E2 32|THAISZ|NJ5102 17|5 |96|NJ5101 32|8 |89|NY2091 32|7 |91|FL6320 32|8 |89|NY2091
OUT 0 Hash 1 2 3 IN CODE GENERATION implements GRACE JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT... 8. Flush HASH and IN when done Probing E1. PROBE S2, E2:1. Read S2 pg1 to IN 2. MOD3 hash IN to HASH (open addressing for collisions). 3. Read S2 pg2 to IN 4. MOD3 hash IN to HASH (open addressing for collisions). 5.Read E2 to IN. 6.Hash to HASH. 7. If match, Concatenate to OUT...(repeat until E2 empty) 8. Flush HASH and IN when done Probing E1. S0 57|BROWN |NY2092 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY |7|68 THAISZ|7|91 BAID |5|96 THAISZ|8|89 CLAY |6|76 38|GOOD |FL6321 E1 25|7 |68|ND4456 25|6 |76|NY2091 17|BAID |NY2091 34|6 |62|ND4456 E2 32|THAISZ|NJ5102 17|5 |96|NJ5101 32|8 |89|NY2091 32|7 |91|FL6320 32|7 |91|FL6320
CODE GENERATION implements the operator, HYBRID HASH JOIN J4. HASH-JOIN: (a better way than GRACE JOIN): HYBRID HASH JOINof SS#E (developed by former chair of NDSU CS Dept., Dr. L. Shapiro): Partition the M pages of main memory allocated to the join process as follows: One page for the IN, 0ne page for the OUT, B pages for hash buckets, R1..RB, Leave the rest for a large internal hash table, R0. PARTIAL SORT S: Read each S page to IN, hash each record using h(S#)=MODB(S#) to R0..RB. If a record hashes to R0, apply an internal hash function, k(S#) which hashes the record to a slot in R0. Use open addressing for k-collisions. When an h-collision occurs in any page Ri i=1..B, flush that page to a disk file, called Si i=1..B. PARTIALLY SORT E: Read each E page into IN, hash each record with h(S#) to R0..RB If record hashes to R0, apply internal hash function k(S#), and concatenate records that match to OUT. If a collision occurs in any page, Ri i=1..B, flush to temporary disk file, Ei i=1..B. PROBE PHASE for pairs, Si and Ei i=1..B, do the same as in Grace Join. Hybrid Hash Join can be done with Bit Filtering to eliminate non-participating tuples early and avoid wasted processing of non-participating tuples Much more detail and example walkthroughs can be found in the HTML version of these notes (also available from "Other Materials" http://www.cs.ndsu.nodak.edu/~perrizo/classes/765/09query.html
RID|S#|SNAME | LCODE 0 1 2 3 CODE GENERATION implements the operator, HYBRID HASH JOIN Read each S-page to IN, hash each record using h(S#)=MOD3(S#) to R0,R1,R2. If h(S#)=0, k(S#) determines R0 slot (open addressing for collisions in R0). h collisions in R1, R2 flush to file, S1, S2 resp. Similarly (note all h hashes goes to R1 or R2, So all E records flush to E1 and E2 and R0 is flushed too. S E h=MOD3 k=MOD4 SELECT SNAME,C#,GRADE FROM S,E WHERE S.S#=E.S#; RID|S#|C#|GR| LCODE 1,0|17|5 |96|NJ5101 1,0|17|BAID |NY2091 S0 1,1|25|7 |68|ND4456 1,1|25|CLAY |NJ5101 2,0|25|6 |76|NY2091 2,0|32|THAISZ|NJ5102 2,1|57|BROWN |NY2092 2,1|32|8 |89|NY2091 S1 3,0|32|7 |91|FL6320 3,0|38|GOOD |FL6321 25|CLAY |NJ5101 3,1|34|6 |62|ND4456 S2 17|BAID |NY2091 OUT 32|THAISZ|NJ5102 38|GOOD |FL6321 R0 57|BROWN |NY2092 E0 R1 E1 25|7 |68|ND4456 R2 25|6 |76|NY2091 34|6 |62|ND4456 E2 IN 17|5 |96|NJ5101 2,0|32|THAISZ|NJ5102 3,0|38|GOOD |FL6321 1,0|17|BAID |NY2091 32|8 |89|NY2091 2,1|57|BROWN |NY2092 1,1|25|CLAY |NJ5101 32|7 |91|FL6320
OUT 0 Hash 1 2 3 IN CODE GENERATION implements HYBRID JOIN PROBE S1, E1: 1. Read S1 page-1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. S0 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 CLAY|6 |76 E1 25|7 |68|ND4456 25|CLAY |NJ5101 25|6 |76|NY2091 34|6 |62|ND4456 E2 17|5 |96|NJ5101 32|8 |89|NY2091 25|7 |68|ND4456 32|7 |91|FL6320 25|6 |76|NY2091
OUT 0 Hash 1 2 3 IN CODE GENERATION implements HYBRID JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT (Since OUT is full, flush OUT first.). But no match! 8. Flush HASH and IN when done with S1, E1 Probe (before starting Probe S2, E2). S0 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 CLAY|6 |76 E1 25|7 |68|ND4456 25|CLAY |NJ5101 25|6 |76|NY2091 34|6 |62|ND4456 E2 17|5 |96|NJ5101 32|8 |89|NY2091 34|6 |62|ND4456 32|7 |91|FL6320
OUT 0 Hash 1 2 3 IN CODE GENERATION implements HYBRID JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT... 8. Flush HASH and IN when done Probing E1. PROBE S2, E2:1. Read S2 pg1 to IN 2. MOD3 hash IN to HASH (open addressing for collisions). S0 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 CLAY|6 |76 E1 25|7 |68|ND4456 25|6 |76|NY2091 34|6 |62|ND4456 E2 17|5 |96|NJ5101 17|BAID |NY2091 32|8 |89|NY2091 32|7 |91|FL6320 32|THAISZ|NJ5102
OUT 0 Hash 1 2 3 IN CODE GENERATION implements HYBRID JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT... 8. Flush HASH and IN when done Probing E1. PROBE S2, E2:1. Read S2 pg1 to IN 2. MOD3 hash IN to HASH (open addressing for collisions). 3. Read S2 pg2 to IN 4. MOD3 hash IN to HASH (open addressing for collisions). S0 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 CLAY|6 |76 E1 25|7 |68|ND4456 25|6 |76|NY2091 17|BAID |NY2091 34|6 |62|ND4456 E2 32|THAISZ|NJ5102 17|5 |96|NJ5101 32|8 |89|NY2091 38|GOOD |FL6321 32|7 |91|FL6320
OUT 0 Hash 1 2 3 IN CODE GENERATION implements HYBRID JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT... 8. Flush HASH and IN when done Probing E1. PROBE S2, E2:1. Read S2 pg1 to IN 2. MOD3 hash IN to HASH (open addressing for collisions). 3. Read S2 pg2 to IN 4. MOD3 hash IN to HASH (open addressing for collisions). 5.Read E2 to IN. 6.Hash to HASH. 7. If match, Concatenate to OUT... S0 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 BAID |5|96 CLAY|6 |76 38|GOOD |FL6321 E1 25|7 |68|ND4456 25|6 |76|NY2091 17|BAID |NY2091 34|6 |62|ND4456 E2 32|THAISZ|NJ5102 17|5 |96|NJ5101 32|8 |89|NY2091 17|5 |96|NJ5101 32|7 |91|FL6320 32|8 |89|NY2091
OUT 0 Hash 1 2 3 IN CODE GENERATION implements HYBRID JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT... 8. Flush HASH and IN when done Probing E1. PROBE S2, E2:1. Read S2 pg1 to IN 2. MOD3 hash IN to HASH (open addressing for collisions). 3. Read S2 pg2 to IN 4. MOD3 hash IN to HASH (open addressing for collisions). 5.Read E2 to IN. 6.Hash to HASH. 7. If match, Concatenate to OUT... S0 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY|7 |68 BAID |5|96 THAISZ|8|89 CLAY|6 |76 38|GOOD |FL6321 E1 25|7 |68|ND4456 25|6 |76|NY2091 17|BAID |NY2091 34|6 |62|ND4456 E2 32|THAISZ|NJ5102 17|5 |96|NJ5101 32|8 |89|NY2091 32|7 |91|FL6320 32|8 |89|NY2091
OUT 0 Hash 1 2 3 IN CODE GENERATION implements HYBRID JOIN PROBE S1, E1: 1. Read S1 pg1 to IN. 2. Hash IN to HASH. 3. Read E1 to IN 4. Concatenate to OUT. 5. Read (next page of) E1 to IN. 6. Hash to HASH. 7. If match, Concatenate to OUT... 8. Flush HASH and IN when done Probing E1. PROBE S2, E2:1. Read S2 pg1 to IN 2. MOD3 hash IN to HASH (open addressing for collisions). 3. Read S2 pg2 to IN 4. MOD3 hash IN to HASH (open addressing for collisions). 5.Read E2 to IN. 6.Hash to HASH. 7. If match, Concatenate to OUT...(repeat until E2 empty) 8. Flush HASH and IN when done Probing E1. S0 S1 25|CLAY |NJ5101 S2 17|BAID |NY2091 32|THAISZ|NJ5102 38|GOOD |FL6321 E0 CLAY |7|68 THAISZ|7|91 BAID |5|96 THAISZ|8|89 CLAY |6|76 38|GOOD |FL6321 E1 25|7 |68|ND4456 25|6 |76|NY2091 17|BAID |NY2091 34|6 |62|ND4456 E2 32|THAISZ|NJ5102 17|5 |96|NJ5101 32|8 |89|NY2091 32|7 |91|FL6320 32|7 |91|FL6320
RID|S#|SNAME | LCODE 0 R2 1 R1 2 3 R0 CODE generation of HYBRID HASH JOINusing R2 as internal hash bucket! BUILD S: Read each S-page to IN, hash each record using h(S#)=MOD3(S#) to R0,R1,R2. If h(S#)=2, k(S#) determines R2 slot (open addressing for R2). h collisions in R1, R0 flush to file, S1, S0 resp. S E RID|S#|C#|GR| LCODE h=MOD3 k=MOD4 SELECT SNAME,C#,GRADE FROM S,E WHERE S.S#=E.S#; 1,0|17|5 |96|NJ5101 1,0|17|BAID |NY2091 1,1|25|7 |68|ND4456 1,1|25|CLAY |NJ5101 2,0|25|6 |76|NY2091 2,0|32|THAISZ|NJ5102 S0 2,1|57|BROWN |NY2092 2,1|32|8 |89|NY2091 57|BROWN |NY2092 3,0|32|7 |91|FL6320 3,0|38|GOOD |FL6321 3,1|34|6 |62|ND4456 S1 25|CLAY |NJ5101 OUT E0 E1 IN 2,0|32|THAISZ|NJ5102 1,0|17|BAID |NY2091 3,0|38|GOOD |FL6321 2,1|57|BROWN |NY2092 1,1|25|CLAY |NJ5101
RID|S#|SNAME | LCODE 0 R2 1 R1 2 3 R0 CODE generation implements HYBRID HASH JOINusing R2 as internal hash bucket! BUILD E: Read first E-page to IN, hash each record using h(S#)=MOD3(S#) to R0,R1,R2. If h(S#)=2, k(S#) determines R2 slot (open addressing for R2). If match, concatenate to OUT. h collisions in R1, R0 flush to file, E1, E0 respectively. S E RID|S#|C#|GR| LCODE h=MOD3 k=MOD4 SELECT SNAME,C#,GRADE FROM S,E WHERE S.S#=E.S#; 1,0|17|5 |96|NJ5101 1,0|17|BAID |NY2091 1,1|25|7 |68|ND4456 1,1|25|CLAY |NJ5101 2,0|25|6 |76|NY2091 2,0|32|THAISZ|NJ5102 S0 2,1|57|BROWN |NY2092 2,1|32|8 |89|NY2091 57|BROWN |NY2092 3,0|32|7 |91|FL6320 3,0|38|GOOD |FL6321 3,1|34|6 |62|ND4456 S1 S1 25|CLAY |NJ5101 S2 BAID |5 |96 OUT E0 32|THAISZ|NJ5102 E1 17|BAID |NY2091 38|GOOD |FL6321 IN 1,0|17|5 |96|NJ5101 1,1|25|7 |68|ND4456
RID|S#|SNAME | LCODE 0 R2 1 R1 2 3 R0 CODE GEN implements HYBRID HASH JOINusing R2 as internal hash bucket! BUILD E: Read second E-page to IN, hash each record using h(S#)=MOD3(S#) to R0,R1,R2. If h(S#)=2, k(S#) determines R2 slot (open addressing for R2). If match, concatenate to OUT. h collisions in R1, R0 flush to file, E1, E0 respectively. S E RID|S#|C#|GR| LCODE h=MOD3 k=MOD4 SELECT SNAME,C#,GRADE FROM S,E WHERE S.S#=E.S#; 1,0|17|5 |96|NJ5101 1,0|17|BAID |NY2091 1,1|25|7 |68|ND4456 1,1|25|CLAY |NJ5101 2,0|25|6 |76|NY2091 2,0|32|THAISZ|NJ5102 S0 2,1|57|BROWN |NY2092 2,1|32|8 |89|NY2091 57|BROWN |NY2092 3,0|32|7 |91|FL6320 3,0|38|GOOD |FL6321 3,1|34|6 |62|ND4456 S1 S1 25|CLAY |NJ5101 S2 BAID |5 |96 OUT THAISZ|8 |89 25|7 |68|ND4456 E0 32|THAISZ|NJ5102 E1 17|BAID |NY2091 38|GOOD |FL6321 IN 2,0|25|6 |76|NY2091 2,1|32|8 |89|NY2091
RID|S#|SNAME | LCODE 0 R2 1 R1 2 3 R0 CODE GEN implements HYBRID HASH JOINusing R2 as internal hash bucket! BUILD E: Read third E-page to IN, hash each record using h(S#)=MOD3(S#) to R0,R1,R2. If h(S#)=2, k(S#) determines R2 slot (open addressing for R2). If match, concatenate to OUT. h collisions in R1, R0 flush to file, E1, E0 respectively. When done building E, flush R2. S E RID|S#|C#|GR| LCODE h=MOD3 k=MOD4 SELECT SNAME,C#,GRADE FROM S,E WHERE S.S#=E.S#; 1,0|17|5 |96|NJ5101 1,0|17|BAID |NY2091 1,1|25|7 |68|ND4456 1,1|25|CLAY |NJ5101 2,0|25|6 |76|NY2091 2,0|32|THAISZ|NJ5102 S0 2,1|57|BROWN |NY2092 2,1|32|8 |89|NY2091 57|BROWN |NY2092 3,0|32|7 |91|FL6320 3,0|38|GOOD |FL6321 3,1|34|6 |62|ND4456 S1 S1 25|CLAY |NJ5101 S2 BAID |5 |96 OUT THAISZ|7 |91 THAISZ|8 |89 25|7 |68|ND4456 E0 25|6 |76|NY2091 32|THAISZ|NJ5102 E1 17|BAID |NY2091 25|7 |68|ND4456 38|GOOD |FL6321 25|6 |76|NY2091 34|6 |62|ND4456 IN 3,0|32|7 |91|FL6320 3,1|34|6 |62|ND4456
OUT 0 1 2 3 R2 4 5 6 7 IN CODE GEN implements HYBRID HASH JOIN using R2 as internal hash bucket probe: h=MOD3 k=MOD4 SELECT SNAME,C#,GRADE FROM S,E WHERE S.S#=E.S#; PROBE S: Notice how much more efficient the probe phase of HH JOIN is than Grace JOIN when the internal Hash table is chosen to apply to the right bucket! (And how it may not be faster, if that decision is badly made!) Note: If memory allocation is static, use all Ri pages for internal hash function! So k=MOD8 S0 57|BROWN |NY2092 S1 S1 25|CLAY |NJ5101 25|CLAY |NJ5101 S2 BAID |5 |96 THAISZ|6 |76 THAISZ|7 |91 THAISZ|8 |89 CLAY |7 |68 E0 E1 25|7 |68|ND4456 25|6 |76|NY2091 34|6 |62|ND4456 25|7 |68|ND4456 25|6 |76|NY2091
CODE GENERATION implementing the operator, PROJECTION Projection is removal of certian specified attributes (columns) from a relation. Given a relation, R(A,B,C,D,E), the projection of R onto A,B,D written, PROJR[A,B,D], is done by removing columns C and E and then eliminating any duplicated tuples from the result. {A,B,D} is called the projection attribute-list. Note: many system provide a more flexible projection (in which duplicates are not removed). Strictly speaking, this is not a relational operator since the result is not a relation (the result is what mathematicians term, a "bag"). If the attribute-list contains a key, then there are no duplicates to be removed (why not?). In this case, the projection implementation code gets each tuple in turn, trims off the unspecified attributes and outputs the result. If list does not contain a key, sort (or hash) and then get a record, trim off the non-attribute-list attributes and eliminate duplicates. Note: this can be expensive (about as expensive as a join). Projection codes (with duplicate elimination) are similar to join codes. e.g., methods include: Nested loop: For each tuple, scan the projection for duplicates. Since there is a physical order to the tuples (even though it may not be any particular logical ordering) consider the tuples in that order and then scan from that tuple only (not from the beginning of the file). Indexed nested loop: For each tuple, consult index for duplicates and remove them. Sort-remove (like sort-merge), Sort result, scan once for duplicates (now situated adjacent to each other) and remove them. Hash methods. Partially sort projection by hashing (similar to grace join??) then use reduced nested loop on each subset (one at a time) to discard duplicates.