Advanced Topics: Indexes & Transactions
370 likes | 592 Vues
Advanced Topics: Indexes & Transactions. Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu. Indexes. Why Indexes. With or without indexes, the query answer should be the same Indexes are needed for efficiency and fast access of data. Without index, we check all 10,000 students.
Advanced Topics: Indexes & Transactions
E N D
Presentation Transcript
Advanced Topics: Indexes & Transactions Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu cs3431
Indexes cs3431
Why Indexes • With or without indexes, the query answer should be the same • Indexes are needed for efficiency and fast access of data Without index, we check all 10,000 students Assume we have 10,000 students SELECT* FROMStudent WHEREsNumber = 76544357; With index, we can reach that student directly cs3431
Direct Access vs. Sequential Access SELECT* FROMStudent WHEREsNumber = 76544357; Without index, we check all 10,000 students (sequential access) With index, we can reach that student directly (direct access) cs3431
What is an Index • A index is an auxiliary file that makes it more efficient to search for a record in the data file • The index is usually specified on one field of the file • Although it could be specified on several fields • The index is stored separately from the base table • Each table may have multiple indexes Can create an index on sNumber Student Can create a second index on sName cs3431
Example: Index on sNumber Student • Index file is always sorted • Index size is much smaller than the table size • Now any query (equality or range) on sNumber can be efficiently answered (Binary search on the index) Index on sNumber
Example: Index on sName Student • Duplicates values have duplicate entries in the index • Now any query (equality or range) on sName can be efficiently answered (Binary search on the index) Index on sName
Creating an Index Create Index <name> On <tablename>(<colNames>); Student DB System knows how to: 1- create the index 2- when and how to use it Create Index sNumberIndexOn Student(sNumber); Create Index sNameIndexOn Student(SName);
Multiple Predicates 1- The best the DBMS can do is using addressIndex ‘320FL’ 2- From those tuples, check sName = ‘Dave’ Student SELECT* FROM Student WHERE address = ‘320FL’ AND sName = ‘Dave’; Create Index addessIndexOn Student(address); cs3431
Multi-Column Indexes • Columns X, Y are frequently queried together (with AND) • Each column has many duplicates • Then, consider creating a multi-column index on X, Y SELECT* FROM Student WHERE address = ‘320FL’ AND sName = ‘Dave’; Directly returns this record only Create Index nameAddOn Student(sName, address);
Using an Index • DBMS automatically figures out which index to use based on the query SELECT* FROMStudent WHEREsNumber = 76544357; Student Automatically uses SNumberIndex Create Index sNumberIndexOn Student(sNumber); Create Index sNameIndexOn Student(SName); cs3431
How Do Indexes Work? cs3431
Types of Indexes • Primary vs. Secondary • Single-Level vs. Multi-Level (Tree Structure) • Clustered vs. Non-Clustered cs3431
Primary vs. Secondary Indexes • Index on the primary key of a relation is called primary index (only one) • Index on any other column is called secondary index (can be many) • In primary index, all values are unique • In secondary indexes, values may have duplicates Student Index on SSN is a Primary Index Index on sNumberis a Secondary Index Index on sNameis a Secondary Index
Single-Level Indexes • Index is one-level sorted list • Given a value v to query • Perform a binary search in the index to find it (Fast) • Follow the link to reach the actual record Student Index on sNumber
Multi-Level Index • Build index on top of the index (can go multiple levels) • When searching for value v: • Find the largest entry ≤ v, and follow its pointer Student 2nd level 1st level cs3431 Index on sNumber
Clustered vs. Non-Clustered Assume there is index X on column C • If the records in the table are stored sorted based on C • X Clustered index • Otherwise, X Non-Clustered index • Primary index is a clustered index Student • Non-Clustered • index • Clustered index
Index Maintenance • Indexes are used in queries • But, need to be maintained when data change • Insert, update, delete • DBMS automatically handles the index maintenance • When insert new records the indexed field is added to the index • When delete records their values are deleted from the index • When update an indexed value delete the old value from index & insert the new value • There is a cost for maintaining an index, however its benefit is usually more (if used a lot) cs3431
Summary of Indexes • Indexes are auxiliary structures for efficient searching and querying • Query answer is the same with or without index • What to index depends on which columns are frequently queried (in Where clause) • Main operations Create Index <name> On <tablename>(<colNames>); Drop Index <name>; cs3431
Transactions cs3431
What is a Transaction • A set of operations on a database that are treated as one unit • Execute Allor None • Transactions have semantics at the application level • Want to reserve two seats in a flight • Transfer money from account A to account B • … • What if two users are reserving the same flight seat at the same time??? Transactions solve these problems
Transactions • By default, each SQL statement is a transaction • Can change the default behavior SQL > Start transaction; SQL > Insert …. SQL > Update … SQL > Delete .. SQL > Select … SQL> Commit | Rollback; All of these statements are now one unit (either all succeed all fail) End transaction successfully Cancel the transaction
Transaction Properties • Four main properties • Atomicity– A transaction if one atomic unit • Consistency– A transaction ensures DB is consistent • Isolation– A transaction is considered as if no other transaction was executing simultaneously • Durability– Changes made by a transaction must persist • ACID: Atomicity, Consistency, Isolation, Durability • ACID properties are enforced by the DBMS cs3431
Consistency Issue • Many users may update the data at the same time • How to ensure the result is consistent 2 1 Update T Set x = x * 3; Update T Set x = x + 2; 3 What is the right answer??? Wrong, Inconsistent data
Serial Order of Transactions • Given N concurrent transactions T1, T2, …TN • Serial order is any permutation of these transactions (N!) • T1, T2, T3, …TN • T2, T3, T1, …, TN • … • DBMS will ensure that the end-result from executing the N transactions (concurrently) matches one of the serial order execution • That is called Serializability • As if transactions are executed in serial order cs3431
Serializable Execution • Given N concurrent transactions T1, T2, …TN • DBMS will execute them concurrently (at the same time) • But, the final effect matches one of the serial order executions Update T Set x = x * 3; Update T Set x = x + 2;
Isolation Levels • Read Uncommitted • Read Committed • Repeatable Read • Serializable Gets stronger & avoids problems That is the default in DBMS cs3431
1- READ UNCOMMITTED Session 2 -------BEGIN TRANSACTION----- select color from cust where id=500; color ------ red select color from cust where id=500; color ----- blue select color from cust where id=500; color ----- blue -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- update cust set color='blue' where id=500; -----------COMMIT------------ | | | | V Time NonRepeatable read (bad) Dirty read (bad)
2- READ COMMITTED Dirty Read Solved Session 2 -------BEGIN TRANSACTION----- select color from cust where id=500; color ------ red select color from cust where id=500; color ----- red select color from cust where id=500; color ----- blue -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- update cust set color='blue' where id=500; -----------COMMIT------------ | | | | V Time NonRepeatable read (bad)
2- READ COMMITTED Session 2 -------BEGIN TRANSACTION----- select color from cust where id=500; color ------ red select color from cust where id=500; color ----- red select color from cust where id=500; color ----- -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- delete cust where id=500; -----------COMMIT------------ | | | | V Time Phantom (bad)
3- REPEATABLE READ NonRepeatable Read Solved Session 2 -------BEGIN TRANSACTION----- select color from cust where id=500; color ------ red select color from cust where id=500; color ----- red select color from cust where id=500; color ----- red -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- update cust set color='blue' where id=500; -----------COMMIT------------ | | | | V Time
3- REPEATABLE READ Phantom (For Delete) Solved Session 2 -------BEGIN TRANSACTION----- select color from cust where id=500; color ------ red select color from cust where id=500; color ----- red select color from cust where id=500; color ----- red -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- delete cust where id=500; -----------COMMIT------------ | | | | V Time
3- REPEATABLE READ Session 2 -------BEGIN TRANSACTION----- select id from cust where color=‘blue’; id -- select id from cust where color=‘blue’; id -- select id from cust where color=‘blue’; id -- 500 -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- Insert into cust(id, color) values (500, ‘blue’); -----------COMMIT------------ | | | | V Time Phantom Insert (bad)
4- SERIALIZABLE Phantom Solved Session 2 -------BEGIN TRANSACTION----- select id from cust where color=‘blue’; id -- select id from cust where color=‘blue’; id -- select id from cust where color=‘blue’; id -- -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- Insert into cust(id, color) values (500, ‘blue’); -----------COMMIT------------ | | | | V Time
Summary of Transactions • Unit of work in DBMS • Either executed All or None • Ensures consistency among many concurrent transactions • Ensures persistent data once committed (using recovery techniques) • Main ACID properties • Atomicity, Consistency, Isolation, Durability cs3431
END !!! cs3431
Final Exam • Dec. 13, at 8:15am – 9:30am (75 mins) • Closed book, open sheet • Answer in the same exam sheet • Material Included • ERD • SQL (Select, Insert, Update, Delete) • Views, Triggers, Assertions • Cursors, Stored Procedures/Functions • Material Excluded • Relational Model & Algebra • Normalization Theory • ODBC/JDBC • Indexes and Transactions Friday’s Lecture (Revision + short Quiz)