
Efficient Concurrency Control in Multidimensional Access Methods

Kaushik Chakrabarti (University of Illinois at Urbana-Champaign) and Sharad Mehrotra (University of California at Irvine). Presented at the ACM SIGMOD Conference, June 1, 1999.


Presentation Transcript


  1. Efficient Concurrency Control in Multidimensional Access Methods
     Kaushik Chakrabarti (University of Illinois at Urbana-Champaign), Sharad Mehrotra (University of California at Irvine)
     Presented at the ACM SIGMOD Conference, June 1, 1999
     http://www-db.ics.uci.edu

  2. Outline of talk
     • Introduction
     • Background
     • Phantom protection in Generalized Search Trees
       • Define granules
       • Describe lock protocols
     • Experiments
     • Conclusion

  3. Introduction
     • An increasing number of applications deal with multidimensional data
     • Examples: spatial (CAD, GIS), spatio-temporal (moving objects, weather)
     • A DBMS should allow applications to:
       (1) define their own data types and operations
       (2) define multidimensional access methods (AMs) for those data types for efficient query processing
     • Object-relational (OR) technology solves (1)
     • Generalized Search Trees (GiSTs) address (2)

  4. Introduction
     • For successful integration, we need to support concurrent accesses via GiST
     • Concurrency control problems:
       (1) Preserve consistency of the data structure
       (2) Prevent phantom anomalies
     • (1) has been addressed in Kornacker, Mohan and Hellerstein, SIGMOD '97
     • This paper addresses the problem of phantom protection in GiSTs

  5. Phantom
     • Definition:
       • T1 reads a set of items satisfying some <search-condition>
       • T2 creates data items that satisfy T1's <search-condition> and commits
       • T1 repeats its scan with the same <search-condition> and gets a different set of items
     • Serializability ⇒ no phantoms
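
The three steps in the definition can be replayed on a toy table. The following is a minimal sketch, not anything from the talk: the `Point` type, the `scan` helper and the sample data are all hypothetical, and no locking is performed, which is exactly why the phantom appears.

```python
from dataclasses import dataclass

# Hypothetical record type and table; illustrative only.
@dataclass(frozen=True)
class Point:
    x: float
    y: float

def scan(table, region):
    """Return the points inside region = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = region
    return [p for p in table if xmin <= p.x <= xmax and ymin <= p.y <= ymax]

table = [Point(1, 1), Point(2, 3), Point(9, 9)]
region = (0, 0, 5, 5)             # T1's <search-condition>

first = scan(table, region)       # T1 reads the matching items
table.append(Point(4, 4))         # T2 inserts a matching item and commits
second = scan(table, region)      # T1 repeats the same scan

# Items seen by the second scan but not the first are the phantoms.
phantoms = [p for p in second if p not in first]
print(phantoms)
```

A phantom-protection protocol would have to block or delay T2's insert until T1 commits, so that both scans return the same set.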

  6. Example

  7. Solution
     • Predicate locks: costly
     • Granular locks: efficient

  8. Key Range Locking (KRL)
     • ARIES/KVL (Mohan, 1990)

  9. Phantoms in Spatial/Spatio-temporal Databases
     • Compute the average rainfall over all locations in a 2-d region, where the locations are indexed using a GiST
     • Get all objects in a given region from a moving-objects database, where the objects are indexed using a GiST

  10. Solutions
     • Adapting KRL: too costly
     • Predicate-locking-based strategy by Kornacker, Mohan and Hellerstein, SIGMOD '97
     • Our granular-locking-based approach for phantom protection in R-trees, ICDE '98: does not work well when applied to GiSTs (details in paper)

  11. Granular Locking in GiST
     • Solution involves:
       • Defining the granules
       • Defining the lock protocol for the operations
     • Challenges:
       • "nice" granules
       • handling overlap among granules
       • handling the "loss of lock" problem
       • high concurrency and low lock overhead

  12. GiST
     • Keys can be arbitrary predicates
     • An AM can be implemented by specifying some extension methods, which dictate the tree operations

  13. Granules in GiST
     • Leaf granules: one per leaf node
     • Non-leaf granules: one per non-leaf node
     • Lock name: <table-name, index-name, node-id>
     • Lock coverage: defined by the Granule Predicate (GP), where BP(N) is the bounding predicate of node N
       • GP(N) = BP(N), if N is the root
       • GP(N) = BP(N) ∧ GP(P) otherwise, where P = parent(N)
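
The recursive definition of GP can be made concrete under one simplifying assumption: every bounding predicate is an axis-aligned rectangle, so the conjunction BP(N) ∧ GP(P) becomes a rectangle intersection. GiST keys may be arbitrary predicates, so this is only a sketch for the R-tree-like case; the `Node` class and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Conjunction of two rectangle predicates; None if unsatisfiable."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin > xmax or ymin > ymax:
        return None
    return (xmin, ymin, xmax, ymax)

@dataclass
class Node:
    node_id: int
    bp: Rect                          # bounding predicate BP(N)
    parent: Optional["Node"] = None

def granule_predicate(n: Node) -> Optional[Rect]:
    """GP(N) = BP(N) at the root, BP(N) AND GP(parent(N)) otherwise."""
    if n.parent is None:
        return n.bp
    parent_gp = granule_predicate(n.parent)
    if parent_gp is None:
        return None
    return intersect(n.bp, parent_gp)

root = Node(1, (0, 0, 10, 10))
child = Node(2, (5, 5, 12, 12), parent=root)
leaf = Node(3, (8, 4, 11, 9), parent=child)

print(granule_predicate(leaf))
```

In this example the leaf's BP sticks out of its ancestors' BPs, and the granule predicate clips it to the region that a search descending from the root could actually reach through that node.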

  14. (no slide text)

  15. Locks

  16. Overlap between granules
     • Correctness: p ∧ p′ satisfiable ⇒ lset(p) ∩ lset(p′) ≠ ∅
     • Problem does not arise in KRL
     • Policies:
       • Overlap-for-Search & Cover-for-Insert (OSCI)
       • Cover-for-Search & Overlap-for-Insert (CSOI)
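
The correctness condition says that two conflicting predicates must meet on at least one locked granule. The sketch below illustrates one reading of the OSCI policy, inferred from its name rather than stated on the slide: a search locks every granule that overlaps its predicate, while an insert locks a granule that covers the new object. Rectangular granules, the function names and the sample data are all assumptions.

```python
from typing import Dict, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def covers(g: Rect, o: Rect) -> bool:
    return g[0] <= o[0] and g[1] <= o[1] and o[2] <= g[2] and o[3] <= g[3]

def lset_search(granules: Dict[str, Rect], query: Rect) -> Set[str]:
    """Overlap-for-Search: lock every granule overlapping the query."""
    return {name for name, g in granules.items() if overlaps(g, query)}

def lset_insert(granules: Dict[str, Rect], obj: Rect) -> Set[str]:
    """Cover-for-Insert: lock one granule that covers the object."""
    for name, g in granules.items():
        if covers(g, obj):
            return {name}
    raise ValueError("no single granule covers the object")

granules = {"g1": (0, 0, 6, 6), "g2": (4, 4, 10, 10)}  # overlapping granules
query = (5, 5, 7, 7)
obj = (5, 5, 5.5, 5.5)       # lies inside the query region

shared = lset_search(granules, query) & lset_insert(granules, obj)
print(shared)
```

Under this reading the condition holds by construction: any granule that covers an object overlapping the query must itself overlap the query, so it is already in the search's lock set.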

  17. Loss of lock coverage

  18. Search Protocol
     • Acquire a commit-duration S lock on the granule corresponding to each index node visited
     • Correctness:
       • GP(T) ∧ Q is satisfiable ⇒ ∧ᵢ Consistent(BP(Pᵢ), Q), where each Pᵢ is an ancestor of T
     • Note:
       • No object locks
       • No extra cost except that of acquiring the lock (no extra checks)
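
The search protocol piggybacks on the ordinary tree descent: each visited node contributes one S lock and nothing else. The following sketch assumes rectangular predicates, a hypothetical `Node` class, and a plain list standing in for the lock manager; it records lock requests rather than enforcing them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def consistent(bp: Rect, q: Rect) -> bool:
    """Stand-in for the GiST Consistent method: rectangle overlap."""
    return bp[0] <= q[2] and q[0] <= bp[2] and bp[1] <= q[3] and q[1] <= bp[3]

@dataclass
class Node:
    node_id: int
    bp: Rect
    children: List["Node"] = field(default_factory=list)
    objects: List[Rect] = field(default_factory=list)   # leaf entries

def search(node: Node, q: Rect, locks: list, results: list) -> None:
    # Commit-duration S lock on the granule of every node visited.
    locks.append(("S", node.node_id))
    if not node.children:
        # Leaf: report matching objects; no object locks are taken.
        results.extend(o for o in node.objects if consistent(o, q))
        return
    for child in node.children:
        if consistent(child.bp, q):
            search(child, q, locks, results)

leaf_a = Node(2, (0, 0, 4, 4), objects=[(1, 1, 1, 1)])
leaf_b = Node(3, (5, 5, 10, 10), objects=[(6, 6, 6, 6), (9, 9, 9, 9)])
root = Node(1, (0, 0, 10, 10), children=[leaf_a, leaf_b])

locks, results = [], []
search(root, (5, 5, 7, 7), locks, results)
print(locks, results)
```

Because the descent only reaches a node when every ancestor's BP is consistent with the query, the set of locked granules covers every granule whose GP can intersect the query, which is the correctness condition stated above.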

  19. Insert Protocol
     • Correctness:
       • full coverage
       • prevent phantoms due to loss of lock coverage

  20. Insert Protocol
     • Case 1: No growth, no split
       • commit-duration IX lock on g (target granule)
       • commit-duration X lock on O
     • Case 2: Growth, no split
       • the 2 locks of Case 1
       • short-duration IX lock on the lowest unchanged node (LU-node)

  21. Example

  22. Insert Protocol
     • Case 3: No growth, split
       • instant-duration SIX on g
       • commit-duration IX on whichever granule contains O after the split; X on O
       • instant-duration SIX on each ancestor that splits
     • Case 4: Growth, split
       • lock requirements of Cases 2 and 3
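
The four insert cases can be summarized as a function from (growth, split) to a list of lock requests. This is a sketch of one reading of the slides, with two labeled assumptions: the X lock on O in Case 3 is taken to be commit-duration as in Case 1, and Case 4 is taken to be Case 3's locks plus the LU-node lock from Case 2. The `LockRequest` type and target labels are hypothetical.

```python
from typing import List, NamedTuple

class LockRequest(NamedTuple):
    mode: str        # "IX", "X" or "SIX"
    duration: str    # "commit", "short" or "instant"
    target: str      # descriptive label, not a real lock name

def insert_locks(growth: bool, split: bool,
                 splitting_ancestors: int = 0) -> List[LockRequest]:
    locks: List[LockRequest] = []
    if split:
        # Cases 3 and 4: the target granule g splits.
        locks.append(LockRequest("SIX", "instant", "g"))
        locks.append(LockRequest("IX", "commit", "granule holding O after split"))
        for i in range(1, splitting_ancestors + 1):
            locks.append(LockRequest("SIX", "instant", f"splitting ancestor {i}"))
    else:
        # Cases 1 and 2: g keeps its identity.
        locks.append(LockRequest("IX", "commit", "g"))
    # Assumption: the object lock is commit-duration in every case.
    locks.append(LockRequest("X", "commit", "O"))
    if growth:
        # Cases 2 and 4: the granule grows, so also lock the LU-node.
        locks.append(LockRequest("IX", "short", "LU-node"))
    return locks

for growth in (False, True):
    for split in (False, True):
        print(growth, split, insert_locks(growth, split, splitting_ancestors=1))
```

Laying the cases out this way makes the cost structure visible: the common case (no growth, no split) needs only two locks, and the extra requests appear only when the tree structure changes.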

  23. Deletion Protocol
     • Problem: g does not cover O after deletion ⇒ commit-duration lock on LU-node
     • We do:
       • logical deletion (IX on target granule, X on object)
       • defer physical deletion until the transaction commits

  24. Protocol for Other Operations
     • ReadSingle: S lock on object
     • UpdateSingle:
       • if indexed attributes are not changed, IX on g, X on O
       • else, deletion followed by insertion
     • UpdateScan: same as Search for the region, same as UpdateSingle for every object updated

  25. Empirical Evaluation
     • Data sets:
       • 2-d spatial data: 62,556 2-d points from the Sequoia 2000 benchmark
       • 3-d feature data: first 3 Fourier coefficients from 480,471 Fourier vectors

  26. Measurements & Parameters
     • Performance: throughput (tps)
     • Concurrency: conflict ratio
     • Overhead: # locks, # predicate checks
     • Parameters: MPL, transaction size, write probability, query size, external think time (fixed at 3 sec), restart delay (fixed at 3 sec)

  27. Implementation

  28. Performance (panels: 2-d data, 3-d data)

  29. Performance/Concurrency (panels: under various loads, conflict ratio)

  30. Overhead (panels: Search, Insert)

  31. Conclusions
     • Granular locking (GL) is significantly more efficient than predicate locking (PL)
     • We expect the performance gap to increase with a better implementation (mainly of the lock manager, LM)
     • The dimensionality curse is a problem for GL
     • Can be integrated with a consistency protocol for a complete solution to concurrency control in multidimensional AMs
