Concurrency Control in Distributed MRA Index Structures

18th December 2008
Neha Singh, S. Sudarshan (IIT Bombay)

Introduction
Problem statement: Computing aggregate queries over a region in a multi-dimensional space containing mobile point data, when the data is stored in a distributed system.
Need for such aggregate queries:
• Networked virtual environments: widely used in online games and training simulators
  • E.g.: "Run in fear if the number of marching enemy troops exceeds the number of friends around you" (aggregate count query)
• Real-time traffic monitoring
Issues:
• Large non-static data storage requires a distributed system and dynamic spatial partitioning
• Synchronized system clocks are not feasible for large-scale distributed systems
• Combining local state information from different peers can give inconsistent aggregates due to varying communication delays between them

Key contributions
• A distributed multi-resolution aggregate index structure that supports a dynamic object set
  • The multi-resolution aggregate tree stores precomputed aggregates at each tree node in a centralized system, to speed up aggregate queries
  • We extend it to support non-static data in a distributed system
• Atomic updates/reads: our read protocol and aggregate tree update protocols ensure that updates are atomic with respect to reads (aggregate queries)
• Highly concurrent update protocol: we present a multi-phase update protocol that avoids blocking of reads and minimizes contention with concurrent updates

Agenda
• Introduction
  • Problem statement & Motivation
• System Model
• Readers protocol
• Maintenance of the index structure
  • Definitions
  • Naive update protocol
  • Multi-phase update protocol
• Experimental Analysis

System Model – Partitioning the global space
Figure: quadtree regions mapped onto a key space running from 0 to 2^128 - 1.
For partitioning, we use a quadtree-based regular decomposition of space.
• Benefits
  • Partitions are independent of the order of data insertion
  • Decomposition is implicitly known by all end systems
• Mapping the regions onto the P2P overlay
  • Each quadtree block has a unique centroid
  • Use this centroid as the key in the DHT to map the regions onto the peer set
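The mapping from a quadtree block to a DHT key can be sketched as follows. This is a minimal illustration, not the authors' implementation: the quadrant digit convention, the unit-square global space and the use of SHA-1 truncated to 128 bits are all assumptions made for the example, and `block_centroid` and `dht_key` are hypothetical names.

```python
import hashlib

def block_centroid(path, size=1.0):
    """Centroid of the quadtree block reached by following `path` from the root.

    `path` is a sequence of quadrant digits 0..3 (bit 0 = east half,
    bit 1 = north half); this convention is an assumption of the sketch.
    """
    x0, y0, side = 0.0, 0.0, size
    for q in path:
        side /= 2
        if q & 1:
            x0 += side
        if q & 2:
            y0 += side
    return (x0 + side / 2, y0 + side / 2)

def dht_key(centroid, bits=128):
    """Hash a centroid into the key space [0, 2**bits - 1].

    SHA-1 is an assumed hash choice; only its top `bits` bits are kept.
    """
    digest = hashlib.sha1(repr(centroid).encode()).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)
```

Because the decomposition is regular, any peer can compute the same centroid, and therefore the same key, for a block without coordinating with other peers.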

System Model – MRA Tree
An MRA-Tree (Multi-Resolution Aggregate Tree) is a modified multi-dimensional index structure that stores pre-computed aggregates at various resolutions in its intermediate nodes.
• A leaf node contains the actual data points: <loc, value>
• A non-leaf node stores aggregates for all data points indexed by it: {COUNT, SUM, MINARRAY, MAXARRAY}, where MINARRAY and MAXARRAY hold the min and max, respectively, of each child node
Figure: a non-leaf node with fields count, sum, minarray and maxarray, above its leaf nodes.
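A minimal sketch of how a non-leaf node's aggregates could be derived from its children is shown below. The function names, the dictionary layout and the sample values are illustrative assumptions, and leaves are assumed to be non-empty.

```python
def leaf_aggregate(points):
    """Aggregate of a leaf holding (loc, value) pairs: (count, sum, min, max)."""
    values = [value for _, value in points]
    return len(values), sum(values), min(values), max(values)

def make_inner(child_aggregates):
    """Build the stored aggregate of a non-leaf node from its children.

    MINARRAY and MAXARRAY keep one entry per child, as described on the slide.
    """
    return {
        "count": sum(a[0] for a in child_aggregates),
        "sum": sum(a[1] for a in child_aggregates),
        "minarray": [a[2] for a in child_aggregates],
        "maxarray": [a[3] for a in child_aggregates],
    }

def node_aggregate(node):
    """Collapse a non-leaf node to (count, sum, min, max) for its own parent."""
    return node["count"], node["sum"], min(node["minarray"]), max(node["maxarray"])

# Illustrative values only.
leaf1 = [((0, 0), 4), ((0, 1), 1), ((1, 1), 6)]
leaf2 = [((2, 0), 2), ((2, 1), 4), ((3, 1), 5)]
inner = make_inner([leaf_aggregate(leaf1), leaf_aggregate(leaf2)])
```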

Readers Protocol
Read query over region Q: the query traverses the index structure top-down, starting from the root node, and selectively explores the nodes.
Relation of the query Q to a node N:
• Q is contained in or partially overlaps N
• Q encloses N: further traversal is not required
• Q and N are disjoint
Figure: example tree with nodes A to H and query region Q. Legend: intersecting with Q and read; further traversal not necessary; intersecting with Q but not read.
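The three cases above can be sketched as a recursive COUNT query. This is a single-machine illustration with no locking; the rectangle representation (x1, y1, x2, y2), the half-open boundary convention and the node dictionaries are assumptions of the example.

```python
def disjoint(q, r):
    """True if half-open rectangles q and r do not overlap."""
    return q[2] <= r[0] or r[2] <= q[0] or q[3] <= r[1] or r[3] <= q[1]

def encloses(q, r):
    """True if rectangle q fully covers rectangle r."""
    return q[0] <= r[0] and q[1] <= r[1] and q[2] >= r[2] and q[3] >= r[3]

def contains_point(q, p):
    return q[0] <= p[0] < q[2] and q[1] <= p[1] < q[3]

def count_query(node, q):
    if disjoint(q, node["region"]):
        return 0                      # nothing to read below this node
    if encloses(q, node["region"]):
        return node["count"]          # stored aggregate suffices, no further traversal
    if "points" in node:              # leaf partially overlapped: inspect the points
        return sum(contains_point(q, p) for p in node["points"])
    return sum(count_query(child, q) for child in node["children"])

left = {"region": (0, 0, 2, 4), "count": 2, "points": [(1, 1), (1, 3)]}
right = {"region": (2, 0, 4, 4), "count": 3, "points": [(3, 1), (3, 3), (2.5, 2)]}
root = {"region": (0, 0, 4, 4), "count": 5, "children": [left, right]}
```

A query such as `count_query(root, (0, 0, 2, 4))` reads the stored count of `left` directly and skips `right` entirely.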

Readers Protocol
Naïve read method:
• Acquire locks on all nodes while traversing down the tree
• Release the locks after the read is completed
However, this reduces the concurrency of the index structure for concurrent updates.
Needed:
• Early release of locks
• At the same time, updates coming top-down must be prevented from overtaking the read
Solution: use the crabbing protocol, i.e. acquire locks on all the child nodes before releasing the lock on the parent.
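The lock ordering imposed by crabbing can be sketched with a single reader walking a small tree. The `Node` class, the `wanted` predicate and the event log are hypothetical helpers introduced for illustration; a plain `threading.Lock` stands in for the S-lock of a real distributed lock manager.

```python
import threading

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.lock = threading.Lock()  # stand-in for an S-lock held by one reader

def crabbing_read(root, wanted, log):
    """Visit nodes top-down; lock the needed children before unlocking the parent."""
    root.lock.acquire()
    log.append(("lock", root.name))
    frontier = [root]
    while frontier:
        node = frontier.pop(0)
        kids = [c for c in node.children if wanted(c)]
        for child in kids:
            child.lock.acquire()
            log.append(("lock", child.name))
        # The node's stored aggregate would be read here, while it is still locked.
        node.lock.release()
        log.append(("unlock", node.name))
        frontier.extend(kids)
    return log

c, d = Node("C"), Node("D")
a, b = Node("A", [c, d]), Node("B")
root = Node("N", [a, b])
log = crabbing_read(root, lambda n: True, [])
```

In the resulting log every child's lock event precedes its parent's unlock event, which is the property that keeps a top-down update from slipping between the reader's parent and child accesses.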

Maintenance of Index Structure
Aim: to update the distributed aggregate tree such that these operations are atomic while causing minimum blocking to reads.
• We consider two types of updates
  • Move operation
    • Within the same node
    • Across different nodes
  • Insertion / deletion operation
• The index tree needs to be updated only for moves across different nodes and for insertion / deletion operations
• Since our application is data-driven, updates percolate from leaf nodes to the higher levels of the hierarchy

Definition: Update Tree
Update tree: the set of all nodes (UT) of the distributed MRA tree whose stored aggregate values are affected by the transaction T.
• Insertion / deletion operation: consists of all ancestor nodes of the leaf node
• Move operation: consists of the ancestors up to the lowest common ancestor of the leaf nodes
Figure: update trees for an insertion / deletion at A and for a move between A and B, each with root N.
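The two definitions can be sketched over a child-to-parent map. The map, the node names and the choice to include the lowest common ancestor itself in the move's update tree are assumptions of this example; following the slide's wording, only ancestors are collected, not the leaves themselves.

```python
def ancestors(parent, node):
    """Ancestors of `node`, nearest first, using a child -> parent map."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out

def update_tree_insert(parent, leaf):
    """Insertion / deletion: every ancestor of the affected leaf."""
    return set(ancestors(parent, leaf))

def update_tree_move(parent, src, dst):
    """Move: ancestors of both leaves up to their lowest common ancestor (LCA)."""
    a, b = ancestors(parent, src), ancestors(parent, dst)
    common = set(a) & set(b)
    lca = next(n for n in a if n in common)
    legs = a[: a.index(lca) + 1] + b[: b.index(lca) + 1]
    return set(legs), lca

# Hypothetical tree: R is the tree root, a/b/c are leaves.
parent = {"N": "R", "M": "R", "P": "N", "Q": "N", "a": "P", "b": "Q", "c": "M"}
```

Here a move from leaf `a` to leaf `b` touches only `P`, `Q` and `N`, so the tree root `R` is not involved.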

Definition: Update Tree
• Importance of the update tree:
  • Although updates propagate up from leaf nodes, only nodes in the update tree are affected
  • Hence locks can be acquired top-down from the root of the update tree
    • Going to the tree root each time would overload the site containing the root
  • We use the order of lock acquisition at the root node of the update tree to serialize concurrent intersecting read and update queries

Definition: Conflicting Updates
Conflicting updates: two updates U1 and U2 are said to be conflicting if U1T ∩ U2T ≠ ∅, where U1T and U2T are the corresponding update trees.
Importance:
• The common part of two update trees is connected and has a unique highest node
• Order of access to this node = serialization order of concurrent conflicting updates
Figure: update trees of U1 and U2 with nodes N1 and N2.
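A small sketch of the conflict test follows. Update trees are represented as sets of node names, and `depth` is a hypothetical map from node name to its depth in the MRA tree; both are assumptions of the example.

```python
def conflict_point(tree1, tree2, depth):
    """Return the highest node shared by two update trees, or None if they do not conflict."""
    common = tree1 & tree2
    if not common:
        return None
    return min(common, key=depth.__getitem__)

# Hypothetical tree: R at depth 0, N and M below it, P and Q below N.
depth = {"R": 0, "N": 1, "M": 1, "P": 2, "Q": 2}
u1 = {"P", "Q", "N"}   # e.g. a move between two leaves under N
u2 = {"Q", "N", "R"}   # e.g. an insertion under Q
u3 = {"M", "R"}        # e.g. an insertion under M
```

For `u1` and `u2` the conflict point is `N`, so the order in which the two updates reach `N` would fix their serialization order; `u1` and `u3` share no node and do not conflict.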

Naïve Update Protocol
• X-lock all update tree nodes and then update them
• Order of acquiring locks: top-down, as bottom-up can lead to deadlock with a read query
Step I: Acquire Lock Phase
• X-locks are acquired on all update tree nodes top-down, starting from the root node
Step II: Update and Release Lock
• Updates propagate bottom-up
• Nodes release locks after updating their aggregates
• The root node releases its lock only after the update is over in both legs

Naïve Update Protocol
Problem: low concurrency
• The lock is retained on the root node of the update tree for the entire duration
• Because this node is X-locked, concurrency is low and read time is higher
• Issues
  • Read queries come top-down, and updates go bottom-up
  • The root node is the last to be updated and the first to be read
  • Still need to ensure that the update is atomic for reads
We propose a highly concurrent multi-phase update protocol.
Key modifications
• Allow concurrent reads while acquiring locks and updating other nodes
  • Introduce a new locking mode: U-lock, compatible with S-lock
• Nodes are updated top-down
  • Split the update process into 3 phases
• Prevent a read from overtaking a top-down update and reading an inconsistent value
  • Use the crabbing protocol while acquiring locks to update nodes

Multi Phase Update Protocol
• Update lock mode (U-lock): a new locking mode compatible with read
  • Locks nodes for possible future modification
  • Can be upgraded to an X-lock when needed
Compatibility matrix
• U-S: True => a read query can proceed while the update is modifying other nodes of the update tree
• U-U: False => conflicting updates need to wait for each other
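The compatibility matrix can be written out as a lookup table. Only the U-S and U-U entries come from the slide; the S-S and X entries below follow the conventional shared/exclusive rules and are an assumption of this sketch, as are the function names.

```python
# Symmetric lock compatibility, keyed by the unordered pair of modes.
COMPATIBLE = {
    frozenset(["S"]): True,        # S-S: assumed, conventional shared locks
    frozenset(["S", "U"]): True,   # U-S: from the slide
    frozenset(["U"]): False,       # U-U: from the slide
    frozenset(["S", "X"]): False,  # assumed, conventional exclusive lock
    frozenset(["U", "X"]): False,  # assumed
    frozenset(["X"]): False,       # assumed
}

def compatible(a, b):
    return COMPATIBLE[frozenset((a, b))]

def can_grant(requested, held_modes):
    """A request is granted only if it is compatible with every lock already held."""
    return all(compatible(requested, held) for held in held_modes)
```

With this table, a reader's S request on a U-locked node is granted, while a second updater's U request on the same node has to wait.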

Multi-phase Update Protocol
The update is split into 3 phases:
• Acquire Lock Phase
  • U-locks are acquired top-down, starting from the root node
• Propagate Phase
  • The update is propagated bottom-up from the leaf nodes, and the changes are stored as pendingUpdates
• Refresh Phase
  • U-locks are upgraded to X-locks
  • Stored pendingUpdates are executed top-down
  • X-locks are acquired on the child nodes, then the parent's lock is released (crabbing protocol)
Figure: the three phases on an update tree with nodes N, B and A; δU marks the pending update at each node.
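The three phases can be sketched as a single-process simulation over one leg of an update tree, here for a new max value. This is only an illustration of the ordering of lock and update events: the function name, the `path` and `maxval` arguments and the event log are hypothetical, and real locks, messaging and concurrent transactions are left out.

```python
def multi_phase_update(path, maxval, new_value):
    """Simulate one update on `path` (update tree nodes, root first).

    `maxval` maps node -> stored max and is modified in place.
    Returns the ordered log of lock events.
    """
    mode, pending, log = {}, {}, []

    # Phase 1 (Acquire Lock): U-locks top-down; readers are not blocked by U.
    for node in path:
        mode[node] = "U"
        log.append(("U", node))

    # Phase 2 (Propagate): bottom-up, store the new value as a pending update.
    candidate = new_value
    for node in reversed(path):
        candidate = max(maxval[node], candidate)
        pending[node] = candidate

    # Phase 3 (Refresh): top-down, upgrade to X, apply, crab to the child.
    mode[path[0]] = "X"
    log.append(("X", path[0]))
    for i, node in enumerate(path):
        maxval[node] = pending.pop(node)
        if i + 1 < len(path):
            mode[path[i + 1]] = "X"           # X-lock the child first ...
            log.append(("X", path[i + 1]))
        del mode[node]                         # ... then release the parent
        log.append(("release", node))
    return log

maxval = {"N": 10, "B": 6, "A": 4}
log = multi_phase_update(["N", "B", "A"], maxval, 8)
```

Stored values change only during the refresh phase, while the node being modified is X-locked, which is what lets reads proceed during the first two phases.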

Multi Phase Update Protocol: Correctness and efficiency
Serialization order
• R-U order: order of the read query's S-lock and the update's X-lock at the update tree root node
• U-U order: order of the U-lock point at the highest node of the common twig pattern
Importance of separation of Acquire Lock and Propagation phases
• U-locks acquired bottom-up
• Acquiring U-locks top-down cannot lead to deadlock with read (as in the naïve case)
• But, merging both phases can lead to deadlock between concurrent conflicting updates
Importance of crabbing protocol
• Used for upgrading U-locks to X-locks top-down
• Thus a read query cannot overtake an update
• It sees the state either before or after the refresh on all intersecting nodes => the update is atomic for reads

Multi Phase Update Protocol
Scenario I: Can the propagated leaf node value change before the update gets over?
• Consider that a new max value (m) was propagated up during the propagate phase
• What if this max gets changed between the time it is propagated up and the time it gets executed at the nodes?
Holding a U-lock for the entire duration between the propagate and refresh phases ensures that no other update can change the node's value being propagated up the tree.

Multi Phase Update Protocol
Scenario II: Can the stored pendingUpdate value get stale?
Assume U decreases the max aggregate value at node B to 3.
• What if the max value of node A changes meanwhile?
• This cannot happen because:
  • Any transaction attempting to modify the max value of A would intersect with U on at least node C
  • Thus the two would be executed serially
• Note: caching min/max values on the parent node helps reduce update latency by greatly reducing the number of nodes that must be locked
Figure: example tree with nodes A, B, C and D and their stored max values, before and after U.
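The benefit of caching child max values on the parent can be sketched in a few lines. The function name and the dictionary form of MAXARRAY are assumptions, and the numbers are illustrative: node C caches a max of 4 for A and 6 for B, and U lowers B's max to 3.

```python
def apply_child_max(maxarray, child, new_max):
    """Return the parent's updated MAXARRAY and its new overall max.

    The parent recomputes its own max from cached per-child values,
    so the sibling nodes do not have to be contacted or locked.
    """
    updated = dict(maxarray)
    updated[child] = new_max
    return updated, max(updated.values())

maxarray_c = {"A": 4, "B": 6}
new_array, new_max = apply_child_max(maxarray_c, "B", 3)
```

After the update, C's max falls back to the cached value for A. The cached entry for A stays valid because any transaction changing A's max would also have to pass through C and would therefore be serialized with U.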

Multi Phase Update Protocol
Scenario III: Multiple updates to an entity
• Consider an entity to be transferred by an update U1 from A to B and then by U2 from B to C
• Logically, U1 should be reflected on nodes B and D before U2
Causal order of execution at node B makes sure that U1 completes before U2 begins.
Figure: nodes A, B, C and D with updates U1 and U2.

Divisible Aggregates
Consider an update transaction that changes only the sum and count of the leaf nodes and causes no change in min/max.
Observation:
• The change in the aggregates is known for all nodes in the update tree
• No need to propagate these changes bottom-up
Overview of update protocol:
• Such transactions can have only one update phase
• X-locks are acquired top-down using the crabbing protocol
• Updates are executed and locks released
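Why no bottom-up propagation is needed can be seen from a sketch of the per-node deltas for a move. The function names, the (count, sum) tuple layout and the sample tree are assumptions of the example; locking is omitted.

```python
def move_deltas(src_leg, dst_leg, lca, value):
    """(count, sum) delta for each update tree node when one entity moves.

    Nodes on the source leg lose the entity, nodes on the destination leg
    gain it, and the lowest common ancestor sees no net change.
    """
    deltas = {node: (-1, -value) for node in src_leg}
    deltas.update({node: (1, value) for node in dst_leg})
    deltas[lca] = (0, 0)
    return deltas

def apply_top_down(aggs, order, deltas):
    """Apply the known deltas in top-down order (where X-locks would be crabbed)."""
    for node in order:
        count, total = aggs[node]
        d_count, d_total = deltas[node]
        aggs[node] = (count + d_count, total + d_total)

aggs = {"N": (6, 22), "P": (3, 11), "Q": (3, 11)}
deltas = move_deltas(["P"], ["Q"], "N", 5)
apply_top_down(aggs, ["N", "P", "Q"], deltas)
```

Every delta is computable before any node is touched, so a single top-down pass is enough.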

Comparative Analysis of the Update Methods
Aim: estimate the difference in concurrency provided by the update protocols.
Timeline figure (it also marks the interval during which N is locked for read queries):
• Naïve Update Protocol: Acquire Lock Phase (dm), Update Phase (dm)
• Multi Phase Update Protocol: Acquire Lock Phase (dm), Propagate Phase (dm), Refresh Phase (2d, then (m-2)d)
• Updating only divisible aggregates: Update Phase (2d, then (m-2)d)
where
• d = communication delay per link
• m = number of edges in the longer leg
• N = root node of the update tree

Experimental Setup
• Synthetic data set with non-uniform distribution of data points
• DHT used: FreePastry implementation of the Pastry DHT
• Parameters for the quadtree:
  • fmin = 14
  • Every peer node specifies its threshold (t) as the number of entities it can support
  • Threshold can be as low as 0
• We study the distributed MRA tree's update protocol
  • Running time of read queries with and without updates
  • Running time of updates

For getting MRA index structures of varying depths, use the peer threshold as the lever
Figure: variation of the min/max depth of the partitioning tree. Axes: depth vs. threshold / µ.
The depth of the quadtree increases as the threshold approaches µ (#entities / #peers).

Read query time depends on the number of nodes read, and not on the query region size
Figure: variation of the read query duration as the number of nodes read increases, with no updates. Axes: time (ms) vs. # nodes covered.
Figure: variation of the read query duration as the query region increases, with no updates. Axes: time (ms) vs. query region / total area (in %).

Update time is directly proportional to the number of update tree nodes
Figure: variation of average update time with increase in the number of update tree nodes, for the naive protocol and the multi-phase locking protocol. Axes: time (ms) vs. number of nodes.

Average read time increases much less for the multi-phase update protocol than for the naive protocol
Figure: read query duration for different naïve update workloads. Axes: time (ms) vs. number of nodes; F2 > F1 (frequency of updates).
Figure: read query duration for different MP update workloads. Axes: time (ms) vs. number of nodes; F2 > F1 (frequency of updates).

Conclusion
• We propose a Distributed Multi-Resolution Aggregate Tree index structure for answering aggregate queries over mobile entities
• We point out problems with concurrent updates and propose the multi-phase update protocol, which
  • ensures that updates and aggregate queries are atomic with respect to each other
  • minimizes contention and avoids deadlock
• Analysis and experimental results show that
  • the multi-phase update protocol requires a longer update time
  • but it offers higher concurrency for read queries than the naïve update protocol

Thank you! Questions?
