Distributed Systems - Plan 3 Report 2
Authors: Siddharth Sarasvati, Karthikeyan Karur Balu
Introduction • Traditional distributed system issues • Load Balancing • Data Integrity • Performance • Common approaches for load balancing • Virtual Servers • ID Reassignment • Multiple random choice scheme • Local Probing
Research Paper 1 • Author: Gurmeet Singh Manku • Title: Balanced binary trees for ID management and load balance in distributed hash tables • Conference: Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Distributed Computing (PODC) • Year: 2004 • URL: http://dl.acm.org/citation.cfm?id=1011797
The ID Assignment Problem • How does a new host acquire an ID? • No global knowledge of the current set of IDs • Low cost (number of messages) • Nearly equal-sized partitions • This paper presents a low-cost, decentralized algorithm for ID management in DHTs
Naïve ID Assignment • Choose r, a random number in [0, 1) • σ = partition-balance ratio = ratio of the largest to the smallest partition • σ = Θ(n log n) with n hosts in the system • σ > 100 when n = 4K • Can we do better? Perhaps, if we could "learn" a few IDs
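To make σ concrete, here is a minimal simulation of the naïve scheme. It assumes each host manages the arc from its own ID to the next ID clockwise on the [0, 1) circle; the function name `naive_sigma` and the seed are illustrative choices, not from the paper.

```python
import random

def naive_sigma(n: int, seed: int = 0) -> float:
    """Partition-balance ratio when each of n hosts picks a uniformly random ID in [0, 1).

    Assumption: host i manages the arc from its ID to the next ID clockwise,
    so the n sorted IDs cut the unit circle into n partitions.
    """
    rng = random.Random(seed)
    ids = sorted(rng.random() for _ in range(n))
    arcs = [b - a for a, b in zip(ids, ids[1:])]
    arcs.append(1.0 - ids[-1] + ids[0])  # wrap-around arc back to the first ID
    return max(arcs) / min(arcs)

for n in (256, 1024, 4096):
    print(n, round(naive_sigma(n)))
```

The smallest gap between random points shrinks roughly like 1/n² while the largest grows like (log n)/n, which is why the ratio blows up as n grows.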
The Algorithm • Upon arrival, a host identifies the manager of a random number in [0, 1) • It identifies the IDs of c log n hosts adjacent to the manager along the circle • It splits the largest of these partitions into two
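A small sketch of this arrival rule, under stated assumptions: partitions run from each ID to the next one, the "adjacent" hosts are the manager and its clockwise successors, c = 2, and log n is computed from the true host count (a real host would have to estimate it). The helper names `arcs`, `join` and `sigma` are hypothetical.

```python
import bisect
import math
import random

def arcs(ids: list[float]) -> list[float]:
    """Partition sizes; host i owns [ids[i], ids[i+1]) and the last host wraps to 1.0."""
    return [b - a for a, b in zip(ids, ids[1:])] + [1.0 + ids[0] - ids[-1]]

def join(ids: list[float], rng: random.Random, c: int = 2) -> None:
    """One arrival: split the largest partition among ~c log n hosts near a random point."""
    n = len(ids)
    r = rng.random()
    manager = bisect.bisect_right(ids, r) - 1            # host whose partition contains r
    k = min(n, c * max(1, math.ceil(math.log2(n + 1))))  # about c log n hosts to inspect
    sizes = arcs(ids)
    # The manager and its successors along the circle (indices wrap modulo n).
    nearby = [(manager + j) % n for j in range(k)]
    largest = max(nearby, key=lambda i: sizes[i])
    # New ID = midpoint of that partition; stays below 1.0 because ids[0] == 0.0.
    bisect.insort(ids, ids[largest] + sizes[largest] / 2)

def sigma(ids: list[float]) -> float:
    sizes = arcs(ids)
    return max(sizes) / min(sizes)

rng = random.Random(1)
ids = [0.0]                 # the first host owns the whole circle
while len(ids) < 1024:
    join(ids, rng)
print(len(ids), sigma(ids))
```

Because every split halves a partition, all partition sizes are powers of two, so σ here is always a power of two; compare it with the naïve ratio for the same n.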
Balanced Binary Trees • [Figure: binary tree with 0/1 edge labels; leaves include .0000, .0001, .01100 and .01101] • Only leaf nodes correspond to IDs in [0, 1) • A small fraction of internal nodes are marked active • For every leaf node, exactly one internal node along the path from that leaf to the root is active • Insertion is done in 3 steps
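The leaf labels on this slide can be read as binary fractions: the root-to-leaf path is the ID, and a leaf at depth d owns an interval of length 2^-d. The sketch below (with a hypothetical helper `leaf_interval`) shows that mapping using exact fractions.

```python
from fractions import Fraction

def leaf_interval(label: str) -> tuple[Fraction, Fraction]:
    """Map a leaf label such as '.0110' to the sub-interval of [0, 1) it owns.

    The bits are the root-to-leaf path (0 = left, 1 = right) read as a binary
    fraction; a leaf at depth d owns an interval of length 2**-d.
    """
    bits = label.lstrip(".")
    start = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    return start, start + Fraction(1, 2 ** len(bits))

for leaf in (".0000", ".0001", ".01100", ".01101"):
    lo, hi = leaf_interval(leaf)
    print(leaf, lo, hi)
```

For example, .01100 owns [3/8, 13/32). Since partition size depends only on depth, σ is 2 raised to (deepest leaf depth minus shallowest leaf depth).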
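Claim • Step 1: random walk down the tree • Step 2: walk up until the sub-tree has c log n leaves • Step 3: split the shallowest leaf below that sub-tree • Claim: a newly arrived host needs Θ(R + log n) messages, where R is the routing cost • Leaves lie on at most 3 different levels, so partition sizes differ by a factor of at most 2² and σ ≤ 4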
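The three insertion steps can be sketched on a set of leaf labels. This is a simplified illustration, not Manku's full algorithm: it counts leaves directly instead of using the active-node bookkeeping that bounds the message cost, it uses c = 1, and it assumes the newcomer takes the right half of the split leaf. The names `insert_host` and `sigma` are hypothetical.

```python
import math
import random

def insert_host(leaves: set[str], rng: random.Random, c: float = 1.0) -> str:
    """Simplified 3-step insertion on a prefix-free set of leaf labels ('0'/'1' paths)."""
    # Step 1: random walk down from the root until an existing leaf is reached.
    leaf = ""
    while leaf not in leaves:
        leaf += rng.choice("01")
    # Step 2: walk up until the subtree under prefix p has at least c log n leaves.
    threshold = max(1.0, c * math.log2(len(leaves)))
    p = leaf
    while p and sum(s.startswith(p) for s in leaves) < threshold:
        p = p[:-1]
    # Step 3: split the shallowest leaf under p (ties broken by label for determinism).
    target = min((s for s in leaves if s.startswith(p)), key=lambda s: (len(s), s))
    leaves.remove(target)
    leaves.update({target + "0", target + "1"})
    return target + "1"  # assumed: the newcomer takes the right half

def sigma(leaves: set[str]) -> int:
    depths = [len(s) for s in leaves]
    return 2 ** (max(depths) - min(depths))  # a leaf at depth d owns 2**-d of [0, 1)

rng = random.Random(7)
leaves = {""}               # one host owning all of [0, 1)
while len(leaves) < 512:
    insert_host(leaves, rng)
print(len(leaves), sigma(leaves))
```

Each insertion replaces one leaf with two, so the leaves always remain a complete, prefix-free cover of [0, 1).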
Features of the Algorithm • Generality: independent of the overlay network topology • Low cost: Θ(R + log n) messages • Optimal number of re-assignments • Host departures are handled with only 1 re-assignment • Arrivals require no re-assignments • Small partition-balance ratio: σ ≤ 4 (optimal)
Research Paper 2 • Authors: Brighten Godfrey, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp and Ion Stoica • Title: Load Balancing in Dynamic Structured P2P Systems • Conference: Proceedings of IEEE INFOCOM, Hong Kong, March 2004 • Year: 2004 • URL: http://www.cs.berkeley.edu/~karthik/research/papers/infocom04.pdf
Goal • Goal: keep the system in a state in which the load on each node is less than its target • Load: depends on the particular P2P system, e.g., storage or bandwidth • Target: the maximum load a node can hold
Random ID Space Distribution • Each virtual server is responsible for a contiguous region of the ID space • Each node can be responsible for many virtual servers • Consider a Chord ring [Figure: Chord ring with Node A, Node B and Node C]
Random Mapping of Nodes • Random mapping may result in imbalance, either from the mapping itself or from the addition of new data to the system [Figure: Chord ring with Node A (T=50), Node B (T=35, heavy) and Node C (T=15), annotated with virtual-server and node loads L]
ID Space Redistribution • Select the nodes where L > T (heavy nodes) and check with other nodes to redistribute their load [Figure: the same Chord ring; Node B (T=35) is heavy]
ID Space Redistribution (Result) • The goal is maintained: L ≤ T on every node [Figure: Chord ring after redistribution; Node A (T=50), Node B (T=35) and Node C (T=15), none heavy]
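The redistribution step can be sketched as a greedy transfer of virtual servers. The targets below are the ones on the slides, but the virtual-server loads are hypothetical, and the rule (try the largest virtual server first and send it to the receiver with the most spare capacity) is an illustrative choice, not the paper's exact selection policy.

```python
def rebalance(vservers: dict[str, list[int]], target: dict[str, int]) -> list[tuple[str, int, str]]:
    """Greedy sketch: move virtual servers off nodes with L > T to nodes that stay within T."""
    def load(node: str) -> int:
        return sum(vservers[node])

    moves = []
    for heavy in [n for n in vservers if load(n) > target[n]]:
        for vs in sorted(vservers[heavy], reverse=True):  # try the biggest pieces first
            if load(heavy) <= target[heavy]:
                break                                     # goal reached for this node
            fits = [n for n in vservers if n != heavy and load(n) + vs <= target[n]]
            if fits:
                dest = max(fits, key=lambda n: target[n] - load(n))  # most spare capacity
                vservers[heavy].remove(vs)
                vservers[dest].append(vs)
                moves.append((heavy, vs, dest))
    return moves

# Targets from the slides; the virtual-server loads are hypothetical.
nodes = {"A": [15, 30], "B": [20, 10, 11], "C": [3]}
targets = {"A": 50, "B": 35, "C": 15}
print(rebalance(nodes, targets))              # [('B', 11, 'C')]
print({n: sum(v) for n, v in nodes.items()})  # {'A': 45, 'B': 30, 'C': 14}
```

Node B cannot hand off its 20-unit virtual server (neither A nor C has room), so it sheds the 11-unit one to C, after which every node satisfies L ≤ T.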
Load Balancing Scheme 1: One-to-One • A light node picks a random ID, contacts the node x responsible for it, and accepts load if x is heavy • It takes ~O(N²) operations [Figure: ring of heavy (H) and light (L) nodes]
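A minimal sketch of the one-to-one probing loop, assuming Chord-style responsibility (a key belongs to its successor node), virtual servers as the unit of transfer, and rounds in which every light node probes once. The `Node` class, `responsible` and `one_to_one` are hypothetical names; the probe counter just makes the cost of blind random probing visible.

```python
import bisect
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: float                      # position on the unit ring
    target: int                         # T: maximum load the node can hold
    vservers: list[int] = field(default_factory=list)

    @property
    def load(self) -> int:
        return sum(self.vservers)

    @property
    def heavy(self) -> bool:
        return self.load > self.target

def responsible(ring: list[Node], key: float) -> Node:
    """Chord-style successor: first node whose ID is >= key, wrapping around the ring."""
    ids = [n.node_id for n in ring]     # ring is kept sorted by node_id
    return ring[bisect.bisect_left(ids, key) % len(ring)]

def one_to_one(ring: list[Node], rng: random.Random, max_rounds: int = 1000) -> int:
    """Each round every light node probes one random ID; returns the number of probes."""
    probes = 0
    for _ in range(max_rounds):
        if not any(n.heavy for n in ring):
            break
        for light in [n for n in ring if not n.heavy]:
            probes += 1
            x = responsible(ring, rng.random())
            if not x.heavy:
                continue
            # Accept the largest virtual server of x that keeps this node light.
            fits = [v for v in x.vservers if light.load + v <= light.target]
            if fits:
                v = max(fits)
                x.vservers.remove(v)
                light.vservers.append(v)
    return probes

ring = [Node(0.10, 20), Node(0.35, 20), Node(0.60, 12, [10, 10]), Node(0.85, 20)]
probes = one_to_one(ring, random.Random(0))
print(probes, [n.load for n in ring])
```

Light nodes have no idea where the heavy nodes are, so most probes are wasted; that blindness is what the directory-based scheme addresses.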
Load Balancing Scheme 2: One-to-Many • Light nodes report their load information to directories • The directories are themselves stored in the DHT • A heavy node H gets this information by contacting a directory • H contacts a light node that can accept its excess load [Figure: light nodes L1 to L5, heavy nodes H1 to H3, directories D1 and D2]
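The directory scheme can be sketched as two phases: light nodes report to a directory, then each heavy node consults one directory and hands virtual servers to light nodes listed there. Picking a directory with `rng.randrange` is a stand-in for hashing to a random DHT key, and the best-fit choice of receiver is an assumption made for illustration; `one_to_many` is a hypothetical name.

```python
import random
from collections import defaultdict

def one_to_many(vservers: dict[str, list[int]], targets: dict[str, int],
                num_dirs: int, rng: random.Random) -> list[tuple[str, int, str]]:
    """Sketch of directory-based balancing; returns (heavy, vserver load, light) moves."""
    def load(node: str) -> int:
        return sum(vservers[node])

    # Phase 1: light nodes report to a randomly chosen directory (stand-in for a DHT key).
    directories: dict[int, list[str]] = defaultdict(list)
    for node in vservers:
        if load(node) <= targets[node]:
            directories[rng.randrange(num_dirs)].append(node)

    # Phase 2: each heavy node contacts one directory and sheds load to nodes listed there.
    moves = []
    for heavy in [n for n in vservers if load(n) > targets[n]]:
        light_nodes = directories[rng.randrange(num_dirs)]
        for vs in sorted(vservers[heavy], reverse=True):
            if load(heavy) <= targets[heavy]:
                break
            # Best fit: the light node left with the least spare capacity after accepting vs.
            fits = [n for n in light_nodes if load(n) + vs <= targets[n]]
            if fits:
                dest = min(fits, key=lambda n: targets[n] - load(n) - vs)
                vservers[heavy].remove(vs)
                vservers[dest].append(vs)
                moves.append((heavy, vs, dest))
    return moves

vservers = {"H1": [8, 6, 4], "H2": [9, 9], "L1": [2], "L2": [5], "L3": []}
targets = {"H1": 10, "H2": 12, "L1": 10, "L2": 10, "L3": 10}
print(one_to_many(vservers, targets, num_dirs=2, rng=random.Random(3)))
```

A heavy node that happens to draw a directory with no suitable light node simply stays heavy in this sketch; a fuller simulation would retry on a later round.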
Research Paper 3 • Authors: Minseok Kwon, Gahyun Park • Title: Distributed Tries for Load Balancing in Peer-to-Peer Systems • Conference: Proceedings of IEEE IWQoS, June 2010 • Year: 2010 • URL: http://www.cs.rit.edu/~jmk/papers/trieload.pdf
Algorithm • Goal: if the trie is balanced, the ID space will be balanced
Basic Idea (New Node Join) • Optimal path discovery: a new node travels down the trie from the root, taking the path toward the minimum depth • Drawback: requires global knowledge of the ID space [Figure: a new node descending the trie]
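Optimal path discovery can be sketched on a set of leaf labels: at each internal node, go toward the child whose subtree contains the shallowest leaf, then split that leaf. The helper `min_depth` scans the whole trie, which is exactly the global knowledge listed as the drawback. The names are hypothetical and ties go to the left child by assumption.

```python
def min_depth(leaves: set[str], prefix: str) -> int:
    """Depth of the shallowest leaf under prefix (needs the whole trie: global knowledge)."""
    return min(len(s) for s in leaves if s.startswith(prefix))

def optimal_join(leaves: set[str]) -> str:
    """Walk down toward the minimum-depth leaf and split it for the new node."""
    node = ""
    while node not in leaves:
        node = min(node + "0", node + "1", key=lambda child: min_depth(leaves, child))
    leaves.remove(node)
    leaves.update({node + "0", node + "1"})
    return node + "1"

leaves = {""}
for _ in range(7):
    optimal_join(leaves)
print(sorted(leaves))  # ['000', '001', '010', '011', '100', '101', '110', '111']
```

Because a shallowest leaf is always split, leaf depths never differ by more than one, and with 8 nodes the trie is perfectly balanced.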
Node Join/Leave Process • A new node y joins with a random ID r and locates the host that owns the interval containing r • Starting from r, it travels up the trie, where |id(r)| denotes the number of bits of id(r)
Hypothesis • A distributed trie for load balancing in a structured P2P system allows a node to join or leave the system at low cost, R + Θ(log log n), where R denotes the routing cost and n denotes the number of nodes
Algorithm (Node Join Process) • |id(r)| = number of bits of id(r) • While i < log|id(r)| + 4
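The loop body is not shown on the slide, so the following is only a sketch under explicit assumptions: log is base 2, i starts at 0 and each iteration climbs one level from the owner's leaf, and after the loop the shallowest leaf in the reached subtrie is split. Since |id(r)| is about log n, climbing log|id(r)| + 4 levels is what keeps the extra cost near Θ(log log n). `trie_join` is a hypothetical name.

```python
import math
import random

def trie_join(leaves: set[str], rng: random.Random) -> str:
    """Sketch of a join that climbs only about log|id(r)| + 4 levels of the trie."""
    # Locate the host whose interval contains a random ID r: walk down to its leaf.
    owner = ""
    while owner not in leaves:
        owner += rng.choice("01")
    bits = len(owner)                                      # |id(r)|
    climb = math.ceil(math.log2(bits)) + 4 if bits else 0  # assumed: i = 0 .. < log|id(r)| + 4
    top = owner[: max(0, bits - climb)]                    # root of the local subtrie
    # Assumed: split the shallowest leaf inside that subtrie (ties broken by label).
    target = min((s for s in leaves if s.startswith(top)), key=lambda s: (len(s), s))
    leaves.remove(target)
    leaves.update({target + "0", target + "1"})
    return target + "1"

rng = random.Random(3)
leaves = {""}
while len(leaves) < 256:
    trie_join(leaves, rng)
depths = [len(s) for s in leaves]
print(len(leaves), min(depths), max(depths))
```

Unlike the global optimal-path walk, this join only inspects a subtrie a few levels above the owner, trading a little balance for locality.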
Deliverables • Simulate a DHT-based P2P distributed system and implement the Balanced Binary Tree and Distributed Trie load-balancing algorithms • Graphs comparing node arrival and departure costs (routing cost, ID reassignments)
Progress • Developed a thorough understanding of the research papers • Discussed a generic simulation design in which different load-balancing algorithms fit in as pluggable modules • Analyzed the load-balancing algorithms discussed above
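One possible shape for the pluggable design is sketched below: a structural interface that every algorithm implements, plus a driver that collects the same metrics for each. The interface `LoadBalancer`, the baseline `NaiveRandom` and the metric names are hypothetical; the Balanced Binary Tree and Distributed Trie modules would plug in the same way.

```python
import bisect
import random
from typing import Protocol

class LoadBalancer(Protocol):
    """Hypothetical plug-in interface; each algorithm under study implements it."""
    def join(self, rng: random.Random) -> int: ...   # returns ID reassignments caused
    def leave(self, rng: random.Random) -> int: ...
    def partition_sizes(self) -> list[float]: ...

class NaiveRandom:
    """Baseline plug-in: every host picks a uniformly random ID on the ring."""
    def __init__(self) -> None:
        self.ids: list[float] = []

    def join(self, rng: random.Random) -> int:
        bisect.insort(self.ids, rng.random())
        return 0  # no existing host changes its ID

    def leave(self, rng: random.Random) -> int:
        self.ids.pop(rng.randrange(len(self.ids)))
        return 0  # the neighbouring host silently absorbs the partition

    def partition_sizes(self) -> list[float]:
        ids = self.ids
        return [b - a for a, b in zip(ids, ids[1:])] + [1.0 + ids[0] - ids[-1]]

def simulate(algo: LoadBalancer, arrivals: int, departures: int, seed: int = 0) -> dict[str, float]:
    """Drive any plug-in through arrivals, then departures, and collect comparable metrics."""
    rng = random.Random(seed)
    join_cost = sum(algo.join(rng) for _ in range(arrivals))
    leave_cost = sum(algo.leave(rng) for _ in range(departures))
    sizes = algo.partition_sizes()
    return {"join_reassignments": join_cost, "leave_reassignments": leave_cost,
            "sigma": max(sizes) / min(sizes)}

print(simulate(NaiveRandom(), arrivals=1000, departures=200))
```

Using a `Protocol` keeps the driver independent of any concrete algorithm, so a routing-cost counter could be added to the interface later without touching existing plug-ins' callers.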