Online Friends
Presented by Dipannita Dey and Andy Vuong
Scribed by Ratish Garg
Social Hash: an Assignment Framework for Optimizing Distributed Systems on Social Networks
Presented by Dipannita Dey (ddey2)
Background: Consistent Hashing
[Figure: hash ring starting at 0 with nodes N1, N2, N3 and keys K1, K2, K3, K4, K6; a client sends a read/write for K1 to a coordinator]
• N: nodes/servers storing data
• K: objects/requests
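
The ring in the figure can be sketched roughly as follows. This is a minimal illustration, not the presenters' code: the MD5-based ring position, the `HashRing` class, and the extra node `N4` are assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a name to a point on the ring (first 8 bytes of MD5; an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring: each key belongs to the next node clockwise."""

    def __init__(self, nodes):
        self._points = sorted((ring_position(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self._points, (ring_position(node), node))

    def node_for(self, key: str) -> str:
        positions = [p for p, _ in self._points]
        # First node at or after the key's position, wrapping around past the end.
        i = bisect.bisect_right(positions, ring_position(key)) % len(self._points)
        return self._points[i][1]


ring = HashRing(["N1", "N2", "N3"])
keys = ["K1", "K2", "K3", "K4", "K6"]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("N4")  # hypothetical new server
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print("keys that moved:", moved)
```

The point of the scheme is that adding a node only moves the keys that now fall to that node; every other key keeps its previous owner.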
Assignment Problem
[Figure taken from original slide]
Assignment Problem Optimization
• Place the data records likely to be accessed by a single query in a single storage component
• Group similar user requests
(Adapted from original slide)
Requirements
• Map similar objects to one cluster
• Assignment stability
• Adaptive
• Minimal response time
• Load balancing
Challenges
• Effects of similarity on load balance
• Addition and removal of objects
• Scale
• Dynamic workload
• Heterogeneous components in infrastructure
Slide annotations: Change at modest rate; Enormous; Predictable
Social Hash Framework (Conceptual Model)
[Figure adapted from paper: groups (G) are conceptual entities mapped to components (C); N := |G|/|C|, shown for N = 1 and N > 1]
Key Contributions
• Forming groups of relatively cohesive objects in the social graph
• Separation of optimization on the social network from adaptation to changes in workload and infrastructure components
• Use of graph partitioning for static assignment
• Use of query history to construct a bipartite graph (upon which graph partitioning is applied)
• Reduced cache miss rate by 25%
• Cut the average response latency in half
Social Hash Framework (Actual Model)
[Figure taken from paper]
Dynamic Assignment
• Adapt to maintain workload balance by changing the group-to-component mapping
• Group-to-component ratio (N) controls the trade-off between static and dynamic assignment
  • N >> 1 -> dynamic assignment
• Factors affecting load-balancing strategies
  • Accuracy in predicting future load
  • Dimensionality of load
  • Group transfer overhead
  • Assignment memory
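
The two-level scheme can be sketched as two lookup tables: a static object-to-group table and a dynamic group-to-component table that is recomputed as load changes. The rebalancing rule below is a simple greedy heuristic (heaviest group first, onto the least-loaded component) chosen only for illustration; it is not the strategy described in the paper, and the load numbers and all names are made up.

```python
import heapq


def rebalance(group_load, num_components):
    """Greedy sketch: place the heaviest groups first on the least-loaded component."""
    heap = [(0, c) for c in range(num_components)]
    heapq.heapify(heap)
    group_to_component = {}
    for group, load in sorted(group_load.items(), key=lambda kv: -kv[1]):
        total, component = heapq.heappop(heap)
        group_to_component[group] = component
        heapq.heappush(heap, (total + load, component))
    return group_to_component


def lookup(obj, object_to_group, group_to_component):
    """Two-level lookup: object -> group (static), then group -> component (dynamic)."""
    return group_to_component[object_to_group[obj]]


# Hypothetical static assignment, e.g. the output of a graph partitioner.
object_to_group = {"alice": 0, "bob": 0, "carol": 1, "dave": 2}

# Hypothetical predicted load per group; 8 groups on 2 components gives N = 4.
group_load = {0: 5, 1: 3, 2: 2, 3: 2, 4: 1, 5: 1, 6: 1, 7: 1}
group_to_component = rebalance(group_load, num_components=2)

component_load = {}
for group, component in group_to_component.items():
    component_load[component] = component_load.get(component, 0) + group_load[group]
print(component_load)
```

With N = 1 there is nothing to move between components, so balance depends entirely on the static step; a larger N gives the dynamic step more, smaller pieces to shuffle.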
HTTP Request Routing Optimization
• Static assignment
  • Unipartite graph representing friendship
  • Maximize edge locality
• Dynamic assignment
  • Existing consistent hashing
• Trade-off between edge locality and number of groups
• Production results
  • 21k groups
  • Maintained edge locality over 50%
  • Updated every week (1% degradation)
[Plot from paper]
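
Edge locality, as used on this slide, can be read as the fraction of friendship edges whose two endpoints land in the same group. A small sketch under that assumption, with a made-up four-user graph:

```python
def edge_locality(edges, group_of):
    """Fraction of edges whose endpoints are assigned to the same group."""
    local = sum(1 for u, v in edges if group_of[u] == group_of[v])
    return local / len(edges)


# Hypothetical friendship graph and group assignment.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
group_of = {"a": 0, "b": 0, "c": 1, "d": 1}

print(edge_locality(edges, group_of))  # 0.5: a-b and c-d are local, b-c and a-c are not
```

Putting every user in a single group would give locality 1.0 but leave nothing to balance, which is the trade-off against the number of groups noted above.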
Experimental Observation
[Plots from paper]
Storage Sharding Optimization
• Static assignment
  • Bipartite graph representing recent queries
  • Minimize fanout
  • Group-to-component ratio = 8
• Dynamic assignment
  • Based on historical load patterns
• Production results
  • 2% increase in fanout
  • Static assignments every few months
  • Average latency decreased by 50%
  • CPU utilization decreased by 50%
[Plot from paper]
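
Fanout here is the number of distinct storage components a single query has to contact. The sketch below computes average fanout over a toy bipartite graph (queries on one side, data records on the other); the queries, records, and both placements are invented for the example.

```python
def average_fanout(queries, component_of):
    """Mean number of distinct components touched per query."""
    fanouts = [len({component_of[r] for r in records}) for records in queries.values()]
    return sum(fanouts) / len(fanouts)


# Hypothetical bipartite graph: query -> data records it reads.
queries = {"q1": ["r1", "r2", "r3"], "q2": ["r3", "r4"]}

scattered = {"r1": 0, "r2": 1, "r3": 2, "r4": 3}  # every record on its own component
grouped = {"r1": 0, "r2": 0, "r3": 0, "r4": 1}    # co-accessed records kept together

print(average_fanout(queries, scattered))  # 2.5
print(average_fanout(queries, grouped))    # 1.5
```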
Experimental Observation
[Plots from paper]
Doubts / Questions
1. Fanout vs. parallelism?
   - Utilize machine parallelism with low fanout
2. Why is a custom graph used?
   - A graph of [queries -> data records] best represents the distributed problem
3. How frequently do dynamic assignments change, and how does that impact performance?
   - Application-specific (depends on the group-to-component ratio)
4. How do dynamic assignments affect the TAO cache miss rate?
   - The cache warms up faster because of similar requests
Thoughts
1. Application-specific
2. Assumption: workload and infrastructure change at a modest rate
3. Use historical data patterns in the dynamic assignment algorithm
4. No details on the customization built on top of Apache Giraph
5. Not sure about dynamic assignments for replicas
6. Some experimental results on network bandwidth vs. fanout would be nice
Takeaway
• Patterns in the assignment problem
• Proposed the Social Hash framework for solving the assignment problem on social networks
• Two-level scheme to decouple optimization from workload and infrastructure changes
• Lower fanout may provide better performance
TAO: Facebook’s Distributed Data Store for the Social Graph
Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, Venkat Venkataramani
Facebook, Inc. (2013 USENIX)
Presented by Andy Vuong
TAO: The Associations and Objects
• Geographically distributed graph system used in production at Facebook
• Paper published in 2013
• Three contributions
  • Efficient and available read-mostly access to a changing graph
  • Objects and associations model
  • TAO and its evaluation
The Social Graph
Image: http://www.freshminds.net/wp-content/uploads/2012/02/Picture1.png
The Social Graph: Old Stack
[Figure: stack of three components shown as logos; logos not recoverable]
Problems with Memcached?
• Key-value store
• Distributed control logic
• Expensive read-after-write consistency
The Social Graph: New Stack
[Figure: TAO plus one other component shown as a logo; logo not recoverable]
TAO arrives
• Read-efficient distributed graph caching system to help serve the social graph
• Built on top of an “associations and objects” model
Facebook focuses on people, actions, and relationships
• Objects are typed nodes:
  • (id) -> (otype, (key -> value)*)
• Associations are typed directed edges:
  • (id1, atype, id2) -> (time, (key -> value)*)
• Associations may be coupled with an inverse edge
• Edges may or may not be symmetric
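
The two mappings above translate fairly directly into record types. The sketch below is only an illustration of the model: the class names, the `INVERSE` table, and the association types in it (`likes`, `liked_by`, `friend`, `viewed`) are hypothetical, not taken from TAO.

```python
from dataclasses import dataclass, field


@dataclass
class TaoObject:
    """(id) -> (otype, (key -> value)*)"""
    id: int
    otype: str
    data: dict = field(default_factory=dict)


@dataclass
class Association:
    """(id1, atype, id2) -> (time, (key -> value)*)"""
    id1: int
    atype: str
    id2: int
    time: int
    data: dict = field(default_factory=dict)


# Hypothetical inverse-type table. A symmetric type maps to itself;
# a type with no entry has no inverse edge.
INVERSE = {"likes": "liked_by", "friend": "friend"}


def with_inverse(assoc):
    """Return the edge plus its inverse edge, if its type defines one."""
    edges = [assoc]
    inverse_type = INVERSE.get(assoc.atype)
    if inverse_type is not None:
        edges.append(Association(assoc.id2, inverse_type, assoc.id1, assoc.time, dict(assoc.data)))
    return edges


alice = TaoObject(id=1, otype="user", data={"name": "Alice"})
post = TaoObject(id=2, otype="post", data={"text": "hello"})
edges = with_inverse(Association(alice.id, "likes", post.id, time=100))
print(edges)
```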
Actions are either objects or associations
• Repeatable actions are best suited as objects
• Associations model actions that happen at most once or actions that record state transitions
TAO Data API
• Provides a simple object API for creating, retrieving, updating, and deleting an object
• Provides a simple association API for adding, deleting, and editing an association
• Association query API:
  • assoc_get
  • assoc_count
  • assoc_range
  • assoc_time_range
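
To make the four query calls concrete, here is a toy in-memory version. The slide gives only the call names; the parameter lists, the newest-first ordering, and the `AssocStore` class with its `assoc_add` helper are assumptions made for this sketch and should not be read as TAO's actual interface.

```python
from collections import defaultdict


class AssocStore:
    """Toy association store keyed by (id1, atype); results are ordered newest first."""

    def __init__(self):
        self._lists = defaultdict(dict)  # (id1, atype) -> {id2: time}

    def assoc_add(self, id1, atype, id2, time):
        self._lists[(id1, atype)][id2] = time

    def _newest_first(self, id1, atype):
        items = self._lists.get((id1, atype), {})
        return sorted(items.items(), key=lambda kv: kv[1], reverse=True)

    def assoc_get(self, id1, atype, id2set):
        """Edges from id1 to any id2 in id2set."""
        return [(id2, t) for id2, t in self._newest_first(id1, atype) if id2 in id2set]

    def assoc_count(self, id1, atype):
        """Number of edges of this type leaving id1."""
        return len(self._lists.get((id1, atype), {}))

    def assoc_range(self, id1, atype, pos, limit):
        """Up to `limit` edges starting at offset `pos`."""
        return self._newest_first(id1, atype)[pos:pos + limit]

    def assoc_time_range(self, id1, atype, high, low, limit):
        """Up to `limit` edges with low <= time <= high."""
        hits = [(id2, t) for id2, t in self._newest_first(id1, atype) if low <= t <= high]
        return hits[:limit]


store = AssocStore()
for id2, t in [(10, 100), (11, 200), (12, 300)]:
    store.assoc_add(1, "comment", id2, t)

print(store.assoc_range(1, "comment", 0, 2))             # [(12, 300), (11, 200)]
print(store.assoc_time_range(1, "comment", 250, 50, 10)) # [(11, 200), (10, 100)]
```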
Architecture: Storage Layer
• MySQL as the persistent store
• Divide data into logical shards
• Each database server is responsible for one or more shards
• All the objects in the database have a shard_id that identifies the hosting shard
• An association is stored on the shard of its source
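
One way to picture the shard rules above is to carry the shard_id inside the object id. The bit layout (`SHARD_BITS`), the modulo shard-to-server map, and the server names below are all assumptions for illustration; the slide does not say how shard_id is encoded or how shards are placed on servers.

```python
SHARD_BITS = 16  # hypothetical: low 16 bits of an object id hold its shard_id
SHARD_MASK = (1 << SHARD_BITS) - 1


def make_id(shard_id: int, local_seq: int) -> int:
    """Build an object id that carries its shard_id (illustrative layout)."""
    return (local_seq << SHARD_BITS) | shard_id


def shard_of_object(object_id: int) -> int:
    return object_id & SHARD_MASK


def shard_of_assoc(id1: int, atype: str, id2: int) -> int:
    """An association lives on the shard of its source object, id1."""
    return shard_of_object(id1)


def server_for(shard_id: int, servers: list) -> str:
    """Hypothetical shard -> server map; each server ends up hosting many shards."""
    return servers[shard_id % len(servers)]


servers = ["db-a", "db-b", "db-c"]  # made-up server names
user = make_id(shard_id=7, local_seq=42)
page = make_id(shard_id=9, local_seq=5)

print(shard_of_object(user), shard_of_assoc(user, "likes", page))
print(server_for(shard_of_object(user), servers))
```

Storing an edge with its source means that listing everything leaving id1 touches a single shard, while the inverse edge (if any) lives on the shard of id2.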
Architecture: Caching Layer
• Multiple cache servers form a tier
• TAO further divides the cache layer into two levels:
  • Leaders
  • Followers
• Leaders are cache coordinators responsible for direct communication with persistent storage (read misses, writes)
• Clients communicate with the closest follower tier
Architecture: Cache Per-Region Tier Setup
[Figure: clients talk to follower tiers; followers talk to leaders (cache layer); leaders talk to MySQL (storage layer)]
Architecture: Master / Slave Replication
[Figure labels]
• Write sent to master leader
• Consistency messages delivered in B
• Master region sends read misses and writes to master DB
• Read misses to replica DB
Consistency
• TAO provides eventual consistency
• Asynchronously sends cache maintenance messages from the leader to the followers
• Changeset / inverse edges
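
The leader/follower flow and its eventual consistency can be sketched in a few lines. This is a single-process toy with invented class names: a plain dict stands in for MySQL, and the `deliver` call stands in for the asynchronous delivery of cache maintenance messages. It is not a description of TAO's implementation.

```python
class Leader:
    """Toy leader cache: the only tier that talks to the database."""

    def __init__(self, db):
        self.db = db            # a dict standing in for the persistent store
        self.cache = {}
        self.followers = []
        self.pending = []       # queued (follower, key) invalidations

    def read(self, key):
        if key not in self.cache:          # read miss goes to the database
            self.cache[key] = self.db.get(key)
        return self.cache[key]

    def write(self, key, value, origin):
        self.db[key] = value
        self.cache[key] = value
        # Queue invalidations for every follower except the one that wrote.
        self.pending.extend((f, key) for f in self.followers if f is not origin)

    def deliver(self):
        """Simulates the asynchronous delivery of maintenance messages."""
        for follower, key in self.pending:
            follower.cache.pop(key, None)
        self.pending.clear()


class Follower:
    """Toy follower cache: serves clients, forwards misses and writes to the leader."""

    def __init__(self, leader):
        self.leader = leader
        self.cache = {}
        leader.followers.append(self)

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.leader.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.leader.write(key, value, origin=self)
        self.cache[key] = value            # the writing follower sees its own write


leader = Leader(db={"x": 1})
f1, f2 = Follower(leader), Follower(leader)

f2.read("x")                  # f2 caches the old value
f1.write("x", 2)
stale = f2.read("x")          # still the old value until the message arrives
leader.deliver()
fresh = f2.read("x")          # refetched from the leader
print(stale, fresh)
```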
Fault Tolerance
• Per-destination timeout
• Database failures
• Leader failures
• Follower failures
Optimizations
• Shard loading
• High-degree objects
Evaluation
[Figures from paper]
• Distribution of the return values from assoc_count
• Distribution of the number of associations from range queries
Annotation: 1% > 500k
[Figures from paper]
• Distribution of the data sizes for TAO query results
• Throughput of an individual follower
Annotations: 39.5% of associations queried by clients contained no data; peak query rate rises with hit rate
[Figures from paper]
• Client-observed TAO latency for read requests
• Write latency across two data centers
Annotation: 58.1 msec on away (avg RTT)
Related Work
• Shares features with:
  • Trinity, an in-memory graph datastore
  • Neo4j, an open-source graph database with ACID semantics
  • Twitter’s FlockDB for its social graph
• Akamai groups clusters into regional groups, similar to FB’s follower and leader tiers
• “Scaling Memcache at Facebook” paper
Questions?
Discussion - TAO
• Could backing stores other than MySQL be more efficient for read-heavy workloads?
• Why do they use MySQL as their underlying DB instead of a graph DB like Neo4j?
• Is this design applicable to other data-serving services? Are there other large graph-based or non-graph-based datasets that it could be extended to?
• The data is naturally in graph form, so why not a graph DB?
Discussion - TAO
• Why have a leader cache?
• It would be interesting to see how these systems behave in “viral” social media scenarios. By definition, these are global in nature and spread quickly. Do such interactions perform poorly?
• Follow-up: how large is the leader cache? Is it a huge overhead, since it is providing data to the follower caches?
Discussion - TAO
• How is the leader’s consistency restored after a failure?
Discussion - Social Hash
• What kinds of applications are suited to smaller or larger values of N (the group-to-component ratio)?
• How should one trade off static against dynamic assignment, or find an optimal group-to-component ratio? I could not find a clear description of this.