This presentation delves into how MySpace, the largest social network based in Los Angeles, scales its .NET platform using the Concurrency and Coordination Runtime (CCR) from Microsoft Robotics Studio. Handling over 6 million requests per second to its middle tier at peak and many terabytes of user-generated cached data, MySpace employs innovative caching strategies and a robust Data Relay architecture. This approach delivers real-time updates to users through an activity stream while keeping the experience responsive. Explore how CCR coordinates work among numerous processing "robots," managing high-velocity data with minimal latency.
Robots at MySpace: Scaling a .NET Website with Microsoft Robotics Studio • Erik Nelson, Group Architect / enelson@myspace-inc.com • Akash Patel, Senior Architect / apatel@myspace-inc.com • Tony Chow, Development Manager / tchow@myspace-inc.com • Core Platform, MySpace.com
MySpace is the largest Social Network • … based in Los Angeles • The largest .NET website in the world • Can’t be big without Caching • >6 million requests/second to the middle tier at peak • Many TB of user generated cached data • Data must not be stale • Users hate that • New and interesting features require more than just a “cache” • Our middle tier is called Data Relay, because it does much more than just cache. • Data Relay has been in production since 2006!
CCR • What is CCR? • Concurrency and Coordination Runtime • Part of the Robotics toolkit • Provides • Thread pools (Dispatcher) • Job queues (DispatcherQueue) • Flexible ways of connecting actions to those queues and pools (Ports and Arbiters)
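To make those primitives concrete, here is a minimal hello-world sketch, assuming the Microsoft.Ccr.Core assembly that ships with the toolkit; the pool size and names are ours, not MySpace's:

    using System;
    using Microsoft.Ccr.Core;

    class CcrHello
    {
        static void Main()
        {
            // Dispatcher: a fixed-size thread pool (4 threads here).
            using (var dispatcher = new Dispatcher(4, "WorkerPool"))
            // DispatcherQueue: a job queue scheduled onto that pool.
            using (var queue = new DispatcherQueue("Jobs", dispatcher))
            {
                var port = new Port<string>();   // a typed message channel

                // Arbiter.Receive binds the port to a handler; Activate
                // schedules it on the queue. persist:true keeps it attached.
                Arbiter.Activate(queue,
                    Arbiter.Receive(true, port, msg => Console.WriteLine(msg)));

                port.Post("hello");              // posting enqueues work
                Console.ReadLine();              // keep the process alive while the handler runs
            }
        }
    }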
Graphs are Cool [Chart: middle-tier requests per second]
The Stream • The stream is everywhere • The stream is extremely volatile data • Both the “who” and the “what” • Updates from our Users don’t just go to us • Twitter, Google, etc • ~60,000 Stream Queries per Second • Over 5 billion a day • 35 million updates a day • ~5 TB of Data in our Stream
Why not a DB? • We decided to be "publisher" based, not "subscriber" based • For us, a DB-backed stream would mean a massively distributed query • Across hundreds of databases • And we wanted to decouple writing from reading
OK So How Then? Robots!
Robots? • Lots of inputs and outputs! • Need for minimum latency and decoupling between jobs! • Just like a robot!
Abusing a Metaphor • Our robots must • Incorporate incoming messages • Tell their neighbors about any messages they receive • Be able to answer lots of questions • Talk to other robots when they need more info • Deal with other robots being slow or missing
How Does CCR Help? • Division of labor • Incorporate incoming messages • Tell their neighbors about any messages they receive • Be able to answer lots of questions • Talk to other robots when they need more info • Deal with other robots being slow or missing
How Does CCR Help? • Queue Control • We can has Buckets • Queue Division • Different destinations have their own queues • Strict Pool Control
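A sketch of what that queue division might look like; RelayMessage is a stand-in type and the sizes are illustrative, but the shape matches the bullets above: one strictly sized pool, one queue per destination:

    using Microsoft.Ccr.Core;

    class RelayMessage { public byte[] Body; }      // stand-in for Data Relay's message type

    class QueueDivision
    {
        static void Main()
        {
            var pool = new Dispatcher(8, "RelayPool");  // one pool, strictly sized

            // One queue per destination: a slow node backs up only its own
            // queue instead of starving traffic bound for healthy nodes.
            var node1Queue = new DispatcherQueue("Node1", pool);
            var node2Queue = new DispatcherQueue("Node2", pool);

            var node1Port = new Port<RelayMessage>();
            var node2Port = new Port<RelayMessage>();

            Arbiter.Activate(node1Queue, Arbiter.Receive(true, node1Port,
                m => { /* write m to node 1's socket */ }));
            Arbiter.Activate(node2Queue, Arbiter.Receive(true, node2Port,
                m => { /* write m to node 2's socket */ }));
        }
    }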
Akash Patel Senior Architect
Activity Stream • Activity Stream (News Feed) • An aggregation of your friends' activities • Activity Stream Generation • Explicitly: status update • Implicitly: post a new photo album • Auto: 3rd party app
Friends & Activities Your Friends … You post a new status update .. an index is created You upload a new photo album .. Index Updated Index grows with new activities • Publisher Based Cache • - Activity Associated to Publishing User Where’s the Activity Stream? Imagine this is You …
Friends & Activities • Activity Stream Generated by Querying • Filter & Merge Friend’s Activities Very Volatile
Stream Architecture • Utilizes the Data Relay framework • Message-based system • Fire & Forget messages [Save, Delete, Update] • Round-Trip messages [Get, Query, Execute] • Replication & clustering built in • Index Cache • Not a key/value store • A storage & querying system • 2-tiered system (separates index from data)
Data Relay Architecture • Group → Cluster → Node: a group contains clusters, a cluster contains nodes • Data is partitioned across clusters • Data is replicated within clusters [Diagram: Groups A and B; Group A holds Clusters 1–3 with nodes N1–N9]
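A hypothetical routing sketch for that layout, with stand-in types rather than Data Relay's real ones: the object id picks one cluster (partitioning), and the write fans out to every node in that cluster (replication):

    using System.Collections.Generic;

    class Node { public void Post(object msg) { /* send over the wire */ } }
    class Cluster { public List<Node> Nodes = new List<Node>(); }

    class Router
    {
        public List<Cluster> Clusters = new List<Cluster>();

        public void Save(int objectId, object saveMessage)
        {
            // Partition: a non-negative id picks exactly one cluster in the group.
            Cluster owner = Clusters[objectId % Clusters.Count];

            // Replicate: every node in that cluster gets the write.
            foreach (Node node in owner.Nodes)
                node.Post(saveMessage);
        }
    }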
Stream Architecture [Diagram: Clusters 1–3 with nodes N1–N9; the activities and the index are spread across the clusters]
Activity Stream Update [Diagram: a New Activity message is routed to its owning cluster (Clusters 1–3, nodes N1–N9)]
CCR Perspective [Diagram: a New Activity message enters the destination node's proxy; fire & forget and round-trip messages flow through their own Ports and Arbiters into Dispatcher Queues backed by thread pools]
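A sketch of what such a node proxy might look like in CCR terms; the type names and the stubbed send logic are ours:

    using Microsoft.Ccr.Core;

    class RelayMessage { public byte[] Body; }      // stand-in for the real message type

    class RoundTrip                                  // a request paired with a reply channel
    {
        public RelayMessage Request;
        public Port<RelayMessage> ResponsePort;
    }

    class NodeProxy
    {
        public readonly Port<RelayMessage> OneWayPort = new Port<RelayMessage>();
        public readonly Port<RoundTrip> RoundTripPort = new Port<RoundTrip>();

        public NodeProxy(DispatcherQueue queue)
        {
            // Fire-and-forget: ship the message and move on, no reply expected.
            Arbiter.Activate(queue, Arbiter.Receive(true, OneWayPort,
                m => Send(m)));

            // Round-trip: ship the request, then hand the node's reply
            // back on the caller's response port.
            Arbiter.Activate(queue, Arbiter.Receive(true, RoundTripPort,
                rt => rt.ResponsePort.Post(SendAndReceive(rt.Request))));
        }

        void Send(RelayMessage m) { /* write to the destination node's socket */ }
        RelayMessage SendAndReceive(RelayMessage m) { /* socket round trip */ return m; }
    }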
Activity Stream Request [Diagram: the client's activity stream request becomes a distributed query; the friend list is split into SubQueries (FriendList1–3), one per cluster]
CCR Perspective [Diagram: the client's activity stream query passes through Node 1's proxy along the same Port / Arbiter / Dispatcher Queue path as above]
Activity Stream Request [Diagram: Sub-Query Results 1–3 come back from the clusters, are merged into a single Query Result, and the Activity Stream Response returns to the client]
Activity Stream Request [Diagram: the two tiers at work; the Activity Index Cache answers the query, then the Activities Data Cache supplies the data for the response]
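Putting the round trip together: a hedged scatter-gather sketch, reusing the stand-in types from the proxy sketch above. Subqueries fan out to each cluster, and Arbiter.MultipleItemReceive fires the merge exactly once, when all sub-results are in:

    using System.Collections.Generic;
    using Microsoft.Ccr.Core;

    static class StreamQuery
    {
        public static void Scatter(DispatcherQueue queue,
                                   IList<NodeProxy> clusters,
                                   RelayMessage subQuery,
                                   Port<RelayMessage[]> merged)
        {
            var replies = new Port<RelayMessage>();

            // Scatter: one subquery per cluster, all replies to one port.
            foreach (var proxy in clusters)
                proxy.RoundTripPort.Post(
                    new RoundTrip { Request = subQuery, ResponsePort = replies });

            // Gather: fires once, when every cluster has answered; the
            // handler is where sorting/filtering/merging would happen.
            Arbiter.Activate(queue, Arbiter.MultipleItemReceive(
                false, replies, clusters.Count,
                results => merged.Post(results)));
        }
    }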
More Graphs [Chart: requests per second, comparing Stream Requests and Index Gets]
Index Cache • De Facto Distributed Querying Platform • Sort, Merge, Filter • Ubiquitous when Key/Value Store is not enough • Activity Stream • Videos • Music • MySpace Developer Platform
Robots Processing Your Every Move! • CCR constructs in every NodeProxy • Ports • Arbiters • Dispatcher Queues • Dispatchers (shared) • Message Batching • Arbiter.Choice • Arbiter.MultipleItemReceive • Arbiter.Receive from the TimeoutPort • Thread Pool Flexibility • Number of pools • Flexibility to set & change pool size dynamically*
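Those three constructs combine into the batching recipe; a sketch with an illustrative bucket size and flush window (MySpace's real version is ActivateBurstReceive in Node.cs, referenced at the end):

    using System;
    using Microsoft.Ccr.Core;

    static class Batching
    {
        // Take up to 100 queued messages at once, or flush whatever has
        // arrived when a 50 ms timer fires. Both numbers are illustrative.
        public static void Activate(DispatcherQueue queue, Port<RelayMessage> inbox)
        {
            var timeout = new Port<DateTime>();
            queue.EnqueueTimer(TimeSpan.FromMilliseconds(50), timeout);  // the TimeoutPort

            Arbiter.Activate(queue, Arbiter.Choice(
                // Either a full bucket of messages arrives...
                Arbiter.MultipleItemReceive(false, inbox, 100,
                    batch => SendAsOneBulkMessage(batch)),
                // ...or the timer wins and a partial bucket is flushed.
                Arbiter.Receive(false, timeout,
                    _ => FlushPartialBucket(inbox))));
        }

        static void SendAsOneBulkMessage(RelayMessage[] batch) { /* ship as one message */ }

        static void FlushPartialBucket(Port<RelayMessage> inbox)
        {
            RelayMessage m;
            while ((m = inbox.Test()) != null)   // drain whatever is queued
                SendAsOneBulkMessage(new[] { m });
        }
    }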
Activity Stream • Activities are everywhere: Twitter, MySpace, Google
Tony Chow Development Manager
Real-Time Stream • Pushes user activities out to subscribers using the PubSubHubbub standard • Anyone can subscribe to the Real-Time Stream, free of charge • Launched in December 2009 • Major subscribers: Google, Groovy, OneRiot • ~100 million messages delivered per day
The Challenges • Protect the user experience • Keep a constant stream flowing to healthy subscribers • Give all subscribers a fair chance • Prevent unhealthy subscribers from doing damage
Policing the Stream • Queue • Partition • Throttle • Async I/O
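CCR's built-in queue policies line up with that list; the TaskExecutionPolicy values below are real CCR, while the names and limits are illustrative:

    using Microsoft.Ccr.Core;

    static class Policing
    {
        static void Main()
        {
            var pool = new Dispatcher(8, "DeliveryPool");

            // Partition: each subscriber endpoint gets its own queue.
            // Throttle: a depth-constrained queue slows scheduling for a
            // subscriber that falls behind, without touching the others.
            var healthy = new DispatcherQueue("BigSearchEngine", pool,
                TaskExecutionPolicy.ConstrainQueueDepthThrottleExecution, 10000);

            // Discard: an overflowing subscriber sheds its own oldest work.
            var slow = new DispatcherQueue("SlowSubscriber", pool,
                TaskExecutionPolicy.ConstrainQueueDepthDiscardTasks, 1000);
        }
    }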
Policing the Stream • So far so good, for occasionally slow subscribers • But chronically underperforming subscribers call for more drastic measures
Policing the Stream • Discard • Unsubscribe
Transaction Manager is Everywhere @ MySpace! • Generic platform for reliable persistence • Supports SQL, SOAP, REST, and SMTP calls • MySpace Mail • Friend Requests • Status/Mood Update • And much more!
The Role of CCR • CCR is integral to DataRelay • CCR Iterator Pattern for Async I/O
Asynchronous I/O • Synchronous I/O • Needs lots of threads to do lots of I/O • Massive context switching • Doesn’t scale • Asynchronous I/O • Efficient use of threads • Massively scales • Hard to program, harder to read • Gnarly and unmaintainable code
The CCR Iterator Pattern • A better way to write async code • C# iterators make enumerators easier • CCR iterators make async I/O easier • Makes async code look like sync code
The Difference

    // Before: nested callbacks; each completion callback starts the next call.
    void Before()
    {
        cmd1.BeginExecuteNonQuery(result1 =>
        {
            cmd1.EndExecuteNonQuery(result1);
            cmd2.BeginExecuteNonQuery(result2 =>
            {
                cmd2.EndExecuteNonQuery(result2);
            }, null);
        }, null);
    }

    // After: a CCR iterator; each yield suspends until the receive fires.
    IEnumerator<ITask> After()
    {
        var port = new Port<IAsyncResult>();

        cmd1.BeginExecuteNonQuery(port.Post, null);
        yield return Arbiter.Receive(false, port, r => cmd1.EndExecuteNonQuery(r));

        cmd2.BeginExecuteNonQuery(port.Post, null);
        yield return Arbiter.Receive(false, port, r => cmd2.EndExecuteNonQuery(r));
    }
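Running an iterator just means handing it to the scheduler; a one-line sketch, assuming a DispatcherQueue like those in the earlier sketches:

    // CCR resumes the iterator after each yielded receive completes.
    Arbiter.Activate(queue, Arbiter.FromIteratorHandler(After));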
The CCR Iterator Pattern • Improves readability and maintainability • Far less bug-prone • Indispensable for asynchronous programming
What Now? • We didn't show you any of our code… • Because we are going to share more than samples … WE ARE OPEN SOURCING!!
Open Source • http://DataRelay.CodePlex.com • Lesser GPL License for… • Data Relay Base • Our C#/Managed C++ Berkeley DB Wrapper and Storage Component • Index Cache System • Network transport • Serialization System
What Now? • Places in our code with CCR • Bucketed batch • \Infrastructure\DataRelay\RelayComponent.Forwarding\Node.cs - ActivateBurstReceive(int count) • Distributed bulk message handling • \Infrastructure\DataRelay\RelayComponent.Forwarding\Forwarder.cs - HandleMessages • General Message Handling • \Infrastructure\DataRelay\DataRelay.RelayNode\RelayNode.cs • \Infrastructure\SocketTransport\Server\SocketServer.cs
Evaluate Us! Please fill out an evaluation for our presentation! More evaluations = more better for everyone.
Thank You! Questions? • Erik Nelson • enelson@myspace-inc.com • Akash Patel • apatel@myspace-inc.com • Tony Chow • tchow@myspace-inc.com • http://DataRelay.CodePlex.com