250 likes | 371 Vues
Amazon’s Dynamo. Simple Cloud Storage. Foundations. 1970 – E.F. Codd “A Relational Model of Data for Large Shared Data Banks” Idea of tabular data SQL Foundations Codd’s 12 rules How database structured and what is available to user
E N D
Amazon’s Dynamo Simple Cloud Storage
Foundations • 1970 – E.F. Codd“A Relational Model of Data for Large Shared Data Banks” • Idea of tabular data • SQL Foundations • Codd’s 12 rules • How database structured and what is available to user • Application not dependent on physical or logical levels of database • Insert, Update, Delete operators
Foundations, continued • First Relational database management systems (RDBMS) • Oracle in 1979, first SQL based system • Microsoft SQL server, etc • Open source software would follow later (mySQL) • Follows Codd’s ideas • Complexity on the server side, let the query do all the work • Very stringent requirements
Drawbacks • Licensing fees on a per processor rate • High end Oracle system is mind-numbingly expensive • Load Distribution requires specific nodes to handle • Some servers have specific roles, failure point in network • Complexity on servers creates difficultly with maintenance, upgrades • Upgrades often all at once as result, not incremental
A New Direction • Simplify services that the database provides • Easier scaling and error handling • Take a more pragmatic approach • Tailor system to sacrifice some aspects of the traditional RDBMS to gain performance in others • Systems less general, specific end requirements in mind when creating
Examples • Amazon Dynamo • Simple primary key • Highly available, end user based model • Low cost virtualized nodes • Facebook Cassandra • Similar goals to Amazon’s Dynamo • Highly avaiable, incremental scalablilty • Google File System • Master node • Data distributed across low cost nodes
Dynamo Goals • Scale – adding systems to network causes minimal impact • Symmetry – No special roles, all features in all nodes • Decentralization – No Master node(s) • Highly Available – Focus on end user experience • SPEED – A system can only be as fast as the lowest level • Service Level Agreements – System can be adapted to an application’s specific needs, allows flexibility
Dynamo Assumptions • Query Model – Simple interface exposed to application level • Get(), Put() • No Delete() • Atomicity, Consistency, Isolation, Durability • Operations either succeed or fail, no middle ground • System will be eventually consistent, no sacrifice of availability to assure consistency • Conflicts can occur while updates propagate through system • System can still function while entire sections of network are down • Efficiency – Measure system by the 99.9th percentile • Important with millions of users, 0.1% can be in the 10,000s • Non Hostile Environment - No need to authenticate query, speed boost
Wanted Results • Deliver requests in a bounded time • Always writable • Highly available to users • No dedicated roles • Work split between nodes fairly
Partitioning • Consistent Hashing • Changing the number of slots in hash table results in only a small number of keys to remap • More info • A ring of virtual nodes • Node responsible for region between it and its predecessor
Virtual Node • Physical Machine has # of virtual nodes based on performance • Can adapt load more easily if a machine goes down • Likewise, assign nodes to a new machine in network
Replication • Application provided parameter N • Replication on different physical nodes • Data still available if nodes go down • Makes part of preference list for query
Versioning and Vector Clocks • Updates propagate asynchronously, need a way of distinguishing conflicts • Possible reason for absence of Delete() • Vector Clock • List of (node, counter) • Limited size, limit overhead for data • If all fields are less than or equal, first can be updated by second
Sloppy Quorum and Hinted Handoff • W and R parameter set min # of nodes in a read or write • Read and write on the first N healthy nodes, no strict membership, can vary over time • Hint in metadata for intended node, will update once that node is again available • Allows for temporary failure in nodes or entire networks
Synchronization and Gossip • Merkle Trees - Info • Use common key values between two nodes • Traverse tree and check vector clocks to see if updates are needed • Exchange information on most current version of the data if inconsistencies are found • Gossip • Nodes select neighbors at random and reconcile membership change histories • Use seed nodes to initialize • Detect failures
Routing get() and put() • Two Techniques • Route request through a load balancer • Slower • Simpler application level code • Partition aware client, route directly to appropriate nodes • Faster • More complicated application level • First node routed to is “coordinator” node • Generates vector clock for put and gives data to N highest healthy nodes • Queries N highest nodes for all versions, returned all versions found
Implementation • Java based • Hardware independent, JVM • Allows different back-end systems to be used, based on size of data needed to be stored • Berkeley Database Transactional Data Store • BDB Java Edition • MySQL, can handle large objects • Coordinator node is a state machine for read/writes for client • Coordinator for a write determined by fastest read
Flexibility • Changing W, R, N • Business logic specific reconciliation • Data replicated over nodes • Application level reconciliation fro conflicting objects • Timestamp Reconciliation • Similar to above, last write wins • High performance read engine • By setting R = 1, W = N • Reads fast and numerous, few updates
Observed Results – Load Balancing • Higher traffic causes load to be balanced more evenly • Requests of popular keys let system to balance more easily • In lower traffic, less important to balance load
Observed Results - Coordination • Client coordination can provide a speed boost • Read and write latency nearly identical • Results as expected
Observed Results - Versions • Measured over 24 hour period for shopping cart • 99.94% of users saw 1 version • 0.00057% saw 2 versions • 0.00047% saw 3 versions • 0.00009% saw 4 versions • Increase caused by increase in number of concurrent writers, most likely
Conclusions • Dynamo allows Amazon’s customers to have a consistent experience even in face of server and network errors • Gives a scalable solution with millions of data points to be queried quickly and efficiently • Offloads complexity to the application to provide a simple, flexible, and fast server-side implementation