
DCS 3. Key-value Stores and NoSQL




Presentation Transcript


  1. DCS 3. Key-value Stores and NoSQL Wang Qi 2013.10.27

  2. Outline • Why NoSQL? • Key-Value Store and NoSQL • Cassandra’s internals and technologies • When should we use NoSQL? • How to shift to NoSQL from SQL (RDBMS)

  3. Why NoSQL • RDBMS • Data stored in tables • Schema-based structured tables • Queried using SQL (Structured Query Language) • ACID

  4. Mismatch with today’s workloads • Data: Large and unstructured • Lots of random reads and writes • Foreign keys and join queries are rarely needed • Too many locks • Need • Speed (low latency) • No single point of failure (high availability) • Incremental scalability • Scale out, not up: use more commodity off-the-shelf (COTS) machines, not more powerful machines

  5. CAP Theorem

  6. Key Value Store and NoSQL

  7. Cassandra • Designed and open sourced by Facebook • Features • Highly scalable and available • Eventually consistent • Distributed • Key-value store • Distributed technologies from Dynamo + data model from BigTable = Cassandra

  8. Cassandra Internals: Data Model • Column • Name, Value, Timestamp • Up to 2 million columns • No schemas • Variable number of columns • Variable type of value • Stored in order

  9. Cassandra Internals: Write Path • Client sends write request to one node in cluster (Coordinator) • Data Partition: Decide the node on which the data resides • Consistent Hashing • Replication strategy • Quorum • Store in data node: Commit log -> Memtables -> Respond to client • LSM-Tree (Log-Structured Merge Tree)

  10. Cassandra Internals: Write Path • Data Partition: Decide the node on which the data resides

  11. Cassandra Internals: Write Path • Client sends write request to one node in cluster (Coordinator) • Data Partition: Decide the node on which the data resides • Consistent Hashing • Replication strategy • Quorum • Store in data node: Commit log -> Memtables -> Respond to client • LSM-Tree (Log-Structured Merge Tree)

  12. Consistent hashing • Partitions data based on the primary key • Assigns a hash value to each primary key • Each node is responsible for a range of data based on the hash value • Places the data according to the hash value and the node range
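The placement described above can be sketched as a toy hash ring. The node names and the use of MD5 are illustrative only, not Cassandra's actual partitioner:

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: each key is owned by the first node
    whose token is at or after the key's hash, walking clockwise."""

    def __init__(self, nodes):
        # Place each node on the ring at the hash of its name (its "token").
        self.ring = sorted((self._hash(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise from the key's hash to the first node token,
        # wrapping around the ring at the end.
        i = bisect.bisect_right(self.tokens, self._hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["N16", "N45", "N80", "N112"])
owner = ring.node_for("K13")   # always the same node for the same key
```

Because only the range adjacent to a joining or leaving node changes hands, most keys keep their owner when the cluster membership changes.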

  13. Cassandra Internals: Write Path • Client sends write request to one node in cluster (Coordinator) • Data Partition: Decide the node on which the data resides • Consistent Hashing • Replication strategy • Quorum • Store in data node: Commit log -> Memtables -> Respond to client • LSM-Tree (Log-Structured Merge Tree)

  14. Replication strategy • Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. [Ring diagram: nodes N16, N32, N45, N80, N96, N112; the primary replica for key K13 sits on one node and backup replicas on its successors; a coordinator (typically one per DC) handles reads/writes for K13.]

  15. Cassandra Internals: Write Path • Client sends write request to one node in cluster (Coordinator) • Data Partition: Decide the node on which the data resides • Consistent Hashing • Replication strategy • Quorum • Store in data node: Commit log -> Memtables -> Respond to client • LSM-Tree (Log-Structured Merge Tree)

  16. Quorum and Consistency • Quorum: way of selecting sets so that any pair of sets intersects • E.g., any arbitrary set with at least Q = N/2 + 1 nodes • N = total number of replicas for this key • R = read replica count, W = write replica count • Write to any W nodes, read from any R nodes. If W + R > N, you have consistency, i.e., each read returns the latest written value • Cassandra’s tunable consistency: One, Quorum, All, etc.

  17. Cassandra Internals: Write Path • Client sends write request to one node in cluster (Coordinator) • Data Partition: Decide the node on which the data resides • Consistent Hashing • Replication strategy • Quorum • Store in data node: Commit log -> Memtables -> Respond to client • LSM-Tree (Log-Structured Merge Tree)

  18. Log-Structured Merge Tree • B+Tree: random writes • LSM-Tree: sequential writes

  19. Cassandra Internals: Write Path • Client sends write request to one node in cluster (Coordinator) • Data Partition: Decide the node on which the data resides • Consistent Hashing • Replication strategy • Store in data node: Commit log -> Memtables -> Respond to client • LSM-Tree (Log-Structured Merge Tree)

  20. Writes at a data node On receiving a write: • 1. Log it in the disk commit log (log = append-only) • 2. Make changes to the appropriate memtables • In-memory representation of multiple key-value pairs • Later, when a memtable reaches a threshold, flush it to disk • Data file: an SSTable (Sorted String Table) – a list of key-value pairs, sorted by key • Index file: an SSTable of (key, position in data SSTable) pairs • Compaction: merge multiple SSTables into one • Data updates accumulating over time generate several SSTables • Compaction improves read performance

  21. Cassandra Internals: Read Path • Client sends read request to one node in cluster (Coordinator) • Data Partition: Similar to writes • Read in data node: Row Cache -> Memtable -> Bloom Filter -> Key Cache -> Memory index -> Disk index -> SSTable -> Respond to coordinator • Bloom Filter • Coordinator compares the results and responds to client • Read repair

  22. Bloom Filter • Compact way of representing a set of items • Checking for existence in the set is cheap • Some probability of false positives: an item not in the set may check true as being in the set • On insert, set all hashed bits • On check-if-present, return true if all hashed bits are set • False positives possible, false negatives impossible
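The insert/check behaviour described above fits in a short sketch. The bit-array size, number of hashes, and use of MD5 are arbitrary choices for illustration:

```python
import hashlib

class BloomFilter:
    """Sketch: k hash functions set/check bits in an m-bit array.
    No false negatives; false positives possible."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k positions by salting one hash function k ways.
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        # True means "maybe present"; False means "definitely absent".
        return all(self.bits[p] for p in self._positions(item))
```

On the read path this lets a node skip an SSTable entirely when the filter answers "definitely absent", paying only a rare wasted disk lookup on a false positive.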

  23. Cassandra Internals: Read Path • Read in data node: Row Cache -> Memtable and SSTables • SSTable read path: Bloom Filter -> Key Cache -> Memory index -> Disk index -> SSTable • Respond to coordinator

  24. Cassandra Internals: Eventual Consistency • Cassandra’s consistency is eventual: as data is replicated, the latest version sits on some nodes while older versions remain on others; eventually all nodes see the latest version. • Hinted handoff • Read repair • Anti-entropy
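Read repair can be sketched as last-write-wins reconciliation over the column timestamps from the data-model slide. The response format here is invented for illustration:

```python
def reconcile(replica_responses):
    """Read-repair sketch: the coordinator keeps the value with the newest
    timestamp and notes which replicas hold stale copies, so the fresh
    value can be written back to them (last-write-wins)."""
    latest = max(replica_responses, key=lambda r: r["timestamp"])
    stale = [r["node"] for r in replica_responses
             if r["timestamp"] < latest["timestamp"]]
    return latest["value"], stale
```

Each read thus nudges lagging replicas toward the latest version, which is one of the mechanisms that makes "eventually all nodes see the latest version" come true.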

  25. Cluster Membership and Failure Detection • Gossip-based cluster membership • Each membership entry carries: address (generated locally), heartbeat, version • Protocol: • Nodes periodically gossip their membership list • On receipt, the local membership list is updated • If any heartbeat is older than Tfail, that node is marked as failed

  26. A Gossip Round in Cassandra • Node A generates a local digest message and sends it to node B. • Node B receives the message and compares it with its local information, then sends an ack message containing its fresher information back to node A. • Node A merges that information as node B did, then sends its own ack back to node B. Finally node B updates its information based on this message.
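The core of each exchange above is a merge that keeps, per node, whichever side has the higher heartbeat version. A minimal sketch, with membership lists reduced to plain dicts of node name to heartbeat:

```python
def merge_membership(local, remote):
    """Gossip-merge sketch: for each node, keep the entry with the
    higher heartbeat; nodes only one side knows about are added."""
    merged = dict(local)
    for node, heartbeat in remote.items():
        if heartbeat > merged.get(node, -1):
            merged[node] = heartbeat
    return merged
```

Running this merge on both ends of every gossip round is what lets membership information (and hence failure detection) spread through the cluster without any central coordinator.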

  27. Transaction in Cassandra • Atomicity • Row level atomicity • Consistency • Tunable consistency • Isolation • Row level isolation • Durability • Writes are durable through the commit log

  28. Performance Evaluation • On > 50 GB data • MySQL • Writes: 300 ms avg • Reads: 350 ms avg • Cassandra • Writes: 0.12 ms avg • Reads: 15 ms avg

  29. When should we use NoSQL • Big enough data • Nodes with high-performance hardware • Can live without RDBMS features • Secondary indexes • Transactions • Advanced query languages

  30. Cassandra data modeling: Don’t think of a relational DBMS • Storing values in column names • The sorted map gives efficient key lookups and efficient scans • The number of column keys is almost unbounded • Model column families around query patterns • Moderately de-normalize and duplicate for read performance • The cons of normalization are magnified, and there are no joins in a high-scale distributed store • So with a fully normalized schema, reads may perform much worse. SortedMap<RowKey, SortedMap<ColumnName, (ColumnValue, Timestamp)>>
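The two-level sorted map at the end of the slide can be mimicked with plain dicts, sorting column names on scan. This is an illustration of the shape of the model, not Cassandra's storage engine:

```python
# row_key -> {column_name: (column_value, timestamp)}
store = {}

def insert(row_key, column_name, value, ts):
    """Write one column of one row (outer map keyed by row key)."""
    store.setdefault(row_key, {})[column_name] = (value, ts)

def slice_columns(row_key, start, end):
    """Range scan over column names within a row: cheap because the
    inner map is kept sorted by column name."""
    row = store.get(row_key, {})
    return [(c, row[c]) for c in sorted(row) if start <= c <= end]
```

Storing data in column names, as the slide suggests, means a single row can act as a sorted index: a column-name range scan replaces what would be a secondary-index query in an RDBMS.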

  31. Example: ‘Like’ relationship between User & Item • Get user by user id • Get item by item id • Get all the items that a particular user likes • Get all the users who like a particular item

  32. Replica of relational model • There is no easy way to query all the items that a particular user likes or all the users who like a particular item, because there are no efficient secondary indexes.

  33. Normalized entities with de-normalized custom indexes • Title and username are de-normalized into User_By_Item and Item_By_User respectively. It’s efficient to query all the item titles liked by a given user, and all the usernames of users who like a given item.
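The two index column families from the slide can be sketched with dicts; the function names are invented for illustration, while User_By_Item and Item_By_User come from the slide:

```python
# User_By_Item: item_id -> {user_id: username}  (who likes this item)
# Item_By_User: user_id -> {item_id: title}     (what this user likes)
user_by_item = {}
item_by_user = {}

def like(user_id, username, item_id, title):
    """Record a 'like' by writing the de-normalized pair to BOTH index
    tables; username and title are deliberately duplicated."""
    user_by_item.setdefault(item_id, {})[user_id] = username
    item_by_user.setdefault(user_id, {})[item_id] = title

def titles_liked_by(user_id):
    return sorted(item_by_user.get(user_id, {}).values())

def users_who_like(item_id):
    return sorted(user_by_item.get(item_id, {}).values())
```

Both queries from slide 31 now cost a single-row read each; the price is that a username or title change must be rewritten in every index row that duplicates it.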

  34. Best Practices of Cassandra Data Modeling • Keep the column name short unless you use the column name to store actual data • Because it’s stored repeatedly • Design the data model such that operations are idempotent • Idempotent operations tolerate partial failures in the system, as the operations can be retried safely • If you need transactional behavior, try to model your data such that you only need to update a single row at once • Cassandra offers row-level atomicity

  35. Thanks! Q&A
