This presentation by Gary Dusbabek at Rackspace discusses the evolution and architecture of Apache Cassandra, a distributed database designed for high scalability and availability. It covers key topics such as its historical background, scaling mechanisms, replication models, and practical data modeling approaches. The session also delves into write and read paths, client access methods, and performance tuning considerations. By examining Cassandra's unique features and its ability to handle large volumes of data efficiently, attendees will gain insights into why Cassandra is preferred in cloud computing environments.
Apache Cassandra • Gary Dusbabek, Rackspace • Silicon Valley Cloud Computing Group • 17 June 2010
Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations
Why Cassandra? • 2006: 161 EB (322 million 500 GB drives) • 2010: 988 EB (1.98 billion 500 GB drives) • Six-fold growth in 4 years • Source: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
SQL • Specialized data structures (think B-trees) • Shines with complicated queries • Focus on fast query & analysis • Not necessarily on large datasets
Ever tried scaling an RDBMS? • For reads? • Memcache etc. • For writes? • Oh noes!
Vertical Scaling Is hard credit: janetmck via flickr
No, really: Vertical Scaling Is hard
Enter Cassandra • Amazon Dynamo: consistent hashing, partitioning, replication, one-hop routing • Google BigTable: column families, memtables, SSTables
Origins Pre-2008
Moving Along 2008
Landed 2009
Distributed and Scalable • Horizontal! • All nodes are identical • No master or SPOF • Adding is simple • Automatic cluster maintenance
Replication • Replication factor: how many nodes data is replicated on • Consistency level: Zero, One, Quorum, All • Sync or async for writes • Reliability of reads • Read repair
Ring Topology (RF=3) • Conceptual ring • One token per node • Multiple ranges per node • [Figure: ring of nodes a, d, g, j]
Ring Topology (RF=2) • Conceptual ring • One token per node • Multiple ranges per node • [Figure: ring of nodes a, d, g, j]
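The ring slides above can be sketched in a few lines of Python. This is only an illustration of the idea (one token per node, each key stored on the owning node plus the next RF-1 nodes clockwise), not Cassandra's actual code. The `Ring` class, the 2**128 token space, the MD5-based `token_for` helper and the evenly spaced tokens are all assumptions made for the example.

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 128  # assumed token space: the range of an MD5 digest


def token_for(key):
    """Map a key to a token by hashing it (illustrative choice of hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring: one token per node, replicas are the next RF nodes clockwise."""

    def __init__(self, node_tokens, rf):
        self.rf = rf
        self.ring = sorted((tok, name) for name, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.ring]

    def replicas_for_token(self, token):
        # The first node whose token is >= the key's token owns the key;
        # past the highest token we wrap around to the first node.
        start = bisect.bisect_left(self.tokens, token) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def replicas(self, key):
        return self.replicas_for_token(token_for(key))


# Four nodes named as on the slide, with evenly spaced (hypothetical) tokens.
quarter = TOKEN_SPACE // 4
tokens = {"a": 0, "d": quarter, "g": 2 * quarter, "j": 3 * quarter}
ring_rf3 = Ring(tokens, rf=3)
ring_rf2 = Ring(tokens, rf=2)
```

Because each node is both the primary for its own range and a replica for the ranges of its predecessors, every node ends up holding multiple ranges, which is what the slide means by "multiple ranges per node".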
New Node (RF=3) • Token assignment • Range adjustment • Bootstrap • Arrival only affects immediate neighbors • [Figure: ring of nodes a, d, g, j with new node m]
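To see why a new node only disturbs its neighbors, here is a small sketch that compares primary ownership before and after node m joins. It is a simplification: it looks only at the primary owner of each token (not all RF replicas), and the token values and the `primary_owner` helper are made up for the example.

```python
import bisect


def primary_owner(node_tokens, token):
    """Return the node whose token is the first one >= token (wrapping)."""
    ring = sorted((tok, name) for name, tok in node_tokens.items())
    tokens = [tok for tok, _ in ring]
    return ring[bisect.bisect_left(tokens, token) % len(ring)][1]


# Hypothetical tokens on a tiny 0..399 token space.
before = {"a": 0, "d": 100, "g": 200, "j": 300}
after = dict(before, m=350)  # node m bootstraps between j and a

# Record which tokens changed owner, grouped by (old owner, new owner).
moved = {}
for token in range(400):
    old = primary_owner(before, token)
    new = primary_owner(after, token)
    if old != new:
        moved.setdefault((old, new), []).append(token)
```

In this toy ring the only tokens that change hands are 301 through 350, which move from a to m. Nodes d, g and j keep exactly the primary ranges they had before.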
Ring Partition (RF=3) • Node dies • Available? • Hinted handoff • Achtung! Plan for this • [Figure: ring of nodes a, d, g, j]
Schema-free Sparse-table • Flexible column naming • You define the sort order • Not required to have a specific column just because another row does
Data Model • Keyspace • ColumnFamily • Row (indexed): key and columns • Column: name (sorted) and value
Data Model A single column
Data Model A single row
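One rough way to picture the data model is as nested maps: keyspace, then column family, then row key, then column name to value. The sketch below uses plain Python dicts for that picture. The `Users` column family, the row keys and the `get_row` helper are hypothetical, and real columns also carry a timestamp, which is left out here.

```python
# Keyspace -> ColumnFamily -> row key -> {column name: value}
keyspace = {
    "Users": {  # a ColumnFamily (hypothetical name)
        "alice": {"email": "alice@example.com", "city": "Austin"},
        "bob": {"email": "bob@example.com"},  # sparse: no "city" column
    }
}


def get_row(keyspace, column_family, key):
    """Return a row's columns as (name, value) pairs sorted by column name."""
    row = keyspace[column_family].get(key, {})
    return sorted(row.items())


alice = get_row(keyspace, "Users", "alice")
bob = get_row(keyspace, "Users", "bob")
missing = get_row(keyspace, "Users", "carol")
```

Note that the two rows do not share the same set of columns, which is the "sparse table" property from the earlier slide.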
Eventually Consistent • CAP Theorem: Consistency, Availability, Partition Tolerance • Choose two • Cassandra chooses A and P • But…
Eventually Consistent I got a fever! And the only prescription is MORE CONSISTENCY!
Tunable Consistency • Give up a little A and P to get more C • Ratchet up the consistency level • R + W > N gives strong consistency (R = replicas read, W = replicas written, N = replication factor) • More to come
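The R + W > N rule is easy to check by hand: if the replicas read and the replicas written must overlap in at least one node, a read always sees the latest acknowledged write. A minimal sketch, with helper names (`quorum`, `is_strongly_consistent`) chosen for this example:

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def is_strongly_consistent(r, w, n):
    """True when any r-replica read must overlap any w-replica write."""
    return r + w > n


n = 3  # replication factor
quorum_both = is_strongly_consistent(quorum(n), quorum(n), n)  # 2 + 2 > 3
one_and_one = is_strongly_consistent(1, 1, n)                  # 1 + 1 > 3 fails
read_one_write_all = is_strongly_consistent(1, n, n)           # 1 + 3 > 3
```

With RF=3, quorum reads plus quorum writes satisfy the rule, as does reading at One while writing at All. Reading and writing at One does not, which is the eventually consistent end of the dial.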
Inserting: Overview • Simple: put(key, col, value) • Complex: put(key, [col:value, …, col:value]) • Batch: multi key.
Inserting: Writes • Commit log for durability • Configurable fsync • Sequential writes only • Memtable: no disk access (no reads or seeks) • SSTables are final (become read only): indexes, Bloom filter, raw data • Bottom line: FAST!!!
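The write path described above can be mimicked with a toy store: append to a commit log, update an in-memory memtable, and flush the memtable to an immutable sorted snapshot once it grows past a threshold. This is a sketch under heavy simplifications, not Cassandra's implementation. The `ToyStore` class and its `flush_threshold` are invented for the example, the commit log is a Python list rather than a file, and the SSTable has no index or Bloom filter.

```python
class ToyStore:
    """Toy write path: commit log -> memtable -> immutable sorted snapshots."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # stands in for an append-only file on disk
        self.memtable = {}     # in memory: row key -> {column: value}
        self.sstables = []     # immutable, sorted snapshots of old memtables
        self.flush_threshold = flush_threshold

    def put(self, key, column, value):
        # 1. Append to the commit log first, for durability.
        self.commit_log.append((key, column, value))
        # 2. Update the memtable; no reads or seeks are needed.
        self.memtable.setdefault(key, {})[column] = value
        # 3. Flush once the memtable holds enough rows.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Tuples make the snapshot read only, mirroring "SSTables are final".
        snapshot = tuple(
            (key, tuple(sorted(cols.items())))
            for key, cols in sorted(self.memtable.items())
        )
        self.sstables.append(snapshot)
        self.memtable = {}
        # Simplification: the flushed data no longer needs the log entries.
        self.commit_log.clear()


store = ToyStore(flush_threshold=2)
store.put("a", "x", 1)
store.put("a", "y", 2)   # still one row in the memtable, no flush yet
store.put("b", "x", 3)   # second row triggers a flush
```

Every step on the write side is either an append or an in-memory update, which is the reason the slide can claim that writes are fast.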
Querying: Overview • You need a key or keys: • Single: key=‘a’ • Range: key=‘a’ through ‘f’ • And columns to retrieve: • Slice: cols={bar through kite} • By name: key=‘b’ cols={bar, cat, llama} • Nothing like SQL “WHERE col=‘faz’” • But secondary indices are being worked on (see CASSANDRA-749)
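The two ways of picking columns, a slice over the sorted column names or an explicit list of names, can be sketched against a single row held as a dict. The function names and the sample row are made up for illustration and are not the Thrift API.

```python
import bisect


def slice_columns(row, start, finish):
    """Return columns whose names fall in [start, finish], in sorted order."""
    names = sorted(row)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return [(name, row[name]) for name in names[lo:hi]]


def columns_by_name(row, wanted):
    """Return only the named columns that the row actually has."""
    return [(name, row[name]) for name in wanted if name in row]


# Hypothetical row stored under key 'b'.
row_b = {"ant": 1, "bar": 2, "cat": 3, "kite": 4, "llama": 5, "zebra": 6}

sliced = slice_columns(row_b, "bar", "kite")
named = columns_by_name(row_b, ["bar", "cat", "llama"])
```

Both lookups work only on column names, never on values, which is why there is no equivalent of SQL's `WHERE col='faz'` without a secondary index.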
Querying: Reads • Practically lock free • SSTable proliferation • New in 0.6: • Row cache (avoid SSTable lookup, not write-through) • Key cache (avoid index scan)
Client API (Low Level) • Fat Client • Live non-storage node • Reduced RPC overhead • Thrift (12 language bindings!) • http://incubator.apache.org/thrift/ • No streaming • Avro • Work in progress
Client API (High Level) • http://wiki.apache.org/cassandra/ClientOptions • Feature rich • Connection pooling • Load balancing/failover • Simplified APIs • Version opaque
Practical Considerations • Partitioner: Random or Order Preserving (affects range queries) • Provisioning • Virtual or bare metal • Cluster size • Data model • Think in terms of access • Giving up transactions, ad-hoc queries, arbitrary indexes and joins • (you may already do this with an RDBMS!)
Practical Considerations • Wide rows • Data life-span • Cluster planning • Bootstrapping
Future Direction • Vector clocks (server side conflict resolution) • Alter keyspace/column families on a live cluster • Compression • Multi-tenant features • Fewer memory restrictions
Wrapping Up • Use Cassandra if you want/need: • High write throughput • Near-linear scalability • Automated replication/fault tolerance • And you can tolerate missing RDBMS features
Questions? Linkage • wiki.apache.org/cassandra • cassandra.apache.org • gdusbabek@gmail.com • gdusbabek on twitter and just about everything else.