Gary Dusbabek Rackspace

Apache Cassandra • Gary Dusbabek, Rackspace • Silicon Valley Cloud Computing Group • 17 June 2010

Outline • History • Scaling • Replication Model • Data Model • Tuning • Write Path • Read Path • Client Access • Practical Considerations

Why Cassandra? • 2006: 161 EB of digital data (322 million 500 GB drives) • 2010: 988 EB (1.98 billion 500 GB drives) • Six-fold growth in 4 years • Source: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf

Why Cassandra?

SQL • Specialized data structures (think B-trees) • Shines with complicated queries • Focus on fast query and analysis • Not necessarily on large datasets

Ever tried scaling an RDBMS? • For reads? • Memcache etc. • For writes? • Oh noes!

Vertical Scaling Is hard (photo credit: janetmck via Flickr)

No, really: Vertical Scaling Is hard

Enter Cassandra • Amazon Dynamo • Consistent hashing • Partitioning • Replication • One-hop routing • Google BigTable • Column Families • Memtables • SSTables

Origins Pre-2008

Moving Along 2008

Landed 2009

Distributed and Scalable • Horizontal! • All nodes are identical • No master or SPOF • Adding nodes is simple • Automatic cluster maintenance

Replication • Replication factor • How many nodes data is replicated on • Consistency level • Zero, One, Quorum, All • Sync or async for writes • Reliability of reads • Read repair

Ring Topology (RF=3) • Conceptual ring • One token per node • Multiple ranges per node • [Figure: ring of four nodes with tokens a, d, g, j]

Ring Topology (RF=2) • Conceptual ring • One token per node • Multiple ranges per node • [Figure: ring of four nodes with tokens a, d, g, j]
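
The ring slides can be illustrated with a small sketch. It assumes the simplest placement rule: a key belongs to the first node whose token is greater than or equal to the key's token, and the remaining replicas are the next nodes clockwise. The function name `replicas` is hypothetical and is not part of any Cassandra API.

```python
from bisect import bisect_left

def replicas(tokens, key_token, rf):
    """Return the rf nodes (identified by token) that hold key_token."""
    ring = sorted(tokens)
    # First node whose token is >= the key's token; wrap past the end of the ring.
    start = bisect_left(ring, key_token) % len(ring)
    # Walk clockwise to collect rf distinct nodes.
    return [ring[(start + i) % len(ring)] for i in range(rf)]

ring = ["a", "d", "g", "j"]
print(replicas(ring, "e", 3))  # ['g', 'j', 'a']
print(replicas(ring, "e", 2))  # ['g', 'j']
print(replicas(ring, "k", 2))  # ['a', 'd'] (wraps around the ring)
```

With RF=3 on a four-node ring, every node ends up storing three ranges: its own and those of its two predecessors, which is the "multiple ranges per node" point on the slide.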

New Node (RF=3) • Token assignment • Range adjustment • Bootstrap • Arrival only affects immediate neighbors • [Figure: node m joins the ring of nodes a, d, g, j]
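
A rough way to see why an arrival only affects neighbors: compare which node is the primary owner of each key before and after node m joins. This is a sketch that ignores replicas (RF is treated as 1 for clarity); `primary_owner` and the sample keys are made up for illustration.

```python
from bisect import bisect_left

def primary_owner(tokens, key_token):
    """First node clockwise whose token is >= the key's token."""
    ring = sorted(tokens)
    return ring[bisect_left(ring, key_token) % len(ring)]

keys = ["b", "e", "h", "k", "n"]
before = {k: primary_owner(["a", "d", "g", "j"], k) for k in keys}
after = {k: primary_owner(["a", "d", "g", "j", "m"], k) for k in keys}

# Only keys in the range (j, m] change owner, moving from node a to node m.
moved = [k for k in keys if before[k] != after[k]]
print(moved)  # ['k']
```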

Ring Partition (RF=3) • Node dies • Available? • Hinted handoff • Achtung! Plan for this • [Figure: ring of four nodes with tokens a, d, g, j]
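
Hinted handoff can be sketched as follows. This is a toy model, not Cassandra's implementation: the `Node` class, `write`, and `replay_hints` are hypothetical names, and the sketch assumes the first live replica keeps the hint for any replica that is down.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target node name, key, value) kept for down replicas

def write(replicas, key, value):
    """Write to every live replica; leave a hint for each dead one."""
    live = [n for n in replicas if n.up]
    for node in live:
        node.data[key] = value
    for node in replicas:
        if not node.up and live:
            live[0].hints.append((node.name, key, value))
    return len(live)  # number of acknowledgements

def replay_hints(holder, recovered):
    """Deliver the hints that holder kept for a node that is back up."""
    remaining = []
    for target, key, value in holder.hints:
        if target == recovered.name and recovered.up:
            recovered.data[key] = value
        else:
            remaining.append((target, key, value))
    holder.hints = remaining

d, g, j = Node("d"), Node("g"), Node("j")
g.up = False
acks = write([d, g, j], "k", "v")  # two of three replicas acknowledge
g.up = True
replay_hints(d, g)                 # g catches up once it returns
```

Whether the write counts as successful with two acknowledgements depends on the consistency level the client asked for.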

Schema-free Sparse-table • Flexible column naming • You define the sort order • Not required to have a specific column just because another row does

Data Model • Keyspace • ColumnFamily • Row (indexed) • Key • Columns • Name (sorted) • Value

Easier to show from the bottom up

Data Model A single column

Data Model A single row

Data Model
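
The hierarchy on the data model slides can be pictured as nested maps. The sketch below uses plain Python dictionaries; the column family name, row keys, and column values are all made up for illustration, and a real column also carries metadata that is omitted here.

```python
# Keyspace -> ColumnFamily -> row key -> {column name: value}
# All names and values below are made up for illustration.
keyspace = {
    "Users": {
        "alice": {"city": "Austin", "email": "alice@example.com"},
        "bob": {"email": "bob@example.com", "twitter": "@bob"},
    }
}

def columns_in_order(column_family, row_key):
    """Return a row's (name, value) pairs sorted by column name."""
    row = keyspace[column_family][row_key]
    return sorted(row.items())

print(columns_in_order("Users", "bob"))
# [('email', 'bob@example.com'), ('twitter', '@bob')]
```

The two rows hold different column names, which is the sparse-table property: a row is not required to have a column just because another row does.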

Eventually Consistent • CAP Theorem • Consistency • Availability • Partition Tolerance • Choose two • Cassandra chooses A and P • But…

Eventually Consistent I got a fever! And the only prescription is MORE CONSISTENCY!

Tunable Consistency • Give up a little A and P to get more C • Ratchet up the consistency level • R + W > N ⇒ strong consistency • More to come
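
The R + W > N rule is easy to check by hand. Here R is the number of replicas a read waits for, W the number a write waits for, and N the replication factor. The helper names below are hypothetical.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

def is_strongly_consistent(r, w, n):
    """True when every read set must overlap every write set."""
    return r + w > n

n = 3
print(is_strongly_consistent(quorum(n), quorum(n), n))  # True  (2 + 2 > 3)
print(is_strongly_consistent(1, 1, n))                  # False (1 + 1 <= 3)
print(is_strongly_consistent(1, n, n))                  # True  (read ONE, write ALL)
```

When the inequality holds, at least one replica in any read set has seen the latest acknowledged write.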

Inserting: Overview • Simple: put(key, col, value) • Complex: put(key, [col:value, …, col:value]) • Batch: multi key.
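
The three insert shapes can be mimicked with an in-memory stand-in. This is not the Thrift API; `put`, `batch_put`, and the `store` dictionary are hypothetical names used only to show the shapes of the calls.

```python
from collections import defaultdict

store = defaultdict(dict)  # row key -> {column name: value}

def put(key, columns):
    """Insert or overwrite one or more columns in a single row."""
    store[key].update(columns)

def batch_put(rows):
    """Apply inserts for several row keys in one call."""
    for key, columns in rows.items():
        put(key, columns)

put("a", {"bar": 1})                                  # simple: one column
put("a", {"cat": 2, "kite": 3})                       # complex: several columns
batch_put({"b": {"bar": 4}, "c": {"llama": 5}})       # batch: several keys
```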

Inserting: Writes • Commit log for durability • Configurable fsync • Sequential writes only • Memtable – no disk access (no reads or seeks) • SSTables are final (become read-only) • Indexes • Bloom filter • Raw data • Bottom line: FAST!!!
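
A toy version of the write path described above: append to a commit log, update the memtable, and flush to an immutable SSTable with a Bloom filter once the memtable fills up. Everything here is an in-memory stand-in; the class names, the `memtable_limit` parameter, and the Bloom filter sizing are assumptions for the sketch, not Cassandra's actual values.

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> p) & 1 for p in self._positions(key))

class Store:
    def __init__(self, memtable_limit=2):
        self.commit_log = []  # stand-in for an append-only file on disk
        self.memtable = {}
        self.sstables = []    # list of (bloom filter, sorted immutable rows)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append for durability
        self.memtable[key] = value            # in-memory update, no disk read
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        rows = tuple(sorted(self.memtable.items()))  # never modified again
        self.sstables.append((bloom, rows))
        self.memtable = {}
        self.commit_log.clear()  # flushed entries no longer need replaying

s = Store()
s.write("a", 1)
s.write("b", 2)  # reaches the limit and triggers a flush
```

Nothing in `write` reads from disk or seeks, which is the reason the slide gives for writes being fast.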

Querying: Overview • You need a key or keys: • Single: key=‘a’ • Range: key=‘a’ through ‘f’ • And columns to retrieve: • Slice: cols={bar through kite} • By name: key=‘b’ cols={bar, cat, llama} • Nothing like SQL “WHERE col=‘faz’” • But secondary indices are being worked on (see CASSANDRA-749)
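
The query shapes above can be sketched against plain dictionaries. The function names are hypothetical and only loosely mirror the kinds of calls a client makes. The key-range helper assumes keys are stored in sorted order.

```python
from bisect import bisect_left, bisect_right

def get_slice(row, start, finish):
    """Columns whose names fall in [start, finish], in sorted order."""
    names = sorted(row)
    lo, hi = bisect_left(names, start), bisect_right(names, finish)
    return {n: row[n] for n in names[lo:hi]}

def get_by_name(row, names):
    """Only the named columns that the row actually has."""
    return {n: row[n] for n in names if n in row}

def get_key_range(rows, start_key, end_key):
    """Rows whose keys fall in [start_key, end_key]."""
    return {k: rows[k] for k in sorted(rows) if start_key <= k <= end_key}

row = {"apple": 1, "bar": 2, "cat": 3, "kite": 4, "llama": 5}
print(get_slice(row, "bar", "kite"))               # {'bar': 2, 'cat': 3, 'kite': 4}
print(get_by_name(row, ["bar", "cat", "llama"]))   # {'bar': 2, 'cat': 3, 'llama': 5}
```

Every function starts from a key or a column name; none of them can answer "which rows have value X", which is the gap secondary indices are meant to fill.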

Querying: Reads • Practically lock free • SSTable proliferation • New in 0.6: • Row cache (avoids SSTable lookup, not write-through) • Key cache (avoids index scan)
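
A simplified picture of the read path: check the row cache, then the memtable, then the SSTables. The sketch assumes SSTables are listed oldest to newest and that the newest copy of a key wins; the real system reconciles columns rather than whole rows, so treat this as an illustration only. The function name `read` is hypothetical.

```python
def read(key, row_cache, memtable, sstables):
    """Look up a key, touching as few structures as possible."""
    if key in row_cache:              # cache hit skips every SSTable lookup
        return row_cache[key]
    value = memtable.get(key)
    if value is None:
        for table in reversed(sstables):  # newest SSTable first
            if key in table:
                value = table[key]
                break
    if value is not None:
        row_cache[key] = value        # populated on read, not on write
    return value

cache = {}
sstables = [{"a": "old"}, {"a": "new", "b": "x"}]
print(read("a", cache, {}, sstables))  # new
```

The more SSTables accumulate, the more tables a cache miss may have to check, which is the proliferation concern on the slide.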

Client API (Low Level) • Fat Client • Live non-storage node • Reduced RPC overhead • Thrift (12 language bindings!) • http://incubator.apache.org/thrift/ • No streaming • Avro • Work in progress

Client API (High Level) • http://wiki.apache.org/cassandra/ClientOptions • Feature rich • Connection pooling • Load balancing/failover • Simplified APIs • Version opaque

Practical Considerations • Partitioner: Random or Order Preserving • Range queries • Provisioning • Virtual or bare metal • Cluster size • Data model • Think in terms of access • Giving up transactions, ad-hoc queries, arbitrary indexes and joins • (you may already do this with an RDBMS!)
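
The partitioner choice matters for range queries because it decides where a key lands on the ring. The sketch below contrasts an MD5-based token, as a random partitioner would produce, with using the key itself as the token. The function names are hypothetical.

```python
import hashlib

def random_token(key):
    """Hash the key so that tokens spread evenly around the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def order_preserving_token(key):
    """Use the key itself, so ring order matches key order."""
    return key

keys = ["apple", "banana", "cherry"]

# Order preserving: neighboring keys stay neighbors, so a key range maps
# to a contiguous slice of the ring.
print(sorted(keys, key=order_preserving_token))

# Random: ring order follows the hash, which generally bears no relation
# to key order, so key ranges are scattered across nodes.
print(sorted(keys, key=random_token))
```

The trade-off is load balance: hashed tokens spread data evenly, while order-preserving tokens can create hot spots if keys cluster.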

Practical Considerations • Wide rows • Data life-span • Cluster planning • Bootstrapping

Future Direction • Vector clocks (server-side conflict resolution) • Alter keyspace/column families on a live cluster • Compression • Multi-tenant features • Fewer memory restrictions

Wrapping Up • Use Cassandra if you want/need: • High write throughput • Near-linear scalability • Automated replication/fault tolerance • And you can tolerate missing RDBMS features

Questions? Linkage • wiki.apache.org/cassandra • cassandra.apache.org • gdusbabek@gmail.com • gdusbabek on Twitter and just about everything else
