Cassandra Database Project

Cassandra Database Project. News presentation: Joab Jackson, “New Cassandra Can Pack Two Billion Columns Into a Row”, PCWorld News, January 2011. Alireza Haghdoost, Jake Moroshek, Computer Science and Engineering, University of Minnesota-Twin Cities, Nov. 17, 2011.

Presentation Transcript


  1. Cassandra Database Project
     News presentation: Joab Jackson, “New Cassandra Can Pack Two Billion Columns Into a Row”, PCWorld News, January 2011.
     Alireza Haghdoost, Jake Moroshek, Computer Science and Engineering, University of Minnesota-Twin Cities, Nov. 17, 2011

  2. What was the Problem?
     • Facebook Messages Inbox Search
       • Feature that enables users to search through their Facebook Inbox
     • Millions of messages are sent every day on Facebook
     • Messages are stored in different data centers
     • How to handle indexing all of this information for Inbox Search?

  3. What is Cassandra?
     • Distributed storage system
     • Designed for managing a kind of NoSQL database
       • NoSQL: key-value, schema-less database
     • Scales to a very large size across many servers
       • Spread across different data centers
       • Small and large components fail continuously
     • No single point of failure
       • Data is replicated at several nodes

  4. Cassandra Goals
     • High scalability
       • The ability to scale incrementally
     • High performance
       • The ability to respond quickly
     • High availability
       • The ability to keep data available to users

  5. Cassandra Data Model
     • Cassandra does not support a full relational data model
     • Key-value data model
       • Every row is identified by a unique key
       • Every row can hold a very large number of columns
         • Grouped into column families
         • Up to two billion columns can be packed into a row
     • Columns are sorted within a row by
       • name order
       • time order (required for Inbox Search)
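The data model on this slide can be illustrated with a small in-memory sketch. This is not Cassandra's storage engine or client API; the class names `ColumnFamily` and `Row`, the `order` parameter, and the "by_time" family name are hypothetical, chosen only to show a row identified by a key whose columns are grouped into families and kept sorted either by name or by time.

```python
import bisect

# Per-row column limit quoted on the slide (illustrative constant only).
MAX_COLUMNS_PER_ROW = 2_000_000_000


class ColumnFamily:
    """Hypothetical column family that keeps its columns sorted on insert."""

    def __init__(self, order="name"):
        if order not in ("name", "time"):
            raise ValueError("order must be 'name' or 'time'")
        self.order = order
        self._keys = []      # sort keys, kept in ascending order
        self._columns = []   # (name, value, timestamp), parallel to _keys

    def insert(self, name, value, timestamp):
        # Sort key is the column name or its timestamp, depending on the family.
        # This sketch does not overwrite an existing column with the same name.
        key = name if self.order == "name" else timestamp
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._columns.insert(i, (name, value, timestamp))

    def columns(self):
        return list(self._columns)


class Row:
    """Hypothetical row: a unique key plus named column families."""

    def __init__(self, key):
        self.key = key
        self.families = {}

    def family(self, name, order="name"):
        if name not in self.families:
            self.families[name] = ColumnFamily(order)
        return self.families[name]


# One row per user; message columns sorted by time, as Inbox Search needs.
row = Row("user42")
inbox = row.family("by_time", order="time")
inbox.insert("msg-z", "first message", timestamp=100)
inbox.insert("msg-a", "second message", timestamp=200)
print([name for name, _, _ in inbox.columns()])
```

With time ordering the older column comes first regardless of its name; the same inserts into a family created with `order="name"` would list `msg-a` before `msg-z`.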

  6. Distribution and Replication
     • Data is distributed across the nodes using a consistent hashing function
     • High availability is achieved using replication
       • If one storage node fails, data that has been replicated to other nodes is still available
       • Data is actively replicated at N nodes across data centers
     • Replication policies:
       • Rack Unaware
       • Rack Aware
       • Datacenter Aware
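Consistent hashing with replication at N nodes can be sketched as a hash ring. This is a minimal illustration, not Cassandra's partitioner: the `Ring` class, its method names, the node names, and the use of MD5 are all assumptions made for the example. It also assumes the simplest placement, where a key's replicas are the first N nodes found walking clockwise from the key's position on the ring, which corresponds to a rack-unaware policy.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on the ring (MD5 chosen for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hash ring with N-way replication."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Each node sits at the ring position given by the hash of its name.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    def replicas_for(self, key):
        """Return the nodes that store `key`: the first N nodes clockwise."""
        if not self._ring:
            return []
        size = len(self._ring)
        start = bisect.bisect_right(self._positions, ring_position(key)) % size
        count = min(self.replicas, size)
        return [self._ring[(start + i) % size][1] for i in range(count)]

    def read_targets(self, key, down=()):
        """Replicas still reachable when the nodes in `down` have failed."""
        return [n for n in self.replicas_for(key) if n not in down]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replicas=3)
owners = ring.replicas_for("user42")
print(owners)
# If the first replica fails, the remaining replicas can still serve the key.
print(ring.read_targets("user42", down={owners[0]}))
```

Because only the key's hash decides placement, adding or removing a node changes ownership for keys adjacent to that node on the ring rather than for the whole data set, which is what makes incremental scaling practical.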

  7. Users of Cassandra System
     • First deployment:
       • 2008 by Facebook, inspired by systems from Google and Amazon
       • Designed for the message Inbox Search system
       • Stores terabytes of indexes across a cluster of 600+ cores and 120+ TB of disk space
       • Each node can handle over 5,000 requests per second
     • Well-known users: [logos on the original slide are not reproduced in this transcript]

  8. References
     • Prashant Malik, “Inbox Search”, http://ja-jp.facebook.com/blog.php?post=20387467130
     • Joab Jackson, “Apache Cassandra Ready for the Enterprise”, http://www.pcworld.com/businesscenter/article/242111/apache_cassandra_ready_for_the_enterprise.html#tk.mod_rel
     • Joab Jackson, “New Cassandra Can Pack Two Billion Columns Into a Row”, http://www.pcworld.com/businesscenter/article/216766/new_cassandra_can_pack_two_billion_columns_into_a_row.html
     • Avinash Lakshman and Prashant Malik, “Cassandra: a decentralized structured storage system”, SIGOPS Oper. Syst. Rev. 44, 2 (April 2010), http://doi.acm.org/10.1145/1773912.1773922

  9. Thank You
