Gowtham Rajappan PowerPoint Presentation
Presentation Transcript

  1. Gowtham Rajappan

  2. HDFS – Hadoop Distributed File System, modeled on Google GFS • Hadoop MapReduce – similar to Google MapReduce • HBase – similar to Google Bigtable

  3. Master: hadoop01.cselabs.umn.edu • Slaves: hadoop02 through hadoop05.cselabs.umn.edu • You need a cselabs account to access this cluster. You can log in to any of these machines from any cs/cselabs machine.

  4. Data is organized into tables • A table is composed of columns, and columns are grouped into column families
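The table and column-family structure can be pictured as a nested, sorted map. Below is a minimal in-memory sketch under that assumption. It is not the HBase client API: the `Table` class and its `put`/`get`/`scan` methods are hypothetical, the `family:qualifier` column naming is borrowed from the HBase shell convention, and real HBase stores row keys and values as byte arrays rather than Python strings.

```python
class Table:
    """Toy model of an HBase table: row -> column family -> qualifier -> value."""

    def __init__(self, name, column_families):
        self.name = name
        # Column families are declared when the table is created.
        self.families = set(column_families)
        # row key -> {family -> {qualifier -> value}}
        self.rows = {}

    def put(self, row, column, value):
        # Columns are named "family:qualifier"; new qualifiers can be added freely.
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value

    def get(self, row):
        # Return every column family and qualifier stored for one row.
        return self.rows.get(row, {})

    def scan(self):
        # Yield rows in sorted row-key order, the way HBase scans return them.
        for row in sorted(self.rows):
            yield row, self.rows[row]


users = Table("users", ["info", "stats"])
users.put("user2", "info:name", "Bob")
users.put("user1", "info:name", "Alice")
users.put("user1", "stats:logins", 3)
print([row for row, _ in users.scan()])  # ['user1', 'user2']
print(users.get("user1"))  # {'info': {'name': 'Alice'}, 'stats': {'logins': 3}}
```

Because rows come back in sorted key order, row-key design matters: with string keys, `user10` sorts before `user2`.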

  5. Partitioning
     • A table is horizontally partitioned into regions; each region holds a contiguous, sorted range of row keys.
     • Each region is served by a RegionServer; a single RegionServer may host multiple regions.
     Persistence and data availability
     • HBase stores its data in HDFS. It does not replicate RegionServers; it relies on HDFS replication for data availability.
     • Recent updates are buffered in an in-memory store (the MemStore); reads are served from the MemStore and from cached data, falling back to files in HDFS.
     • The MemStore is flushed to HDFS when it grows past a size threshold (and periodically).
     • A write-ahead log (WAL), also stored in HDFS, makes updates durable: each update is logged before it is applied to the MemStore, so it can be replayed if a RegionServer fails.
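The partitioning scheme and the write path can be illustrated with a toy model. This is a simplified sketch, not HBase code: the `Region` and `RegionedTable` classes, the `flush_threshold` parameter (counted in entries rather than bytes), and the Python lists and dicts standing in for the WAL and HDFS files are all assumptions made for the example. It also leaves out WAL replay, WAL cleanup after flushes, and compaction.

```python
import bisect


class Region:
    """Hypothetical region: WAL append, then MemStore, then flush to a 'file'."""

    def __init__(self, start_key, flush_threshold=3):
        self.start_key = start_key
        self.wal = []            # stands in for the write-ahead log in HDFS
        self.memstore = {}       # in-memory buffer of recent updates
        self.store_files = []    # stands in for immutable files flushed to HDFS
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))   # 1. log the update for durability
        self.memstore[key] = value      # 2. apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore out as a sorted, immutable "file" and clear it.
        self.store_files.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        # Check the newest data first: MemStore, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            if key in store_file:
                return store_file[key]
        return None


class RegionedTable:
    """Hypothetical table split into regions by sorted start key."""

    def __init__(self, split_keys):
        # Region i holds keys in [start_keys[i], start_keys[i + 1]); the first starts at "".
        self.start_keys = [""] + sorted(split_keys)
        self.regions = [Region(k) for k in self.start_keys]

    def region_for(self, key):
        # Pick the rightmost region whose start key is <= key.
        i = bisect.bisect_right(self.start_keys, key) - 1
        return self.regions[i]

    def put(self, key, value):
        self.region_for(key).put(key, value)

    def get(self, key):
        return self.region_for(key).get(key)


table = RegionedTable(split_keys=["g", "p"])
for k in ["apple", "grape", "kiwi", "mango", "zucchini"]:
    table.put(k, k.upper())
print(table.region_for("kiwi").start_key)  # g
print(table.get("mango"))                  # MANGO
```

In this run the region starting at `g` receives three keys, reaches its threshold, and flushes, so `mango` is read back from a store file rather than from the MemStore.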

  6. The HBase shell provides interactive commands for manipulating the database: • Create and delete tables • Insert, update, and read data in tables • Manage regions

  7. HBase provides atomic single-row operations:
     • CheckAndPut – similar to test-and-set
     • CheckAndDelete
     • Every operation on a single row is atomic, no matter how many columns it involves.
     • HBase also provides row-level exclusive locks, which can be used to implement single-row transactions. (Explicit client-side row locks were removed in later HBase releases.)
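A minimal sketch of these single-row guarantees, assuming one lock per row: every mutation of a row, however many columns it touches, runs while holding that row's lock. The `RowStore` class and its method names are hypothetical; they mirror the semantics of CheckAndPut and CheckAndDelete, not their real Java signatures. The usage shows why test-and-set is useful: increments built on a retry loop never lose an update, even under concurrency.

```python
import threading


class RowStore:
    """Hypothetical store with per-row locks for atomic single-row operations."""

    def __init__(self):
        self._rows = {}                       # row key -> {column -> value}
        self._locks = {}                      # row key -> threading.Lock
        self._locks_guard = threading.Lock()  # protects the _locks dict itself

    def _lock_for(self, row):
        with self._locks_guard:
            return self._locks.setdefault(row, threading.Lock())

    def get(self, row, column):
        with self._lock_for(row):
            return self._rows.get(row, {}).get(column)

    def put(self, row, values):
        # Multi-column put: all columns change together under the row lock.
        with self._lock_for(row):
            self._rows.setdefault(row, {}).update(values)

    def check_and_put(self, row, column, expected, values):
        # Test-and-set: apply `values` only if `column` currently equals `expected`.
        with self._lock_for(row):
            if self._rows.get(row, {}).get(column) != expected:
                return False
            self._rows.setdefault(row, {}).update(values)
            return True

    def check_and_delete(self, row, column, expected):
        # Delete `column` only if it currently equals `expected`.
        with self._lock_for(row):
            if self._rows.get(row, {}).get(column) != expected:
                return False
            self._rows.get(row, {}).pop(column, None)
            return True


def increment(store, row, column):
    # Retry until our compare-and-set wins against concurrent writers.
    while True:
        current = store.get(row, column)
        new = (current or 0) + 1
        if store.check_and_put(row, column, current, {column: new}):
            return new


store = RowStore()


def worker():
    for _ in range(100):
        increment(store, "r1", "count")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("r1", "count"))  # 400
```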

  8. HBase stores multiple versions of each column in a row. Each version is identified by an integer timestamp.
     • By default, the system time is used as the version timestamp; however, the user can specify a logical timestamp for versioning.
     • Each update to a column creates a new version of that column.
     • A version can be read or deleted using its timestamp, and HBase can return the list of all stored versions (up to the maximum number of versions configured for the column family).
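The versioning rules can be modeled with a map from timestamp to value for each cell. The `VersionedStore` class below is a hypothetical simplification: its `max_versions` limit stands in for the per-column-family setting, timestamps default to the current time in milliseconds, and deleting a version simply removes it, whereas real HBase records deletes with markers.

```python
import time


class VersionedStore:
    """Hypothetical versioned cells: (row, column) -> {timestamp -> value}."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = {}

    def put(self, row, column, value, ts=None):
        if ts is None:
            ts = int(time.time() * 1000)  # default: system time in milliseconds
        versions = self.cells.setdefault((row, column), {})
        versions[ts] = value
        # Keep only the newest max_versions versions.
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]
        return ts

    def get(self, row, column, ts=None):
        versions = self.cells.get((row, column), {})
        if ts is not None:
            return versions.get(ts)  # one specific version
        return versions[max(versions)] if versions else None  # the latest version

    def get_versions(self, row, column):
        # All stored versions, newest first, as (timestamp, value) pairs.
        return sorted(self.cells.get((row, column), {}).items(), reverse=True)

    def delete_version(self, row, column, ts):
        self.cells.get((row, column), {}).pop(ts, None)


history = VersionedStore(max_versions=3)
for ts, v in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:  # logical timestamps
    history.put("row1", "cf:col", v, ts=ts)
print(history.get("row1", "cf:col"))           # d
print(history.get_versions("row1", "cf:col"))  # [(4, 'd'), (3, 'c'), (2, 'b')]
```

The fourth put pushes the cell past its three-version limit, so the oldest version (timestamp 1) is discarded.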

  9. Hadoop home page - http://hadoop.apache.org/ • HBase - http://hbase.apache.org/ • API documentation • http://hbase.apache.org/apidocs/ • http://hadoop.apache.org/