
Gowtham Rajappan

Presentation Transcript


  1. Gowtham Rajappan

  2. HDFS – Hadoop Distributed File System, modeled on the Google File System (GFS)
     • Hadoop MapReduce – similar to Google MapReduce
     • HBase – similar to Google Bigtable

  3. Master: hadoop01.cselabs.umn.edu
     • Slaves: hadoop02 – hadoop05.cselabs.umn.edu
     • You need a cselabs account to access this cluster. You can log in to any of these machines from any cs/cselabs machine.

  4. Data is organized into tables
     • A table is composed of columns, and columns are grouped into column families
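The table, column-family, and column hierarchy described above can be pictured as nested maps. The following is a minimal in-memory sketch, not the HBase API: the class `ToyTable`, its methods, and the example table and column names are all hypothetical and chosen only for illustration.

```python
class ToyTable:
    """Toy model of a table: row key -> column family -> column -> value.

    Illustrative only; this is not how the HBase client API looks.
    """

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # column families declared up front
        self.rows = {}                 # row key -> {family -> {column -> value}}

    def put(self, row, family, column, value):
        # Reject writes to a column family the table does not have.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {}).setdefault(family, {})[column] = value

    def get(self, row, family=None):
        # Return copies so callers cannot mutate the table by accident.
        data = self.rows.get(row, {})
        if family is None:
            return {f: dict(cols) for f, cols in data.items()}
        return dict(data.get(family, {}))


users = ToyTable("users", ["info", "stats"])
users.put("row1", "info", "name", "alice")
users.put("row1", "info", "email", "alice@example.com")
users.put("row1", "stats", "visits", 3)
```

Here `users.get("row1", "info")` returns only the columns grouped under the `info` family, which is the point of grouping columns into families.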

  5. Partitioning
     • A table is horizontally partitioned into regions; each region covers a contiguous, sorted range of row keys.
     • Each region is managed by a RegionServer; a single RegionServer may hold multiple regions.
     Persistence and data availability
     • HBase stores its data in HDFS. It does not replicate RegionServers and relies on HDFS replication for data availability.
     • Region data is cached in memory.
     • Updates and reads are served from the in-memory cache (MemStore).
     • The MemStore is flushed periodically to HDFS.
     • A write-ahead log (stored in HDFS) is used for durability of updates.
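To make the partitioning and write path above concrete, here is a small in-memory sketch. It is a toy model under stated assumptions, not HBase code: `ToyRegion` and `ToyPartitionedTable` are hypothetical names, plain Python lists stand in for HDFS files and the write-ahead log, and the MemStore is flushed after a fixed number of entries rather than periodically.

```python
import bisect


class ToyRegion:
    """One region: a contiguous key range with an in-memory store."""

    def __init__(self, start_key, flush_threshold=2):
        self.start_key = start_key
        self.memstore = {}       # recent updates held in memory
        self.store_files = []    # stand-in for files flushed to HDFS
        self.wal = []            # stand-in for the write-ahead log
        self.flush_threshold = flush_threshold  # assumption: flush by entry count

    def put(self, key, value):
        self.wal.append((key, value))  # log first, so the update is durable
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the sorted in-memory contents out as one immutable "file".
        if self.memstore:
            self.store_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):  # newest file first
            if key in store_file:
                return store_file[key]
        return None


class ToyPartitionedTable:
    """A table split into regions by sorted start keys."""

    def __init__(self, split_keys):
        # Region i covers [starts[i], starts[i + 1]); the first starts at "".
        self.starts = [""] + sorted(split_keys)
        self.regions = [ToyRegion(start) for start in self.starts]

    def locate(self, key):
        # Binary search for the last region whose start key is <= key.
        return self.regions[bisect.bisect_right(self.starts, key) - 1]

    def put(self, key, value):
        self.locate(key).put(key, value)

    def get(self, key):
        return self.locate(key).get(key)


table = ToyPartitionedTable(split_keys=["g", "p"])
table.put("apple", 1)
table.put("banana", 2)   # second entry in the first region triggers a flush
table.put("zebra", 3)
```

After these calls, `apple` and `banana` live in a flushed store file of the first region, while `zebra` is still in the MemStore of the last region. Both remain readable through `table.get`.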

  6. The HBase shell provides interactive commands for manipulating the database
     • Create/delete tables
     • Insert/update/read from tables
     • Manage regions

  7. HBase provides single-row atomic operations
     • checkAndPut – similar to test-and-set
     • checkAndDelete
     • All row operations are atomic, no matter how many columns are involved.
     • HBase also provides row-level exclusive locks
     • You can use these locks to implement single-row transactions
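The test-and-set behaviour of the check-and-put and check-and-delete operations above can be sketched with a per-row lock. This is an illustrative model only, not the HBase client API: `ToyRow`, its method names, and the convention that an expected value of `None` means "column absent" are assumptions of this sketch.

```python
import threading


class ToyRow:
    """Toy row whose operations are atomic with respect to each other."""

    def __init__(self):
        self._lock = threading.Lock()  # one exclusive lock per row
        self.columns = {}

    def put(self, updates):
        # All columns in one put are applied together, under the row lock.
        with self._lock:
            self.columns.update(updates)

    def check_and_put(self, column, expected, updates):
        # Apply `updates` only if `column` currently holds `expected`.
        # In this sketch, expected=None means "the column must be absent".
        with self._lock:
            if self.columns.get(column) != expected:
                return False
            self.columns.update(updates)
            return True

    def check_and_delete(self, column, expected, to_delete):
        # Delete the given columns only if the check passes.
        with self._lock:
            if self.columns.get(column) != expected:
                return False
            for name in to_delete:
                self.columns.pop(name, None)
            return True


row = ToyRow()
results = []


def attempt(client_id):
    # Each client tries to claim the row; only the first check can succeed.
    won = row.check_and_put("owner", None, {"owner": f"client-{client_id}"})
    results.append(won)


threads = [threading.Thread(target=attempt, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the check and the write happen under the same lock, exactly one of the ten competing clients ends up as `owner`. Which one wins depends on thread scheduling.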

  8. HBase stores multiple versions of a column in a row. Each version is identified by an integer timestamp
     • By default, the system time is used as the version timestamp. However, the user can specify a logical timestamp for versioning.
     • Each update to a row creates a new version of the specified column.
     • A version can be accessed or deleted using its timestamp. HBase also lets you retrieve a list of all the versions.
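The versioning rules above can be modelled as a map from timestamp to value for each column. The sketch below is a toy model, not the HBase API: `ToyVersionedCell` and its methods are hypothetical, and using milliseconds since the epoch as the default timestamp is an assumption of this example.

```python
import time


class ToyVersionedCell:
    """Toy model of one column in one row, holding several versions."""

    def __init__(self):
        self._versions = {}  # integer timestamp -> value

    def put(self, value, timestamp=None):
        # Default to system time; callers may pass a logical timestamp instead.
        if timestamp is None:
            timestamp = int(time.time() * 1000)  # assumption: milliseconds
        self._versions[timestamp] = value
        return timestamp

    def get(self, timestamp=None):
        # Without a timestamp, return the newest version.
        if not self._versions:
            return None
        if timestamp is None:
            timestamp = max(self._versions)
        return self._versions.get(timestamp)

    def delete(self, timestamp):
        # Remove a single version, identified by its timestamp.
        self._versions.pop(timestamp, None)

    def versions(self):
        # All versions as (timestamp, value) pairs, newest first.
        return sorted(self._versions.items(), reverse=True)


cell = ToyVersionedCell()
cell.put("draft", timestamp=1)   # logical timestamps supplied by the caller
cell.put("final", timestamp=2)
```

With logical timestamps 1 and 2, `cell.get()` returns the newest value and `cell.get(1)` returns the older one. Deleting version 2 makes version 1 the newest again.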

  9. Hadoop Home - http://hadoop.apache.org/
     • HBase - http://hbase.apache.org/
     • API
     • http://hbase.apache.org/apidocs/
     • http://hadoop.apache.org/
