A Hadoop Overview

Outline
• Progress Report
• MapReduce Programming
• Hadoop Cluster Overview
• HBase Overview
• Q & A

Progress
• Hadoop setup has been completed.
  • Version 0.19.0, running in standalone mode.
• HBase setup has been completed.
  • Version 0.19.3, running without HDFS.
• Simple demonstration of MapReduce.
  • A simple word count program.

Hadoop
• Full name: the Apache Hadoop project.
• Open-source implementation of reliable, scalable distributed computing.
• An aggregation of Hadoop Core and the following projects:
  • Avro
  • Chukwa
  • HBase
  • HDFS
  • Hive
  • MapReduce
  • Pig
  • ZooKeeper

Virtual Machine (VM)
• Virtualization
  • All services are delivered through VMs.
  • Allows dynamic configuration and management.
  • Multiple VMs can run on a single commodity machine.
• VMware

HDFS (Hadoop Distributed File System)
• The highly scalable distributed file system of Hadoop.
  • Resembles the Google File System (GFS).
  • Provides reliability through replication.
• NameNode & DataNode
  • NameNode
    • Maintains file system metadata and namespace.
    • Provides management and control services.
    • Usually one instance.
  • DataNode
    • Provides data storage and retrieval services.
    • Usually several instances.
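The NameNode/DataNode split and the idea of reliability through replication can be illustrated with a small toy model. This is only a sketch, not the HDFS API: the function names, the DataNode names, and the round-robin placement are assumptions made for illustration (real HDFS placement is more involved, for example rack-aware).

```python
def place_blocks(file_size, block_size, datanodes, replication=3):
    """Toy 'NameNode' metadata: map each block index to the DataNodes
    that hold a replica. Round-robin placement is an assumption."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    # Ceiling division: a partial last block still occupies one block.
    num_blocks = max(1, -(-file_size // block_size))
    block_map = {}
    for block in range(num_blocks):
        block_map[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return block_map


def readable_after_failure(block_map, failed_nodes):
    """A file stays readable if every block has at least one live replica."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in block_map.values()
    )


# Hypothetical cluster: a 200 MB file, 64 MB blocks, four DataNodes.
block_map = place_blocks(200, 64, ["dn1", "dn2", "dn3", "dn4"], replication=3)
```

With three replicas per block, this toy file survives the loss of any two DataNodes, but not necessarily three.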

MapReduce
• The sophisticated distributed computing service of Hadoop.
  • A computation framework.
  • Usually runs on top of HDFS.
• JobTracker & TaskTracker
  • JobTracker
    • Manages the distribution of tasks to the TaskTrackers.
    • Provides job submission, monitoring, and control.
  • TaskTracker
    • Manages individual map or reduce tasks on a compute node.
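The map and reduce steps behind a word count program can be sketched in plain Python, without Hadoop. This is a single-process illustration of the programming model only; the function names are assumptions, and in a real cluster the JobTracker would distribute the map and reduce tasks across TaskTrackers.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


counts = word_count(["hadoop runs mapreduce", "mapreduce runs on hadoop"])
```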

Cluster Makeup
• A Hadoop cluster is usually made up of:
  • Real machines.
    • Not required to be homogeneous.
    • Homogeneity helps maintainability.
  • Server processes.
    • Multiple processes can run on a single VM.
• Master & Slave
  • The node/machine running the JobTracker or NameNode is a master node.
  • The ones running a TaskTracker or DataNode are slave nodes.

Cluster Makeup (cont.)

Administrator Scripts
• Administrators can use the following scripts to start or stop server processes.
• They are located in $HADOOP_HOME/bin:
  • start-all.sh / stop-all.sh
  • start-mapred.sh / stop-mapred.sh
  • start-dfs.sh / stop-dfs.sh
  • slaves.sh
  • hadoop
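As a small illustration of how these scripts are addressed, the sketch below maps an action and a component to the matching script path under $HADOOP_HOME/bin. The helper name `script_command` and the install path `/opt/hadoop` are hypothetical; the function only builds the command and does not run it.

```python
import os
import posixpath

# Script names as listed on the slide; keys are (action, component).
SCRIPTS = {
    ("start", "all"): "start-all.sh",
    ("stop", "all"): "stop-all.sh",
    ("start", "mapred"): "start-mapred.sh",
    ("stop", "mapred"): "stop-mapred.sh",
    ("start", "dfs"): "start-dfs.sh",
    ("stop", "dfs"): "stop-dfs.sh",
}


def script_command(action, component, hadoop_home=None):
    """Return the command (as a list) for one start/stop script."""
    home = hadoop_home or os.environ.get("HADOOP_HOME")
    if not home:
        raise RuntimeError("HADOOP_HOME is not set")
    if (action, component) not in SCRIPTS:
        raise ValueError(f"no script for {action!r} {component!r}")
    return [posixpath.join(home, "bin", SCRIPTS[(action, component)])]


# "/opt/hadoop" is a hypothetical install location.
cmd = script_command("start", "dfs", hadoop_home="/opt/hadoop")
```

On a real installation the returned list could be passed to `subprocess.run(cmd, check=True)`.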

Configuration
• By default, each Hadoop Core server loads its configuration from several files.
• These files are located in $HADOOP_HOME/conf.
• Usually, identical copies of these files are maintained on every machine in the cluster.
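The way several configuration files combine can be sketched as follows, assuming the usual `<configuration>`/`<property>`/`<name>`/`<value>` XML layout and assuming that site-specific settings override the defaults. The helper names, the property values, and the host name `master` with port 9000 are illustrative assumptions, not values taken from the slides.

```python
import xml.etree.ElementTree as ET


def parse_conf(xml_text):
    """Read name/value pairs from one Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): (prop.findtext("value") or "").strip()
        for prop in root.findall("property")
    }


def effective_conf(default_xml, site_xml):
    """Overlay site settings on top of the defaults (site wins)."""
    conf = parse_conf(default_xml)
    conf.update(parse_conf(site_xml))
    return conf


# Illustrative contents only; real files would be read from $HADOOP_HOME/conf.
DEFAULT_XML = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>fs.default.name</name><value>file:///</value></property>
</configuration>"""

SITE_XML = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://master:9000</value></property>
</configuration>"""

conf = effective_conf(DEFAULT_XML, SITE_XML)
```

Keeping identical copies of the site file on every machine means each server process computes the same effective settings.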

Q & A
• Any questions?
