This report provides a detailed overview of Apache Hadoop and HBase, focusing on their architecture, programming models, and cluster management. Key features of Hadoop, including MapReduce, HDFS, and the integration of various projects, are explored. The setup progress of Hadoop in standalone mode and HBase without HDFS is documented. Additionally, a simple demonstration of a MapReduce word count program illustrates Hadoop's computing capabilities. The report concludes with a Q&A session for any outstanding queries.
Outline • Progress Report • MapReduce Programming • Hadoop Cluster Overview • HBase Overview • Q & A
Progress • Hadoop setup has been completed. • Version 0.19.0, running in standalone mode. • HBase setup has been completed. • Version 0.19.3, running without HDFS. • Simple MapReduce demonstration. • Simple word count program.
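The word count demonstration can be illustrated without a cluster. The following is a minimal sketch in plain Python, not the program that was actually run on Hadoop: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the grouping step that the framework performs between map and reduce is simulated in memory.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello hbase"]))
```

In a real job the map and reduce functions would run as separate tasks on different nodes; only the shape of the computation is shown here.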
Hadoop • Full name: Apache Hadoop project. • Open source implementation for reliable, scalable distributed computing. • An aggregation of the following projects (plus its core): • Avro • Chukwa • HBase • HDFS • Hive • MapReduce • Pig • ZooKeeper
Virtual Machine (VM) • Virtualization • All services are delivered through VMs. • Allows dynamic configuration and management. • Multiple VMs can run on a single commodity machine. • VMware
HDFS (Hadoop Distributed File System) • The highly scalable distributed file system of Hadoop. • Resembles the Google File System (GFS). • Provides reliability through replication. • NameNode & DataNode • NameNode • Maintains file system metadata and namespace. • Provides management and control services. • Usually one instance. • DataNode • Provides data storage and retrieval services. • Usually several instances.
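The division of labor between NameNode and DataNodes can be sketched with a toy model. This is an illustration under stated assumptions, not HDFS code: `ToyNameNode` and its methods are hypothetical names, and replicas are placed round-robin here, which is simpler than the placement policy a real cluster uses. The point is that the NameNode holds only metadata (which DataNodes store each block) and that a block stays readable while at least one replica survives.

```python
class ToyNameNode:
    """Keeps metadata only: which DataNodes hold each block of each file."""

    def __init__(self, datanodes, replication=3):
        if replication > len(datanodes):
            raise ValueError("replication factor exceeds number of DataNodes")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # (filename, block index) -> list of DataNode names
        self._next = 0       # round-robin cursor (assumed placement policy)

    def add_file(self, filename, num_blocks):
        """Record replica locations for every block of a new file."""
        for index in range(num_blocks):
            replicas = [
                self.datanodes[(self._next + i) % len(self.datanodes)]
                for i in range(self.replication)
            ]
            self.block_map[(filename, index)] = replicas
            self._next += 1

    def locate(self, filename, index, failed=()):
        """Return the replicas of a block that are still reachable."""
        return [dn for dn in self.block_map[(filename, index)] if dn not in failed]


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"], replication=3)
    nn.add_file("log.txt", num_blocks=2)
    print(nn.locate("log.txt", 0, failed={"dn1"}))
```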
MapReduce • The sophisticated distributed computing service of Hadoop. • A computation framework. • Usually runs on top of HDFS. • JobTracker & TaskTracker • JobTracker • Manages the distribution of tasks to the TaskTrackers. • Handles job submission, monitoring, and control. • TaskTracker • Manages individual map or reduce tasks on a compute node.
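The JobTracker's role of distributing tasks to TaskTrackers can be pictured with a small scheduling sketch. The function `assign_tasks` and its `slots_per_tracker` parameter are hypothetical and chosen for illustration; the real scheduler also considers data locality, heartbeats, and task failures, none of which are modeled here.

```python
from collections import deque


def assign_tasks(tasks, trackers, slots_per_tracker=2):
    """Hand out tasks to trackers in waves, respecting a per-tracker slot limit.

    Returns a list of waves; each wave maps a tracker name to its task list.
    """
    if not trackers or slots_per_tracker < 1:
        raise ValueError("need at least one tracker and one slot per tracker")
    pending = deque(tasks)
    waves = []
    while pending:
        wave = {tracker: [] for tracker in trackers}
        # Fill one slot on every tracker before filling the next slot.
        for _ in range(slots_per_tracker):
            for tracker in trackers:
                if pending:
                    wave[tracker].append(pending.popleft())
        waves.append(wave)
    return waves


if __name__ == "__main__":
    map_tasks = [f"map-{i}" for i in range(5)]
    for number, wave in enumerate(assign_tasks(map_tasks, ["tt1", "tt2"]), start=1):
        print(number, wave)
```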
Cluster Makeup • A Hadoop cluster is usually made up of: • Real Machines. • Not required to be homogeneous. • Homogeneity helps maintainability. • Server Processes. • Multiple processes can run on a single VM. • Master & Slave • The node/machine running the JobTracker or NameNode is the master node. • The ones running the TaskTracker or DataNode are slave nodes.
Administrator Scripts • Administrators can use the following script files to start or stop server processes. • Located in $HADOOP_HOME/bin • start-all.sh/stop-all.sh • start-mapred.sh/stop-mapred.sh • start-dfs.sh/stop-dfs.sh • slaves.sh • hadoop
Configuration • By default, each Hadoop Core server loads its configuration from several files. • These files are located in $HADOOP_HOME/conf • Usually identical copies of those files are maintained on every machine in the cluster.
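Hadoop configuration files are XML documents made of `<property>` entries, each with a `<name>` and a `<value>`. The sketch below parses such a document with Python's standard library. The embedded sample is an assumption for illustration: the host name `master.example.com`, the ports, and the idea that these three properties live together in one site file (such as `hadoop-site.xml` in releases of this era) are not taken from the slides, and `load_properties` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical site configuration; host names and ports are placeholders.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.example.com:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master.example.com:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return a dict of property name -> value from a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


if __name__ == "__main__":
    for name, value in load_properties(SAMPLE_SITE_XML).items():
        print(f"{name} = {value}")
```

A helper like this can be used to check that the copies of a configuration file on different machines agree, by comparing the dictionaries it returns.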
Q & A • Any questions?