1 / 17

A Hadoop Overview

A Hadoop Overview. Outline. Progress Report MapReduce Programming Hadoop Cluster Overview HBase Overview Q & A. Outline. Progress Report MapReduce Programming Hadoop Cluster Overview HBase Overview Q & A. Progress. Hadoop buildup has been completed.

morton
Télécharger la présentation

A Hadoop Overview

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Hadoop Overview

  2. Outline Progress Report MapReduce Programming Hadoop Cluster Overview HBase Overview Q & A

  3. Outline Progress Report MapReduce Programming Hadoop Cluster Overview HBase Overview Q & A

  4. Progress • Hadoop buildup has been completed. • Version 0.19.0, running under Standalone mode. • HBase buildup has been completed. • Version 0.19.3, with no assists of HDFS. • Simple demonstration over MapReduce. • Simple word count program.

  5. Outline Progress Report MapReduce Programming Hadoop Cluster Overview HBase Overview Q & A

  6. Outline Progress Report MapReduce Programming Hadoop Cluster Overview HBase Overview Q & A

  7. Hadoop • Full name Apache Hadoopproject. • Open source implementation for reliable, scalable distributed computing. • An aggregation of the following projects (and its core): • Avro • Chukwa • HBase • HDFS • Hive • MapReduce • Pig • ZooKeeper

  8. Virtual Machine (VM) • Virtualization • All services are delivered through VMs. • Allows for dynamically configuring and managing. • There can be multiple VMs running on a single commodity machine. • VMware

  9. HDFS(Hadoop Distributed File System) • The highly scalable distributed file system of Hadoop. • Resembles Google File System(GFS). • Provides reliability by replication. • NameNode & DataNode • NameNode • Maintains file system metadata and namespace. • Provides management and control services. • Usually one instance. • DataNode • Provides data storage and retrieval services. • Usually several instances.

  10. MapReduce • The sophisticate distributed computing service of Hadoop. • A computation framework. • Usually resides on HDFS. • JobTracker & TaskTracker • JobTracker • Manages the distribution of tasks to the TaskTrackers. • Provides job monitoring and control, and the submission of jobs. • TaskTracker • Manages single map or reduce tasks on a compute node.

  11. Cluster Makeup • A Hadoop cluster is usually make up by: • Real Machines. • Not required to be homogeneous. • Homogeneity will help maintainability. • Server Process. • Multiple process can be run on a single VM. • Master & Slave • The node/machine running the JobTracker or NameNode will be Master node. • The ones running the TaskTracker or DataNode will be Slave node.

  12. Cluster Makeup(cont.)

  13. Administrator Scripts • Administrator can use the following script files to start or stop server processes. • Can be located in $HADOOP_HOME/bin • Start-all.sh/stop-all.sh • Start-mapred.sh/stop-mapred.sh • Start-dfs.sh/stop-dfs.sh • Slaves.sh • hadoop

  14. Configuration • By default, each Hadoop Core server will load the configuration from several files. • These file will be located in $HADOOP_HOME/conf • Usually identical copies of those files are maintained in every machine in the cluster.

  15. Outline Progress Report MapReduce Programming Hadoop Cluster Overview HBase Overview Q & A

  16. Outline Progress Report MapReduce Programming Hadoop Cluster Overview HBase Overview Q & A

  17. Q & A Any question?

More Related