Hadoop Setup

Presentation Transcript


  1. Hadoop Setup

  2. Prerequisites
     • System: Mac OS / Linux / Cygwin on Windows
     • Notice:
       1. Only work done on Ubuntu will be supported by the TA. You may try other environments as a challenge.
       2. Cygwin on Windows is not recommended because of its instability and unforeseen bugs.
     • Java Runtime Environment: Java™ 1.6.x is recommended.
     • ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
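The prerequisites above can be checked roughly before installing anything. The following is a minimal sketch, not part of the original slides: the function name `find_prerequisites` is hypothetical, and it only reports whether each executable is found on `PATH`. It does not verify the Java version or that sshd is actually running, and sshd is often installed outside a normal user's `PATH`.

```python
import shutil


def find_prerequisites(tools=("java", "ssh", "sshd")):
    """Map each tool name to its location on PATH, or None if it is not found.

    Finding an executable only shows that it is installed; it does not
    show that a daemon such as sshd is running.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_prerequisites().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

A missing entry is a hint to install the package (or to look in a system directory such as /usr/sbin) before moving on to the setup steps.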

  3. Single Node Setup (usually for debugging)
     • Untar hadoop-*.**.*.tar.gz to your user path.
     • About the version: the latest stable version, 1.0.1, is recommended.
     • Edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
     • Edit the following files to configure properties:
       conf/core-site.xml:
         <configuration>
           <property>
             <name>fs.default.name</name>
             <value>hdfs://localhost:9000</value>
           </property>
         </configuration>
       conf/hdfs-site.xml:
         <configuration>
           <property>
             <name>dfs.replication</name>
             <value>1</value>
           </property>
         </configuration>
       conf/mapred-site.xml:
         <configuration>
           <property>
             <name>mapred.job.tracker</name>
             <value>localhost:9001</value>
           </property>
         </configuration>
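The three site files share one shape: a `<configuration>` element holding `<property>` entries with a `<name>` and a `<value>`. As a minimal sketch (the helper name `site_xml` and the `SINGLE_NODE` table are illustrative, not part of Hadoop), the files could be rendered from a dictionary like this, which avoids stray whitespace inside the name and value elements:

```python
import xml.etree.ElementTree as ET


def site_xml(properties):
    """Render a dict of property names and values as a *-site.xml string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Property values taken from the single node setup slide.
SINGLE_NODE = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": 1},
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}

if __name__ == "__main__":
    # Print instead of writing, so nothing under conf/ is overwritten.
    for filename, props in SINGLE_NODE.items():
        print(f"conf/{filename}:")
        print(site_xml(props))
```

The sketch prints the XML on one line per file; writing each string to the matching file under conf/ is left to the reader.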

  4. Cluster Setup (the only acceptable setup for HW)
     • Follow the same steps as the single node setup, but replace localhost in fs.default.name and mapred.job.tracker with the master's node name so that the slaves can reach the master.
     • Set the dfs.name.dir and dfs.data.dir properties in conf/hdfs-site.xml.
     • Add the master's node name to conf/masters.
     • Add all the slaves' node names to conf/slaves.
     • Edit /etc/hosts on each node: add an IP address and node name entry for every node.
       For example, if your master's node name is ubuntu1 and its IP is 192.168.0.2, add the line "192.168.0.2 ubuntu1" to the file.
     • Copy the Hadoop folder to the same path on all nodes.
     • Notice: JAVA_HOME may not be the same on every node, so check conf/hadoop-env.sh on each one.
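The node lists for the cluster steps above have to agree across /etc/hosts, conf/masters and conf/slaves. The following sketch generates all three from one description of the cluster. The function name `cluster_files` is hypothetical, and apart from ubuntu1 at 192.168.0.2 (from the slide) the node names and addresses are made up for illustration.

```python
def cluster_files(master, slaves, ips):
    """Return the text for the /etc/hosts entries, conf/masters and conf/slaves.

    master: node name of the master
    slaves: list of slave node names
    ips:    dict mapping every node name to its IP address
    """
    nodes = [master] + list(slaves)
    missing = [node for node in nodes if node not in ips]
    if missing:
        raise ValueError("no IP address given for: " + ", ".join(missing))
    return {
        "hosts": "".join(f"{ips[node]} {node}\n" for node in nodes),
        "masters": master + "\n",
        "slaves": "".join(slave + "\n" for slave in slaves),
    }


if __name__ == "__main__":
    files = cluster_files(
        master="ubuntu1",
        slaves=["ubuntu2", "ubuntu3"],  # hypothetical slave names
        ips={
            "ubuntu1": "192.168.0.2",
            "ubuntu2": "192.168.0.3",  # hypothetical address
            "ubuntu3": "192.168.0.4",  # hypothetical address
        },
    )
    for name, text in files.items():
        print(f"--- {name} ---")
        print(text, end="")
```

Generating the three files from one source makes it harder for a node to appear in conf/slaves without a matching /etc/hosts entry.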

  5. Execution
     • Generate an ssh key pair with an empty passphrase, so that no passphrase is requested when the daemons start up:
       $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
       $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
       $ ssh localhost
     • Format a new distributed filesystem:
       $ bin/hadoop namenode -format
     • Start the Hadoop daemons:
       $ bin/start-all.sh
     • The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
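If `ssh localhost` still asks for a password after the key step above, one thing to check is whether the public key really ended up in authorized_keys. The sketch below assumes the file locations used in the commands above; the function name `key_is_authorized` is hypothetical. It compares only the base64 key material, so option prefixes or comments on an authorized_keys line do not matter. File permissions and sshd settings, which can also block key login, are not checked.

```python
from pathlib import Path


def key_is_authorized(pub_key_path, authorized_keys_path):
    """Return True if the public key's base64 blob appears in authorized_keys."""
    pub = Path(pub_key_path).expanduser()
    auth = Path(authorized_keys_path).expanduser()
    if not pub.is_file() or not auth.is_file():
        return False
    fields = pub.read_text().split()
    if len(fields) < 2:
        return False  # not in the usual "type blob [comment]" form
    blob = fields[1]
    return any(blob in line.split() for line in auth.read_text().splitlines())


if __name__ == "__main__":
    ok = key_is_authorized("~/.ssh/id_dsa.pub", "~/.ssh/authorized_keys")
    print("key authorized" if ok else "key not found in authorized_keys")
```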

  6. Execution (continued)
     • Copy the input files into the distributed filesystem:
       $ bin/hadoop fs -put conf input
     • Run one of the provided examples (in the 1.0.x releases the examples jar is named hadoop-examples-*.jar, so adjust the pattern to match your release):
       $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
     • Examine the output files by viewing them on the distributed filesystem:
       $ bin/hadoop fs -cat output/*
     • When you're done, stop the daemons with:
       $ bin/stop-all.sh
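To make sense of the output of the grep example above, it helps to know what it computes: every string matching the regular expression is extracted from the input files, and the job reports how often each distinct match occurred, most frequent first. The following is a single-machine sketch of that computation, not the MapReduce implementation. The function name `grep_count` is hypothetical, and breaking ties alphabetically is an assumption made here to keep the result deterministic.

```python
import re
from collections import Counter


def grep_count(texts, pattern):
    """Count each distinct regex match across all texts.

    Returns (match, count) pairs sorted by descending count;
    ties are broken alphabetically (an assumption of this sketch).
    """
    counts = Counter()
    for text in texts:
        counts.update(m.group(0) for m in re.finditer(pattern, text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = [
        "set dfs.replication and dfs.name.dir",
        "dfs.replication again",
    ]
    for match, count in grep_count(sample, r"dfs[a-z.]+"):
        print(f"{count}\t{match}")
```

Running the same pattern over a local copy of the conf directory gives a quick way to sanity-check what `bin/hadoop fs -cat output/*` prints.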

  7. Details About Configuration Files
     • Hadoop configuration is driven by two types of important configuration files:
     • Read-only default configuration: src/core/core-default.xml, src/hdfs/hdfs-default.xml, src/mapred/mapred-default.xml, conf/mapred-queues.xml.template
     • Site-specific configuration: conf/core-site.xml, conf/hdfs-site.xml, conf/mapred-site.xml, conf/mapred-queues.xml
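The two kinds of files combine in a simple way: the defaults are loaded first, and any property that also appears in the site-specific file takes the site value. The sketch below illustrates that layering on small inline XML strings. The function names are hypothetical, the property set is a tiny illustrative subset, and Hadoop's own loader handles more than this (for example properties marked final).

```python
import xml.etree.ElementTree as ET


def parse_properties(xml_text):
    """Read <property> name/value pairs from a Hadoop configuration document."""
    props = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def effective_config(default_xml, site_xml):
    """Layer site-specific values over the defaults."""
    merged = parse_properties(default_xml)
    merged.update(parse_properties(site_xml))
    return merged


# Illustrative stand-ins for hdfs-default.xml and conf/hdfs-site.xml.
DEFAULTS = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.permissions</name><value>true</value></property>
</configuration>"""

SITE = """<configuration>
  <property><name> dfs.replication </name><value> 1 </value></property>
</configuration>"""

if __name__ == "__main__":
    for name, value in sorted(effective_config(DEFAULTS, SITE).items()):
        print(f"{name} = {value}")
```

Here dfs.replication comes out as 1 because the site file overrides it, while dfs.permissions keeps its default.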

  8. Details About Configuration Files (continued): conf/core-site.xml and conf/hdfs-site.xml [property tables not preserved in the transcript]

  9. Details About Configuration Files (continued): conf/mapred-site.xml [property table not preserved in the transcript]

  10. You may get detailed information from:
      • The official site: http://hadoop.apache.org
      • Course slides & textbooks: http://www.cs.sjtu.edu.cn/~liwujun/course/mmds.html
      • Michael G. Noll's blog (a good guide): http://www.michael-noll.com/
      If you have good materials to share, please send them to the TA.
