Hadoop Setup
Prerequisites
• System: Mac OS / Linux / Cygwin on Windows
• Notice:
  1. Only work done in Ubuntu will be supported by the TA. You may try other environments as a challenge.
  2. Cygwin on Windows is not recommended because of its instability and unforeseen bugs.
• Java Runtime Environment: Java™ 1.6.x recommended.
• ssh must be installed and sshd must be running in order to use the Hadoop scripts that manage remote Hadoop daemons.
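A minimal sketch for checking these prerequisites from a terminal, assuming the standard Ubuntu command names and the usual /usr/sbin location for sshd:

```shell
#!/bin/sh
# Sketch: check the Hadoop prerequisites before installing.
# Command names and paths are the standard ones; adjust for your distribution.
if command -v java >/dev/null 2>&1; then
  echo "java: $(java -version 2>&1 | head -n 1)"   # 1.6.x recommended
else
  echo "java: MISSING (install a 1.6.x JRE)"
fi
if command -v sshd >/dev/null 2>&1 || [ -x /usr/sbin/sshd ]; then
  echo "sshd: found"
else
  echo "sshd: MISSING (install openssh-server)"
fi
```

The script only reports status; it does not install anything, so it is safe to run on any node before setup.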
Single Node Setup (usually for debugging)
• Untar hadoop-*.**.*.tar.gz into your user path.
• About the version: the latest stable release, 1.0.1, is recommended.
• Edit conf/hadoop-env.sh to define at least JAVA_HOME as the root of your Java installation.
• Edit the following files to configure the properties:

conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
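The three config-file edits above can also be scripted. A sketch, using /tmp/hadoop-conf as a stand-in for your real conf/ directory:

```shell
#!/bin/sh
# Sketch: write the three single-node site files.
# HADOOP_CONF is a scratch path here; point it at your real conf/ directory.
HADOOP_CONF=/tmp/hadoop-conf
mkdir -p "$HADOOP_CONF"

cat > "$HADOOP_CONF/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

cat > "$HADOOP_CONF/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

cat > "$HADOOP_CONF/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF

echo "wrote config files to $HADOOP_CONF"
```

Because the daemons read these files at startup, rerun bin/start-all.sh after changing them.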
Cluster Setup (the only acceptable setup for HW)
• Same steps as the single-node setup.
• Set the dfs.name.dir and dfs.data.dir properties in hdfs-site.xml.
• Add the master's node name to conf/masters.
• Add all the slaves' node names to conf/slaves.
• Edit /etc/hosts on each node: add an IP/node-name entry for every node.
  • Suppose your master's node name is ubuntu1 and its IP is 192.168.0.2; then add the line "192.168.0.2 ubuntu1" to the file.
• Copy the Hadoop folder to the same path on all nodes.
• Notice: JAVA_HOME may not be set the same on each node.
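A sketch of the cluster files described above, using the slide's example master name ubuntu1 and assuming two hypothetical slaves, ubuntu2 and ubuntu3, on the same 192.168.0.x network (the hosts lines are written to a scratch file here rather than the real /etc/hosts):

```shell
#!/bin/sh
# Sketch: cluster membership files. Node names and IPs beyond ubuntu1 /
# 192.168.0.2 are assumed examples; replace them with your real nodes.
CONF=/tmp/hadoop-cluster-conf
mkdir -p "$CONF"

# Master's node name goes in conf/masters.
echo "ubuntu1" > "$CONF/masters"

# One slave node name per line in conf/slaves.
printf "ubuntu2\nubuntu3\n" > "$CONF/slaves"

# Entries to append to /etc/hosts on EVERY node (scratch copy here).
cat > "$CONF/hosts.add" <<'EOF'
192.168.0.2 ubuntu1
192.168.0.3 ubuntu2
192.168.0.4 ubuntu3
EOF
```

Every node needs the same hosts entries so that the names in masters and slaves resolve identically cluster-wide.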
Execution
• Generate an ssh key pair; the empty passphrase lets the daemons start without prompting:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
• Format a new distributed filesystem:
$ bin/hadoop namenode -format
• Start the Hadoop daemons:
$ bin/start-all.sh
• The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
Execution (continued)
• Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
• Run one of the provided examples:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
• Examine the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
• When you're done, stop the daemons with:
$ bin/stop-all.sh
Details About Configuration Files
• Hadoop configuration is driven by two types of important configuration files:
• Read-only default configuration:
src/core/core-default.xml
src/hdfs/hdfs-default.xml
src/mapred/mapred-default.xml
conf/mapred-queues.xml.template
• Site-specific configuration:
conf/core-site.xml
conf/hdfs-site.xml
conf/mapred-site.xml
conf/mapred-queues.xml
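The site-specific files override the read-only defaults; for instance, dfs.replication defaults to 3 in hdfs-default.xml, and the single-node hdfs-site.xml above overrides it to 1. A toy key=value illustration of that precedence (not Hadoop's actual XML parser; file names are hypothetical):

```shell
#!/bin/sh
# Toy illustration: defaults are read first, then site values replace them.
# Plain key=value files stand in for Hadoop's XML name/value pairs.
cat > /tmp/defaults.conf <<'EOF'
dfs.replication=3
mapred.job.tracker=local
EOF
cat > /tmp/site.conf <<'EOF'
dfs.replication=1
EOF

# Keep the first occurrence of each key; listing the site file first
# means its values win over the defaults.
awk -F= '!seen[$1]++' /tmp/site.conf /tmp/defaults.conf
```

Running the script prints dfs.replication=1 (the site override) together with mapred.job.tracker=local (the untouched default).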
Details About Configuration Files (continued)
• Per-property tables for conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml (shown on the original slides; not preserved in this transcript).
You may get detailed information from:
• The official site: http://hadoop.apache.org
• Course slides & textbooks: http://www.cs.sjtu.edu.cn/~liwujun/course/mmds.html
• Michael G. Noll's blog (a good guide): http://www.michael-noll.com/
If you have good materials to share, please send them to the TA.