Commands for managing Hadoop services
- start-dfs.sh – starts the Hadoop DFS daemons, the namenode and datanodes. Use this before start-mapred.sh.
- stop-dfs.sh – stops the Hadoop DFS daemons.
- start-mapred.sh – starts the Hadoop Map/Reduce daemons, the jobtracker and tasktrackers.
- stop-mapred.sh – stops the Hadoop Map/Reduce daemons.
- start-all.sh – starts all Hadoop daemons: the namenode, datanodes, the jobtracker and tasktrackers. Deprecated; use start-dfs.sh followed by start-mapred.sh.
- stop-all.sh – stops all Hadoop daemons. Deprecated; use stop-mapred.sh followed by stop-dfs.sh.
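The ordering above can be captured in a pair of small wrapper functions. This is only a sketch (the function names are our own), and it assumes the Hadoop bin directory is already on PATH:

```shell
# Sketch: wrappers that encode the recommended start/stop ordering,
# replacing the deprecated start-all.sh / stop-all.sh.
start_cluster() {
    start-dfs.sh     # HDFS first: namenode and datanodes
    start-mapred.sh  # then MapReduce: jobtracker and tasktrackers
}

stop_cluster() {
    stop-mapred.sh   # stop in reverse order
    stop-dfs.sh
}
```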
Hadoop is a technology for distributed processing of large data sets across clusters ranging from a single server to thousands of machines, with a high degree of fault tolerance.
Hadoop is a framework composed of the following basic modules:
Hadoop Common – contains the libraries and utilities needed by the other Hadoop modules.
Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
Hadoop YARN – a resource-management platform responsible for managing compute resources in the cluster and using them to schedule users' applications.
Hadoop MapReduce – a programming model for large-scale data processing.
In this tutorial we will install and configure Hadoop on Ubuntu (12.10/13.04/13.10).
Follow the steps below:
Step 1: Update the machine.
root@hadoop1:~# apt-get update
Install the python-software-properties package:
root@hadoop1:~# apt-get install python-software-properties
Add the Java repository:
root@hadoop1:~# add-apt-repository ppa:webupd8team/java
root@hadoop1:~# apt-get update && apt-get upgrade
root@hadoop1:~# apt-get install oracle-java6-installer
Check the installed Java version:
root@hadoop1:~# java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
If the update-java-alternatives command shows two Java versions installed, run the following command to select the one you want as the default:
root@hadoop1:~# update-alternatives --config java
There is only one alternative in link group java: /usr/lib/jvm/java-6-oracle/jre/bin/java
Nothing to configure here.
Step 2: Add a group to the system.
root@hadoop1:~# addgroup hadoopgroup
Add a hadoop user to the group created above:
root@hadoop1:~# adduser --ingroup hadoopgroup hadoopuser
Adding user `hadoopuser' ...
Adding new user `hadoopuser' (1001) with group `hadoopgroup' ...
Creating home directory `/home/hadoopuser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hadoopuser
Enter the new value, or press ENTER for the default
    Full Name []: HADOOP USER
    Room Number []:
    Work Phone []:
    Home Phone []:
    Other []:
Is the information correct? [Y/n] Y
Step 3: Set up passwordless SSH authentication.
root@hadoop1:~# su - hadoopuser
hadoopuser@hadoop1:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoopuser/.ssh/id_rsa):
Created directory '/home/hadoopuser/.ssh'.
Your identification has been saved in /home/hadoopuser/.ssh/id_rsa.
Your public key has been saved in /home/hadoopuser/.ssh/id_rsa.pub.
The key fingerprint is:
82:a0:cb:f4:fa:1f:ac:f5:29:54:34:e7:56:ee:b0:9f hadoopuser@hadoop1.example.com
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
| o . .           |
| . . + o         |
| . . . . + .     |
|.. . o S +       |
|o.. .. . . .     |
|.. ..+ . .       |
| . o.o . E       |
| ..o...o         |
+-----------------+
hadoopuser@hadoop1:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hadoopuser@hadoop1:~$ ssh localhost
After logging in, we will see the following message:
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is ee:be:18:ef:e6:3d:e3:8d:8a:17:ca:d1:a3:d6:d6:49.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.5.0-23-generic x86_64)
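If ssh localhost still prompts for a password at this point, sshd is often rejecting key files whose permissions are too loose. A minimal fix, assuming the default key locations (SSH_DIR is our own variable so the target directory can be overridden):

```shell
# Tighten permissions; sshd ignores an authorized_keys file that is
# group- or world-accessible.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"
touch "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
```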
Step 4: Disable IPv6.
As root, edit /etc/sysctl.conf and add the following lines:
root@hadoop1:~# vi /etc/sysctl.conf
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Save and exit, then reload the settings:
root@hadoop1:~# sysctl -p
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
To verify that IPv6 is disabled:
root@hadoop1:~# cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1
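The check above can be wrapped in a small helper for reuse in scripts; a sketch (the function name is our own):

```shell
# Returns success (0) only when IPv6 is disabled for all interfaces,
# i.e. the sysctl flag reads 1.
ipv6_disabled() {
    [ "$(cat /proc/sys/net/ipv6/conf/all/disable_ipv6 2>/dev/null)" = "1" ]
}

if ipv6_disabled; then
    echo "ipv6 disabled"
else
    echo "ipv6 still enabled"
fi
```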
Step 5: Add the Hadoop repository.
root@hadoop1:~# add-apt-repository ppa:hadoop-ubuntu/stable
Update and upgrade:
root@hadoop1:~# apt-get update && apt-get upgrade
Step 6: Install Hadoop.
root@hadoop1:~# apt-get install hadoop
Verify the hadoopuser details:
root@hadoop1:~# id hadoopuser
uid=1001(hadoopuser) gid=1002(hadoopgroup) groups=1002(hadoopgroup)
Add the hadoop user to the sudoers file so it has root-level privileges:
root@hadoop1:~# visudo
and add the following line:
hadoopuser ALL=(ALL:ALL) ALL
Set up the environment in the hadoop user's .bashrc file as follows:
root@hadoop1:~# vi /home/hadoopuser/.bashrc
and add the following lines:
# Set Hadoop-related environment variables
export HADOOP_HOME=/home/hadoopuser/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-oracle/
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:/usr/lib/hadoop/bin/
# Set some aliases
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
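After re-sourcing .bashrc you can sanity-check that the variables took effect. A sketch using the same paths as above:

```shell
# Re-create the exports and verify the Hadoop bin dir landed on PATH.
export HADOOP_HOME=/home/hadoopuser/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle/
export PATH="$PATH:$HADOOP_HOME/bin:/usr/lib/hadoop/bin"

case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) echo "PATH ok" ;;                      # prints "PATH ok"
    *)                      echo "PATH missing $HADOOP_HOME/bin" ;;
esac
```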
Step 7: Configure Hadoop.
root@hadoop1:~# chown -R hadoopuser:hadoopgroup /var/log/hadoop/
root@hadoop1:~# chmod -R 755 /var/log/hadoop/
root@hadoop1:~# cd /usr/lib/hadoop/conf/
root@hadoop1:/usr/lib/hadoop/conf# ls -ltr
total 76
-rw-r--r-- 1 root hadoop  382 Mar 24  2012 taskcontroller.cfg
-rw-r--r-- 1 root hadoop 1195 Mar 24  2012 ssl-server.xml.example
-rw-r--r-- 1 root hadoop 1243 Mar 24  2012 ssl-client.xml.example
-rw-r--r-- 1 root hadoop   10 Mar 24  2012 slaves
-rw-r--r-- 1 root hadoop   10 Mar 24  2012 masters
-rw-r--r-- 1 root hadoop  178 Mar 24  2012 mapred-site.xml
-rw-r--r-- 1 root hadoop 2033 Mar 24  2012 mapred-queue-acls.xml
-rw-r--r-- 1 root hadoop 4441 Mar 24  2012 log4j.properties
-rw-r--r-- 1 root hadoop  178 Mar 24  2012 hdfs-site.xml
-rw-r--r-- 1 root hadoop 4644 Mar 24  2012 hadoop-policy.xml
-rw-r--r-- 1 root hadoop 1488 Mar 24  2012 hadoop-metrics2.properties
-rw-r--r-- 1 root hadoop 2237 Mar 24  2012 hadoop-env.sh
-rw-r--r-- 1 root hadoop  327 Mar 24  2012 fair-scheduler.xml
-rw-r--r-- 1 root hadoop  178 Mar 24  2012 core-site.xml
-rw-r--r-- 1 root hadoop  535 Mar 24  2012 configuration.xsl
-rw-r--r-- 1 root hadoop 7457 Mar 24  2012 capacity-scheduler.xml
root@hadoop1:/usr/lib/hadoop/conf#
Before we can use Hadoop, we need to modify a few files in the conf/ directory.
hadoop-env.sh
Replace the JAVA_HOME line with the following (matching the Java 6 JDK installed above):
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
core-site.xml
Create a base directory for temporary files:
root@hadoop1:/usr/lib/hadoop/conf# mkdir /home/hadoopuser/tmp
root@hadoop1:/usr/lib/hadoop/conf# chown hadoopuser:hadoopgroup /home/hadoopuser/tmp/
root@hadoop1:/usr/lib/hadoop/conf# chmod 755 /home/hadoopuser/tmp/
root@hadoop1:/usr/lib/hadoop/conf#
Replace the original contents with:
root@hadoop1:/usr/lib/hadoop/conf# cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoopuser/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
mapred-site.xml
root@hadoop1:/usr/lib/hadoop/conf# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
</configuration>
hdfs-site.xml
root@hadoop1:/usr/lib/hadoop/conf# cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file
    is created. The default is used if replication is not specified
    in create time.
    </description>
  </property>
</configuration>
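Once the three site files are in place, you can sanity-check that each one defines its key property. This is a sketch (verify_conf is our own helper; the default path matches the conf directory used above and can be overridden):

```shell
# Check that each site file names the property it must define.
verify_conf() {
    conf="${1:-/usr/lib/hadoop/conf}"
    grep -q "fs.default.name"    "$conf/core-site.xml"   || { echo "core-site.xml: fs.default.name missing"; return 1; }
    grep -q "mapred.job.tracker" "$conf/mapred-site.xml" || { echo "mapred-site.xml: mapred.job.tracker missing"; return 1; }
    grep -q "dfs.replication"    "$conf/hdfs-site.xml"   || { echo "hdfs-site.xml: dfs.replication missing"; return 1; }
    echo "conf ok"
}
```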
root@hadoop1:/usr/lib# chown -R hadoopuser:hadoopgroup /usr/lib/hadoop/
root@hadoop1:/usr/lib# ls -ltr hadoop/
total 16
lrwxrwxrwx 1 hadoopuser hadoopgroup   15 Apr 24  2012 pids -> /var/run/hadoop
lrwxrwxrwx 1 hadoopuser hadoopgroup   15 Apr 24  2012 logs -> /var/log/hadoop
lrwxrwxrwx 1 hadoopuser hadoopgroup   41 Apr 24  2012 hadoop-tools-1.0.2.jar -> ../../share/hadoop/hadoop-tools-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   40 Apr 24  2012 hadoop-test-1.0.2.jar -> ../../share/hadoop/hadoop-test-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   44 Apr 24  2012 hadoop-examples-1.0.2.jar -> ../../share/hadoop/hadoop-examples-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   40 Apr 24  2012 hadoop-core.jar -> ../../share/hadoop/hadoop-core-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   40 Apr 24  2012 hadoop-core-1.0.2.jar -> ../../share/hadoop/hadoop-core-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   39 Apr 24  2012 hadoop-ant-1.0.2.jar -> ../../share/hadoop/hadoop-ant-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   26 Apr 24  2012 contrib -> ../../share/hadoop/contrib
lrwxrwxrwx 1 hadoopuser hadoopgroup   16 Apr 24  2012 conf -> /etc/hadoop/conf
drwxr-xr-x 9 hadoopuser hadoopgroup 4096 Dec 15 05:16 webapps
drwxr-xr-x 2 hadoopuser hadoopgroup 4096 Dec 15 05:16 libexec
drwxr-xr-x 2 hadoopuser hadoopgroup 4096 Dec 15 05:16 bin
drwxr-xr-x 3 hadoopuser hadoopgroup 4096 Dec 15 05:16 lib
root@hadoop1:/usr/lib#
root@hadoop1:/etc/hadoop/conf# chown -R hadoopuser:hadoopgroup /etc/hadoop/
root@hadoop1:/etc/hadoop/conf#
root@hadoop1:~# su - hadoopuser
hadoopuser@hadoop1:~$ mkdir hadoop
Format the Hadoop file system
To format the Hadoop FS:
root@hadoop1:/usr/lib/hadoop/conf# su - hadoopuser
hadoopuser@hadoop1:~$ hadoop namenode -format
13/12/15 19:53:16 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = java.net.UnknownHostException: hadoop1.example.com: hadoop1.example.com
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
************************************************************/
13/12/15 19:53:17 INFO util.GSet: VM type       = 64-bit
13/12/15 19:53:17 INFO util.GSet: 2% max memory = 19.33375 MB
13/12/15 19:53:17 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/12/15 19:53:17 INFO util.GSet: recommended=2097152, actual=2097152
13/12/15 19:53:37 INFO namenode.FSNamesystem: fsOwner=hadoopuser
13/12/15 19:53:37 INFO namenode.FSNamesystem: supergroup=supergroup
13/12/15 19:53:37 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/12/15 19:53:37 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/12/15 19:53:37 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/12/15 19:53:37 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/12/15 19:53:58 INFO common.Storage: Image file of size 116 saved in 0 seconds.
13/12/15 19:53:58 INFO common.Storage: Storage directory /home/hadoopuser/tmp/dfs/name has been successfully formatted.
13/12/15 19:53:58 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: hadoop1.example.com: hadoop1.example.com
************************************************************/
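Note the java.net.UnknownHostException in the output above: the machine's hostname does not resolve. Formatting still succeeds here, but a common fix is an /etc/hosts entry. A sketch (add_host_entry is our own helper; HOSTS_FILE can be pointed at a copy for testing, and the real file must be edited as root):

```shell
# Append an entry mapping the node's hostname to an address.
add_host_entry() {
    # $1 = IP address, $2 = fully qualified name, $3 = short name
    printf '%s %s %s\n' "$1" "$2" "$3" >> "${HOSTS_FILE:-/etc/hosts}"
}

# Example, using the hostname from the log above (run as root):
# add_host_entry 127.0.1.1 hadoop1.example.com hadoop1
```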
Start the Hadoop cluster with start-all.sh:
hadoopuser@hadoop1:~$ start-all.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-hadoopuser-namenode-hadoop1.example.com.out
localhost: starting datanode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-hadoopuser-datanode-hadoop1.example.com.out
localhost: starting secondarynamenode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-hadoopuser-secondarynamenode-hadoop1.example.com.out
starting jobtracker, logging to /usr/lib/hadoop/libexec/../logs/hadoop-hadoopuser-jobtracker-hadoop1.example.com.out
localhost: starting tasktracker, logging to /usr/lib/hadoop/libexec/../logs/hadoop-hadoopuser-tasktracker-hadoop1.example.com.out
hadoopuser@hadoop1:~$
Or run:
start-dfs.sh
start-mapred.sh
To check whether Hadoop is running, use the following command:
hadoopuser@hadoop1:~$ jps
35401 NameNode
35710 JobTracker
35627 SecondaryNameNode
35928 Jps
hadoopuser@hadoop1:~$
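On a healthy single-node Hadoop 1.x setup, jps should show five daemons (the listing above is notably missing DataNode and TaskTracker, which would be worth investigating in the logs). A sketch of a quick check (check_daemons is our own helper):

```shell
# Report any expected Hadoop 1.x daemon missing from jps output.
check_daemons() {
    running="$(jps 2>/dev/null)"
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        echo "$running" | grep -qw "$d" || echo "missing: $d"
    done
}
```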