Setting up a Hadoop environment with virtual machines on a single host (the software versions below are not fixed; any mutually compatible versions will do)

1. Install VMware Workstation (12.1.0).

2. Create 3 virtual machines (512 MB of memory and 8 GB of disk each) and install Ubuntu Desktop (11.10) in each of them. Install VMware Tools in every VM; the network mode can be set to bridged (connected directly to the physical network).
One VM serves as the name node (master); the other two serve as data node 1 (slave1) and data node 2 (slave2). The three VMs can share a host directory named share, which appears inside each VM as /mnt/hgfs/share. On every VM, set a root password (sudo passwd root) and then log in as root.
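A minimal check, run on each VM, that the root login and the shared host directory set up above are usable (the directory name share is the one chosen when configuring the shared folder):

su -                    # log in as root after setting its password with: sudo passwd root
ls /mnt/hgfs/share      # the shared host directory should be visible here if VMware Tools is working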
3. Configure master, slave1 and slave2 as follows.
1) Edit /etc/hostname with vi to set the hostname (master, slave1 or slave2).
2) Edit /etc/hosts with vi and add the IP-to-hostname mappings of all three machines (using the addresses assumed in step 10, e.g. 162.105.76.231 master).

In the /usr/src directory:

4. Download zlib (1.2.8), extract and install it.

5. Download openssl (1.0.1), extract and install it.

6. Download openssh (6.0p1), extract and install it. After installation run ssh localhost; if it reports "Privilege separation user sshd does not exist", fix it by editing /etc/passwd and adding the line:
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/nologin

7. Configure passwordless SSH login. Run vi /etc/profile.d/hadoop.sh and add the following commands to hadoop.sh:
sudo ufw disable
/usr/local/sbin/sshd

On the name node:  ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
On data node 1:    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
On data node 2:    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

On the name node:  scp ~/.ssh/id_dsa.pub /mnt/hgfs/share/id_dsa0.pub
On data node 1:    scp ~/.ssh/id_dsa.pub /mnt/hgfs/share/id_dsa1.pub
On data node 2:    scp ~/.ssh/id_dsa.pub /mnt/hgfs/share/id_dsa2.pub

Then on each of the name node, data node 1 and data node 2, run:
cat /mnt/hgfs/share/id_dsa0.pub >> ~/.ssh/authorized_keys
cat /mnt/hgfs/share/id_dsa1.pub >> ~/.ssh/authorized_keys
cat /mnt/hgfs/share/id_dsa2.pub >> ~/.ssh/authorized_keys

Test:
ssh localhost
ssh <target machine address>
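A quick way to confirm the key exchange above worked: run the following from any one of the three nodes (the hostnames are the ones set in step 3); each login should succeed without a password prompt:

for h in master slave1 slave2; do
  ssh $h hostname    # should print the remote hostname without asking for a password
done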
8. Install the JDK. On every VM, download the JDK (jdk1.6.0) and extract it; assume the resulting directory is /usr/src/jdk1.6.0. Set the environment variables:
vi /etc/profile
Append at the end:
# set java environment
export JAVA_HOME=/usr/src/jdk1.6.0
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
Save and exit, then run source /etc/profile.

9. Install Hadoop. In the /usr/src directory, extract the archive:
tar zxvf hadoop-0.20.2.tar.gz
Add the Hadoop installation path to /etc/profile:
export HADOOP_HOME=/usr/src/hadoop-0.20.2
export PATH=$HADOOP_HOME/bin:$PATH
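A minimal check that the environment variables above took effect (assuming the paths used in steps 8 and 9); both commands should print version information:

source /etc/profile
java -version       # should report the JDK installed under /usr/src/jdk1.6.0
hadoop version      # should report Hadoop 0.20.2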
10. Configure Hadoop. In the hadoop-0.20.2/conf directory:

(1) On all 3 VMs, set the Java environment in conf/hadoop-env.sh:
vi hadoop-env.sh
and add to hadoop-env.sh:
export JAVA_HOME=/usr/src/jdk1.6.0

(2) On the name node (assume the name node's internal address is 162.105.76.231, and that data node 1 and data node 2 are 162.105.76.220 and 162.105.76.234 respectively), configure the conf/masters and conf/slaves files.
masters file content:
master
slaves file content:
slave1
slave2

(3) On the name node, data node 1 and data node 2, configure conf/core-site.xml, conf/hdfs-site.xml and conf/mapred-site.xml. The addresses in these files all point to the name node (e.g. name node IP 162.105.76.231); the hostname master used below resolves to that address through /etc/hosts. (One way to copy these files to the data nodes is sketched after the listings below.)
conf/core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/fyj/tmp</value>
    <!-- Hadoop temporary file directory; choose a suitable directory yourself -->
    <description>A base for other temporary files</description>
    <final>true</final>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
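All three nodes need identical configuration. One way to push the files edited above from the name node to the two data nodes is a small scp loop (a sketch, assuming Hadoop sits at the same path on every node and the passwordless SSH from step 7 is working):

for h in slave1 slave2; do
  scp /usr/src/hadoop-0.20.2/conf/hadoop-env.sh \
      /usr/src/hadoop-0.20.2/conf/core-site.xml \
      /usr/src/hadoop-0.20.2/conf/hdfs-site.xml \
      /usr/src/hadoop-0.20.2/conf/mapred-site.xml \
      root@$h:/usr/src/hadoop-0.20.2/conf/    # copy the shared configuration to each data node
done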
11. Run Hadoop.
On the name node, go to hadoop-0.20.2/bin and format the file system first:
hadoop namenode -format
sudo ufw disable
(Data node 1 and data node 2 do not need to be formatted; formatting applies only to the name node.)
On the name node, run in the bin directory:
start-all.sh
On the name node, data node 1 and data node 2, run:
sudo ufw disable
On the name node:
hadoop dfsadmin -safemode leave
On the name node, run hadoop dfsadmin -report to check the node status; you should see output listing the available Datanodes.

Check the running processes with the jps command. On the NameNode the result is:
26745 JobTracker
29398 Jps
27664 NameNode
On Data Node 2:
5155 JobTracker
6718 TaskTracker
6042 DataNode
6750 Jps
On Data Node 1:
12173 JobTracker
10760 DataNode
12700 Jps

12. Run wordcount. Create the wordcount.java file, then on the name node:
(1) Create the input file file:
echo "Hello World Bye World Hello Hadoop Goodbye Hadoop" > file
(2) Create an input directory in HDFS:
hadoop fs -mkdir input
(3) Copy file into HDFS:
hadoop fs -copyFromLocal /usr/src/hadoop-0.20.2/file input
(4) Copy wordcount.java to the current directory, then compile and package it:
mkdir FirstJar
javac -classpath /usr/src/hadoop-0.20.2/hadoop-0.20.2-core.jar -d FirstJar wordcount.java
jar -cvf wordcount.jar -C FirstJar/ .
(5) Run wordcount:
hadoop jar wordcount.jar WordCount input output
(6) View the result:
hadoop fs -cat output/part-r-00000
which gives:
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
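If you prefer not to compile wordcount.java by hand, the Hadoop 0.20.2 release also ships an examples jar in the installation directory with an equivalent wordcount job; a sketch of running it against the same input (output2 here is just a fresh output directory, since output already exists from step (5)):

cd /usr/src/hadoop-0.20.2
hadoop jar hadoop-0.20.2-examples.jar wordcount input output2   # run the bundled wordcount example
hadoop fs -cat output2/part-r-00000                             # should show the same word counts as above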