
Installing Hadoop on Windows

A simple and convenient way to run Hadoop on Windows is to install Cygwin to emulate a Linux environment, and then install Hadoop on top of it.

First, set up the Cygwin environment. This takes four steps: 1. install Cygwin; 2. install the sshd service; 3. start the sshd service; 4. configure passwordless SSH login.

Step 1 needs little explanation: follow any of the guides online and select the necessary packages in the installer (at minimum the openssh package, which provides sshd).

Step 2:
1. Run Cygwin as administrator.
2. Enter: ssh-host-config
Two of the prompts are critical:
*** Query: Do you want to use a different name? (yes/no) — answer yes
*** Query: Create new privileged user account 'cyg_server'? (yes/no) — answer yes
The account you create should preferably use a fresh user name, and its password should preferably match your Windows password.

Step 3: run services.msc from the Start menu (or open the service list some other way), find the CYGWIN sshd service, and check in its properties that it logs on as the account created in the previous step; change it if it does not.
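If you prefer to stay in the shell, steps 2 and 3 can both be driven from a single administrator Cygwin window. A minimal sketch, assuming ssh-host-config registered the service under its default name sshd (displayed as CYGWIN sshd in services.msc):

ssh-host-config      # answer the two key prompts as described above
cygrunsrv -S sshd    # start the service; same effect as starting CYGWIN sshd in services.msc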

The service may fail to start at this point. If so, right-click Computer and choose Manage, then under Local Users and Groups add the account above to the Administrators group, i.e. grant it administrator rights.

Step 4: first try the SSH service by entering: ssh localhost. You will be prompted for a password; if the session then prints "Last login: ...", the SSH service is installed correctly and can be started.

Next, configure passwordless SSH login:
1. Enter: ssh-keygen (press Enter at every prompt, usually three times)
2. Enter: cd ~/.ssh
3. Enter: cp id_rsa.pub authorized_keys
Now enter exit to leave Cygwin, reopen Cygwin as administrator, and enter ssh localhost. If it prints "Last login ..." without asking for a password, the setup succeeded.
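The same setup as one shell session. A minimal sketch; the two chmod lines are my addition — sshd's StrictModes check rejects keys whose files are group- or world-writable, so tightening permissions is a common fix when passwordless login silently keeps asking for a password:

ssh-keygen                          # press Enter at every prompt (empty passphrase)
cd ~/.ssh
cat id_rsa.pub >> authorized_keys   # append instead of overwrite, in case the file already exists
chmod 600 authorized_keys           # assumption: tighten permissions so sshd accepts the key
chmod 700 ~/.ssh                    # assumption: same reason as above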

Next, install Hadoop itself, again in four steps: 1. install the JDK; 2. download Hadoop and configure its environment; 3. format the namenode; 4. start Hadoop.

Step 1: when installing the JDK there is only one thing to watch: the installation path must not contain spaces — ideally it is made up only of letters and digits — otherwise later steps may fail.
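Inside Cygwin, a Windows path is reached through the /cygdrive prefix, which is why none of the configuration below uses D:\... directly. Cygwin's cygpath utility shows the mapping; the JDK path here is just this walkthrough's assumed install location:

$ cygpath -u 'D:\programme\jdk\jdk1.6'   # -u prints the POSIX form of a Windows path
/cygdrive/d/programme/jdk/jdk1.6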

Step 2: the downloaded archive has the suffix tar.gz. Organize the directories as sketched below: the HadoopCygwin directory sits in the programme directory on the D drive; put the downloaded Hadoop archive under deploy and extract it there to obtain the layout.
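Reconstructed from the paths used in the configuration files below, the layout amounts to roughly this sketch:

D:\programme\HadoopCygwin\
    deploy\
        hadoop-0.20.2\     (extracted from the downloaded hadoop-0.20.2.tar.gz)
    sysdata\
        0.20.2\            (tmp, name, data and temp directories configured below)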

Now configure the Hadoop environment:
1. Configure hadoop-env.sh
2. Configure core-site.xml
3. Configure hdfs-site.xml
4. Configure mapred-site.xml
All of these files live in the conf directory under hadoop-0.20.2.

1. Configure the JDK installation path:

# The java implementation to use. Required.
export JAVA_HOME=/cygdrive/d/programme/jdk/jdk1.6

Note: the path must start with /cygdrive, followed by your JDK's Windows path, and it must not contain spaces.
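A quick way to confirm that hadoop-env.sh picked up a working JAVA_HOME is to ask Hadoop for its version; a smoke test, assuming the directory layout sketched above:

cd /cygdrive/d/programme/HadoopCygwin/deploy/hadoop-0.20.2
bin/hadoop version    # prints the Hadoop version if JAVA_HOME resolves to a usable JDK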

2. For convenience, copy hadoop-0.20.2\src\core\core-default.xml to the conf directory and rename it core-site.xml. Remember that every <property> element must sit inside the file's <configuration> root element.

(1) Change the temporary-file path:

<property>
<name>hadoop.tmp.dir</name>
<value>/cygdrive/d/programme/HadoopCygwin/sysdata/0.20.2/tmp</value>
<description>A base for other temporary directories.</description>
</property>

(2) Change the default file-system name:

<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:8888</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>

Note: 8888 just needs to be a port that is unused on your machine; the same goes for the port used below.
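Put together, a minimal core-site.xml for this setup could look like the sketch below. Keeping only the overridden properties works just as well as editing the full copy of core-default.xml, since Hadoop falls back to the defaults for anything not listed; the descriptions are dropped for brevity:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/cygdrive/d/programme/HadoopCygwin/sysdata/0.20.2/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://127.0.0.1:8888</value>
  </property>
</configuration>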

3. For convenience, copy hadoop-0.20.2\src\hdfs\hdfs-default.xml to the conf directory and rename it hdfs-site.xml.

(1) Change the directory where the DFS namenode stores the name table:

<property>
<name>dfs.name.dir</name>
<value>/cygdrive/d/programme/HadoopCygwin/sysdata/0.20.2/name</value>
<description>Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>

(2) Change the directory where the DFS datanode stores its data:

<property>
<name>dfs.data.dir</name>
<value>/cygdrive/d/programme/HadoopCygwin/sysdata/0.20.2/data</value>
<description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
</property>

(3) Set the replication factor to 1 (because we are deploying a pseudo-distributed single node):

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>

4. For convenience, copy hadoop-0.20.2\src\mapred\mapred-default.xml to the conf directory and rename it mapred-site.xml.

(1) Change the server and port where the jobtracker runs:

<property>
<name>mapred.job.tracker</name>
<value>hdfs://127.0.0.1:9999</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

Note: the value can also be plain 127.0.0.1:9999; if that does not work, add the hdfs:// prefix.

(2) Change the directory where MapReduce stores intermediate data files:

<property>
<name>mapred.local.dir</name>
<value>/cygdrive/d/programme/HadoopCygwin/sysdata/0.20.2/temp</value>
<description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.</description>
</property>

(3) Change the directory where MapReduce stores temporary files:

<property>
<name>mapred.child.tmp</name>
<value>/cygdrive/d/programme/HadoopCygwin/sysdata/0.20.2/temp</value>
<description>To set the value of tmp directory for map and reduce tasks. If the value is an absolute path, it is directly assigned. Otherwise, it is prepended with task's working directory. The java tasks are executed with option -Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and streaming are set with environment variable, TMPDIR='the absolute path of the tmp dir'</description>
</property>

Step 3: format the namenode. In Hadoop's bin directory, enter:

./hadoop namenode -format

Step 4: start Hadoop. In Hadoop's bin directory, enter:

./start-all.sh

If the startup messages show the daemons launching without errors, enter:

./hadoop fs -ls /

If this prints a listing of the root directory, the installation succeeded.
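For reference, the whole format-start-verify sequence as one shell session, with a daemon check using the JDK's jps tool; this is a sketch assuming the directory layout above, and jps only works if the JDK's bin directory is on the PATH:

cd /cygdrive/d/programme/HadoopCygwin/deploy/hadoop-0.20.2/bin
./hadoop namenode -format   # answer Y if asked to confirm a re-format
./start-all.sh              # launches namenode, datanode, secondarynamenode, jobtracker, tasktracker
jps                         # should list the five daemons above (plus Jps itself)
./hadoop fs -ls /           # lists the root of the freshly formatted HDFS
./stop-all.sh               # shuts everything down when you are done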
