概述集群配置及节点信息如下:
8c16g 操作系统centos7.3,操作安装用户ucmed(注意:运维的安全基线加固会影响hadoop启动)
192.168.3.184 master
192.168.3.185 slave1
192.168.3.186 slave2
| master | slave1 | slave2 | |
|---|---|---|---|
| HDFS | NamNode,DataNode | DataNode | SecondaryNameNode,DataNode |
| YARN | NodeManager | ResourceManager,NodeManager | NodeManager |
1.修改主机名称,配置hosts及免密登录
sudo hostnamectl set-hostname master
sudo hostnamectl set-hostname slave1
sudo hostnamectl set-hostname slave2
配置hosts文件
sudo vi /etc/hosts
192.168.3.184 master
192.168.3.185 slave1
192.168.3.186 slave2
配置免密
ssh-keygen -t rsa -b 2048 -v
拷贝 ~/.ssh/id_rsa.pub 到目标服务器并重命名
cp ~/.ssh/id_rsa.pub /tmp/master.pub
cp ~/.ssh/id_rsa.pub /tmp/slave1.pub
cp ~/.ssh/id_rsa.pub /tmp/slave2.pub
cat /tmp/master.pub >> ~/.ssh/authorized_keys
cat /tmp/slave1.pub >> ~/.ssh/authorized_keys
cat /tmp/slave2.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh/
chmod 600 ~/.ssh/authorized_keys
2.关闭防火墙及SELinux
查看防火墙状态:systemctl status firewalld
关防火墙
sudo systemctl disable firewalld
sudo systemctl stop firewalld
查看SELinux状态:sestatus
关 SELinux
#永久关闭,需要重启机器:
vi /etc/syscon/root/.bashrcfig/selinux
#下面为selinux文件中需要修改的元素
SELINUX=disabled
3.安装oracleJDK
复制 jdk-8u311-linux-x64.tar.gz到服务器
解压
sudo mkdir /usr/local/src/jdk
sudo cp /tmp/jdk-8u311-linux-x64.tar.gz /usr/local/src/jdk
cd /usr/local/src/jdk
sudo tar -zxvf jdk-8u311-linux-x64.tar.gz
添加环境变量:
sudo vim /etc/profile
export JAVA_HOME=/usr/local/src/jdk/jdk1.8.0_311
export PATH=$PATH:$JAVA_HOME/bin
立即生效文件
source /etc/profile
java -version
4. hadoop 3.3.1安装
https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
sudo mkdir /usr/local/src/hadoop
sudo cp /tmp/hadoop-3.3.1.tar.gz /usr/local/src/hadoop/
sudo chown -R ucmed:ucmed /usr/local/src/hadoop
cd /usr/local/src/hadoop
sudo tar -zxvf hadoop-3.3.1.tar.gz
cd /usr/local/src/hadoop/hadoop-3.3.1
sudo mkdir -p /opt/hdfs/data
sudo mkdir -p /opt/hdfs/name
sudo chown -R ucmed:ucmed /opt/hdfs
配置环境变量
sudo vim /etc/profile
export HADOOP_HOME=/usr/local/src/hadoop/hadoop-3.3.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
source /etc/profile
hadoop version
5. 配置文件修改
cd $HADOOP_HOME/etc/hadoop
vim core-site.xml
<configuration>
<!-- 指定NameNode的地址 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
</property>
<!-- 指定hadoop数据的存储目录 -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hdfs/data</value>
</property>
<!-- 配置HDFS网页登录使用的静态用户为ucmed -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>ucmed</value>
</property>
</configuration>
vim hdfs-site.xml
<configuration>
<!-- nn web端访问地址-->
<property>
<name>dfs.namenode.http-address</name>
<value>master:9870</value>
</property>
<!-- 2nn web端访问地址-->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave2:9868</value>
</property>
</configuration>
vim yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- 指定MR走shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- 指定ResourceManager的地址-->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>slave1</value>
</property>
<!-- 环境变量的继承 -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>true</value>
<description>检测物理内存的使用是否超出分配值,若任务超出分配值,则将其杀掉,默认true。</description>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>true</value>
<description>检测虚拟内存的使用是否超出;若任务超出分配值,则将其杀掉,默认true。在确定内存不会泄漏的情况下可以设置此项为 False;</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>8</value>
<description>任务每使用1MB物理内存,最多可使用虚拟内存量比率,默认2.1;在上一项中设置为false不检测虚拟内存时,此项就无意义了</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value>
<description>该节点上YARN可使用的总核心数;一般设为cat /proc/cpuinfo| grep "processor"| wc -l 的值。默认是8个;</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>24576</value>
<description>该节点上YARN可使用的物理内存总量,【向操作系统申请的总量】默认是8192(MB)</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
<description>单个容器/调度器可申请的最少物理内存量,默认是1024(MB);一般每个contain都分配这个值;即:capacity memory:3072, vCores:1,如果提示物理内存溢出,提高这个���即可;</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
<description>单个容器申请最大值</description>
</property>
<property>
<description>cpu申请最小量</description>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.webapp.api-service.enable</name>
<value>true</value>
</property>
<property>
<name>yarn.webapp.ui2.enable</name>
<value>true</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>
%HADOOP_HOME%\etc\hadoop,
%HADOOP_HOME%\share\hadoop\common\*,
%HADOOP_HOME%\share\hadoop\common\lib\*,
%HADOOP_HOME%\share\hadoop\hdfs\*,
%HADOOP_HOME%\share\hadoop\hdfs\lib\*,
%HADOOP_HOME%\share\hadoop\mapreduce\*,
%HADOOP_HOME%\share\hadoop\mapreduce\lib\*,
%HADOOP_HOME%\share\hadoop\yarn\*,
%HADOOP_HOME%\share\hadoop\yarn\lib\*
</value>
</property>
</configuration>
vim mapred-site.xml
<configuration>
<!-- 指定MapReduce程序运行在Yarn上 -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
vim workers
master
slave1
slave2
vim hadoop-env.sh
export JAVA_HOME=/usr/local/src/jdk/jdk1.8.0_311
在****in/start-dfs.sh和****in/stop-dfs.sh文件添加如下内容
HDFS_DATANODE_USER=ucmed
HDFS_DATANODE_SECURE_USER=ucmed
HDFS_NAMENODE_USER=ucmed
HDFS_SECONDARYNAMENODE_USER=ucmed
在****in/start-yarn.sh和****in/stop-yarn.sh文件添加如下内容
YARN_RESOURCEMANAGER_USER=ucmed
HADOOP_SECURE_DN_USER=ucmed
YARN_NODEMANAGER_USER=ucmed
分发配置到slave1和slave2,并配置环境变量(java,hadoop)
scp -r /usr/local/src/hadoop/hadoop-3.3.1 slave1:/usr/local/src/hadoop/
scp -r /usr/local/src/hadoop/hadoop-3.3.1 slave2:/usr/local/src/hadoop/
在master节点上初始化HDFS文件系统
hdfs namenode -format
在master节点上启动hdfs
cd /usr/local/src/hadoop/hadoop-3.3.1/s bin
./start-dfs.sh
在slave1节点上启动yarn(因为yarn的主节点配置的是slave1)
cd /usr/local/src/hadoop/hadoop-3.3.1/./start-yarn.sh
验证服务
Web端查看HDFS的NameNode
Web端查看YARN的ResourceManager
6.测试
cd $HADOOP_HOME/share/hadoop/mapreduce
向HDFS集群写10个10MB的文件:
hadoop jar ./hadoop-mapreduce-client-jobclient-3.3.1-tests.jar \
TestDFSIO \
-write \
-nrFiles 10 \
-size 10MB
删除测试临时文件:
hadoop jar hadoop-mapreduce-client-jobclient-3.3.1-tests.jar \
mrbench \
-numRuns 20 \
-maps 3 \
-reduces 3 \
-inputLines 5 \
-inputType descending
测试重复执行小作业效率
测试使用3个mapper和3个reducer运行一个小作业20次,生成输入行数为5,降序排列:
hadoop jar hadoop-mapreduce-client-jobclient-3.3.1-tests.jar \
mrbench \
-numRuns 20 \
-maps 3 \
-reduces 3 \
-inputLines 5 \
-inputType descending
测试NameNode负载
测试使用3个mapper和3个reducer来创建100个文件:
hadoop jar hadoop-mapreduce-client-jobclient-3.3.1-tests.jar \
nnbench \
-operation create_write \
-maps 3 \
-reduces 3 \
-numberOfFiles 100 \
-replicationFactorPerFile 3 \
-readFileAfterOpen true

2906

被折叠的 条评论
为什么被折叠?



