A Brief Introduction to the Hadoop MapReduce wordcount Example

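This post walks through running a wordcount example on an existing Hadoop HA (high-availability) cluster. It covers the overall architecture, the Hadoop configuration, and writing and running the Java code.

Before starting, set up the Hadoop HA environment and configure the Hadoop API in MyEclipse. If you have not done so, see the earlier posts on setting up Hadoop HA and on basic use of the Hadoop HDFS API.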

Contents

1. Overall architecture

2. Configuring the Hadoop environment

3. Writing the wordcount example


1. Overall architecture

The overall layout is shown in the table below: ResourceManager (RM) and NodeManager (NM) are added on top of the Hadoop HA setup.

 

 

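2. Configuring the Hadoop environment

node01 runs neither an RM nor an NM, but the approach here is to edit the configuration on node01 and then copy it to the other three nodes.

Copy mapred-site.xml.template to mapred-site.xml (skip this step if your Hadoop release already ships mapred-site.xml, as Hadoop 3.x releases generally do)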

cp /myapp/hadoop-3.1.2/etc/hadoop/mapred-site.xml.template /myapp/hadoop-3.1.2/etc/hadoop/mapred-site.xml

Configure mapred-site.xml. Its full contents are as follows

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
      <name>yarn.app.mapreduce.am.env</name>
      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
      <name>mapreduce.map.env</name>
      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
      <name>mapreduce.reduce.env</name>
      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>
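Before distributing the file, it can help to confirm that the properties parse as intended. The following is a minimal sketch, not part of the original setup: the class name SiteXmlCheck and the method readProperties are hypothetical, and it uses only the JDK's XML parser rather than Hadoop's own Configuration class. It assumes every property element has both a name and a value child.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads a Hadoop-style <configuration> document
// into name -> value pairs using only the JDK.
class SiteXmlCheck {

    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                // Assumes each <property> has exactly one <name> and one <value>
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid configuration document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>mapreduce.framework.name</name><value>yarn</value></property>"
                + "</configuration>";
        // Print the framework the job will be submitted to
        System.out.println(readProperties(xml).get("mapreduce.framework.name"));
    }
}
```

To check the real file, the same method could be fed the text of mapred-site.xml read from disk.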

Configure yarn-site.xml. The full contents of the configuration element are as follows

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>node03</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>node04</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>node02:2181,node03:2181,node04:2181</value>
    </property>
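The HA properties above are linked by naming convention: each id listed in yarn.resourcemanager.ha.rm-ids has a matching yarn.resourcemanager.hostname.<id> entry. The sketch below illustrates that lookup in plain Java. The class name RmHaHosts and the method rmHosts are hypothetical and are not part of YARN; this is only an illustration of how the keys relate, not how YARN resolves them internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: resolves ResourceManager ids to hostnames
// from a map of yarn-site.xml properties.
class RmHaHosts {

    static List<String> rmHosts(Map<String, String> props) {
        List<String> hosts = new ArrayList<>();
        String ids = props.getOrDefault("yarn.resourcemanager.ha.rm-ids", "");
        for (String id : ids.split(",")) {
            id = id.trim();
            if (id.isEmpty()) {
                continue;
            }
            // Each id must have its own hostname property
            String host = props.get("yarn.resourcemanager.hostname." + id);
            if (host == null) {
                throw new IllegalStateException("no hostname configured for " + id);
            }
            hosts.add(host);
        }
        return hosts;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
        props.put("yarn.resourcemanager.hostname.rm1", "node03");
        props.put("yarn.resourcemanager.hostname.rm2", "node04");
        System.out.println(rmHosts(props));
    }
}
```

A missing hostname entry raises an exception here, which is the kind of mismatch worth catching before the files are copied to the other nodes.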

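Distribute the two configuration files to the other nodes (run from the etc/hadoop directory on node01, since `pwd` is used as the destination path)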

scp mapred-site.xml yarn-site.xml node02:`pwd`
scp mapred-site.xml yarn-site.xml node03:`pwd`
scp mapred-site.xml yarn-site.xml node04:`pwd`

On node02, node03 and node04, run 'zkServer.sh start' to start ZooKeeper; on node01, run 'start-dfs.sh' to start the HDFS cluster

Start YARN (on node01)

start-yarn.sh

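Start the ResourceManagers (on node03 and node04). In Hadoop 3.x, yarn-daemon.sh is deprecated in favor of 'yarn --daemon start resourcemanager', but it still works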

yarn-daemon.sh start resourcemanager

Test

Open "node03:8088" in a browser on the Windows host to see the node status.

Shutdown

1. Stop the ResourceManagers (on node03 and node04)

yarn-daemon.sh stop resourcemanager

2. Stop YARN (on node01)

stop-yarn.sh

3. Stop the HDFS cluster (on node01)

stop-dfs.sh

4. Stop ZooKeeper (on node02, node03 and node04)

zkServer.sh stop

3. Writing the wordcount example

Create the following three files: MyWC.java, MyMapper.java and MyReducer.java

MyWC.java is as follows

package com.dxw.hadoop.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyWC {

	public static void main(String[] args) throws Exception {

		// Load the default resources found on the classpath (core-site.xml and so on)
		Configuration conf = new Configuration(true);

		// Create a new Job
		Job job = Job.getInstance(conf);
		job.setJarByClass(MyWC.class);

		// Specify various job-specific parameters
		job.setJobName("myjob");

		// Input file in HDFS
		Path input = new Path("/user/root/test.txt");
		FileInputFormat.addInputPath(job, input);

		// Output directory: the job fails if it already exists, so delete it first
		Path output = new Path("/data/wc/output");
		FileSystem fs = output.getFileSystem(conf);
		if (fs.exists(output)) {
			fs.delete(output, true);
		}
		FileOutputFormat.setOutputPath(job, output);

		job.setMapperClass(MyMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);

		job.setReducerClass(MyReducer.class);
		// Declare the final output types to match MyReducer
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// Submit the job, then poll for progress until the job is complete;
		// the exit code reflects success or failure
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

}

MyMapper.java is as follows

package com.dxw.hadoop.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {

	// Reused across calls to avoid allocating new objects for every token
	private final static IntWritable one = new IntWritable(1);
	private Text word = new Text();

	// Called once per input line: emit (word, 1) for each whitespace-separated token
	@Override
	public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
		StringTokenizer itr = new StringTokenizer(value.toString());
		while (itr.hasMoreTokens()) {
			word.set(itr.nextToken());
			context.write(word, one);
		}
	}

}
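The mapper splits each line with StringTokenizer, which by default breaks only on whitespace. The small sketch below (the class name TokenizeDemo is hypothetical) shows the consequence using only the JDK: punctuation stays attached to a word and case is preserved, so "Hello," and "hello" are counted as different words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo of the tokenization used in MyMapper.
class TokenizeDemo {

    // Split a line the same way the mapper does: on whitespace only
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints three tokens; the comma stays attached to the first one
        System.out.println(tokens("Hello, hello\tworld"));
    }
}
```

If case-insensitive counts or punctuation stripping are wanted, the line would need to be normalized before tokenizing; the original example does not do this.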

MyReducer.java is as follows

package com.dxw.hadoop.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

	// Reused across calls to avoid allocating a new object for every key
	private IntWritable result = new IntWritable();

	// Called once per distinct word: sum all the counts emitted for it
	@Override
	public void reduce(Text key, Iterable<IntWritable> values,
			Context context) throws IOException, InterruptedException {
		int sum = 0;
		for (IntWritable val : values) {
			sum += val.get();
		}
		result.set(sum);
		context.write(key, result);
	}

}
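To see how the mapper and reducer fit together without a cluster, the data flow can be imitated in plain Java. The sketch below is only a local simulation under simplifying assumptions (everything in memory, a single reducer); the class LocalWordCount and its methods are hypothetical and do not use the Hadoop API. It mirrors the three stages: map emits (word, 1) pairs, shuffle groups the values by key in sorted order, and reduce sums each group.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the wordcount data flow.
class LocalWordCount {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle: group the emitted values by key, with keys in sorted order
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run all three stages over the input lines
    static SortedMap<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("a b a", "b c")));
    }
}
```

On the real cluster the shuffle is performed by the framework between MyMapper and MyReducer, which is why neither class contains any grouping logic.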

Export the finished Java code as a jar file

Upload it to node01

Run the following command to count the words

hadoop jar MyWC.jar com.dxw.hadoop.wordcount.MyWC

Run the following command to list the output files

hdfs dfs -ls /data/wc/output

Run the following command to download them from HDFS to the local directory

hdfs dfs -get /data/wc/output/* ./

View the result

vi part-r-00000
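Each line of part-r-00000 holds a word and its count separated by a tab, which is the default separator of the text output format. As an alternative to reading the file in an editor, the lines can be parsed programmatically. The sketch below is a hypothetical helper (PartFileParser is not part of Hadoop) and assumes the default tab separator has not been changed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses "word<TAB>count" lines from a reducer output file.
class PartFileParser {

    static Map<String, Integer> parse(List<String> lines) {
        // LinkedHashMap keeps the order in which the words appear in the file
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("no tab separator: " + line);
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines standing in for the contents of part-r-00000
        System.out.println(parse(List.of("hadoop\t3", "hello\t12")));
    }
}
```

To parse the downloaded file itself, the lines could be obtained with Files.readAllLines(Paths.get("part-r-00000")) from java.nio.file and passed to parse.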

 
