A MapReduce Program Example: Details Make the Difference (2): Observing Logs and Counters
Writing a MapReduce program: http://blog.itpub.net/30066956/viewspace-2107549/
From the job log reproduced further down, we learn that the job ID is job_1462517728035_0048, that the input was read as 3 splits, and that 3 mapper tasks and 1 reducer task ran.
Below is a MapReduce program that counts the occurrences of each single character a~z in the input files.
package wordcount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;

public class MyWordCountJob extends Configured implements Tool {
    Logger log = Logger.getLogger(MyWordCountJob.class);

    public static class MyWordCountMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        Logger log = Logger.getLogger(MyWordCountJob.class);
        // Reused for every record so that no new object is allocated per character.
        Text mapKey = new Text();
        IntWritable mapValue = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (letter, 1) for every a~z character on the line.
            for (char c : value.toString().toLowerCase().toCharArray()) {
                if (c >= 'a' && c <= 'z') {
                    mapKey.set(String.valueOf(c));
                    context.write(mapKey, mapValue);
                }
            }
        }
    }

    public static class MyWordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        IntWritable rvalue = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all the 1s emitted for this letter.
            int n = 0;
            for (IntWritable value : values) {
                n += value.get();
            }
            rvalue.set(n);
            context.write(key, rvalue);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Validate the parameters: <input path> <output path>
        if (args.length != 2) {
            log.error("Usage: MyWordCountJob <input path> <output path>");
            return -1;
        }

        Job job = Job.getInstance(getConf(), "MyWordCountJob");
        Path inPath = new Path(args[0]);
        Path outPath = new Path(args[1]);

        // The job wiring below is reconstructed; it matches the configuration
        // keys (jar, map/reduce class, input/output format, key/value classes)
        // that appear in the job log.
        job.setJarByClass(MyWordCountJob.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        TextInputFormat.setInputPaths(job, inPath);
        TextOutputFormat.setOutputPath(job, outPath);

        job.setMapperClass(MyWordCountMapper.class);
        job.setReducerClass(MyWordCountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) {
        int result = 0;
        try {
            result = ToolRunner.run(new Configuration(), new MyWordCountJob(), args);
        } catch (Exception e) {
            e.printStackTrace();
            result = 1;
        }
        System.exit(result);
    }
}
[train@sandbox MyWordCount]$ hdfs dfs -ls mrdemo
Found 3 items
-rw-r--r-- 3 train hdfs 34 2016-05-11 01:41 mrdemo/demoinput1.txt
-rw-r--r-- 3 train hdfs 42 2016-05-11 01:41 mrdemo/demoinput2.txt
-rw-r--r-- 3 train hdfs 81 2016-05-11 01:41 mrdemo/demoinput3.txt
[train@sandbox MyWordCount]$ hdfs dfs -cat mrdemo/*input*.txt
hello world
how are you
i am hero
what is your name
where are you come from
abcdefghijklmnopqrsturwxyz
abcdefghijklmnopqrsturwxyz
abcdefghijklmnopqrsturwxyz
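Why do three tiny files lead to three splits? The following is a minimal sketch of the per-file split arithmetic, modelled on what FileInputFormat does; it is not the Hadoop source. The class name SplitCountSketch, its method names, and the 128 MB block size are assumptions made for illustration (the sandbox's real block size is not shown in this post). Empty files are not modelled.

```java
public class SplitCountSketch {
    // Slack factor: a tail up to 10% larger than splitSize is kept as one split.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits for one non-empty file of the given length.
    static int splitsForFile(long length, long splitSize) {
        int splits = 0;
        long remaining = length;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the last (or only) chunk
        }
        return splits;
    }

    // Splits never span files, so the total is the sum over all files.
    static int totalSplits(long[] fileLengths, long blockSize) {
        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        int total = 0;
        for (long length : fileLengths) {
            total += splitsForFile(length, splitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long assumedBlockSize = 128L * 1024 * 1024; // assumption, not taken from the log
        long[] fileLengths = {34, 42, 81};          // sizes from the hdfs dfs -ls listing
        System.out.println(totalSplits(fileLengths, assumedBlockSize));
    }
}
```

Because a split never spans two files, each of the three small files gets its own split, and each split gets its own mapper task.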
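Running the MR job produces the following result; the command and its console log follow the result listing.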
a 8
b 3
c 4
d 4
e 11
f 4
g 3
h 8
i 5
j 3
k 3
l 6
m 7
n 4
o 12
p 3
q 3
r 13
s 4
t 4
u 6
w 7
x 3
y 6
z 3
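The result listing above can be cross-checked without a cluster. The sketch below replays the map, shuffle, and reduce steps in plain Java on the sample input. The class LetterCountSketch and its methods are hypothetical helpers, not part of the job, and the code assumes demoinput3.txt holds the alphabet line three times, which is what its 81-byte size suggests.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LetterCountSketch {
    // Assumption: the 81-byte demoinput3.txt is this 26-character line, three times.
    static final String ALPHABET_LINE = "abcdefghijklmnopqrsturwxyz";
    static final String[] SAMPLE_LINES = {
        "hello world", "how are you", "i am hero",
        "what is your name", "where are you come from",
        ALPHABET_LINE, ALPHABET_LINE, ALPHABET_LINE
    };

    // Map step: emit (letter, 1) for every a~z character of one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (char c : line.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                out.add(new SimpleEntry<>(String.valueOf(c), 1));
            }
        }
        return out;
    }

    // Shuffle step: group the values by key, with keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int n = 0;
        for (int value : values) {
            n += value;
        }
        return n;
    }

    // Number of (letter, 1) pairs the map step emits over all lines.
    static int totalMapOutputRecords(String[] lines) {
        int total = 0;
        for (String line : lines) {
            total += map(line).size();
        }
        return total;
    }

    static Map<String, Integer> run(String[] lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        run(SAMPLE_LINES).forEach((letter, count) -> System.out.println(letter + "\t" + count));
    }
}
```

Note that the sample alphabet line contains 'r' twice and no 'v', which is why the result has 25 rows rather than 26.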
[train@sandbox MyWordCount]$ hadoop jar mywordcount.jar mrdemo/ mrdemo/output
16/05/11 04:00:45 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/
16/05/11 04:00:46 INFO input.FileInputFormat: Total input paths to process : 3
16/05/11 04:00:46 INFO mapreduce.JobSubmitter: number of splits:3
16/05/11 04:00:46 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
16/05/11 04:00:46 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
16/05/11 04:00:46 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
16/05/11 04:00:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1462517728035_0048
16/05/11 04:00:47 INFO impl.YarnClientImpl: Submitted application application_1462517728035_0048 to ResourceManager at sandbox.hortonworks.com/
16/05/11 04:00:47 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1462517728035_0048/
16/05/11 04:00:47 INFO mapreduce.Job: Running job: job_1462517728035_0048
16/05/11 04:00:55 INFO mapreduce.Job: Job job_1462517728035_0048 running in uber mode : false
16/05/11 04:00:55 INFO mapreduce.Job: map 0% reduce 0%
16/05/11 04:01:10 INFO mapreduce.Job: map 33% reduce 0%
16/05/11 04:01:11 INFO mapreduce.Job: map 100% reduce 0%
16/05/11 04:01:19 INFO mapreduce.Job: map 100% reduce 100%
16/05/11 04:01:19 INFO mapreduce.Job: Job job_1462517728035_0048 completed successfully
16/05/11 04:01:19 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=1102
FILE: Number of bytes written=339257
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=556
HDFS: Number of bytes written=103
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=3
Launched reduce tasks=1
Data-local map tasks=3
Total time spent by all maps in occupied slots (ms)=314904
Total time spent by all reduces in occupied slots (ms)=34648
Map-Reduce Framework
Map input records=8
Map output records=137
Map output bytes=822
Map output materialized bytes=1114
Input split bytes=399
Combine input records=0
Combine output records=0
Reduce input groups=25
Reduce shuffle bytes=1114
Reduce input records=137
Reduce output records=25
Spilled Records=274
Shuffled Maps =3
Failed Shuffles=0
Merged Map outputs=3
GC time elapsed (ms)=241
CPU time spent (ms)=3340
Physical memory (bytes) snapshot=1106452480
Virtual memory (bytes) snapshot=3980922880
Total committed heap usage (bytes)=884604928
Shuffle Errors
File Input Format Counters
Bytes Read=157
File Output Format Counters
Bytes Written=103
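The counters in the log can be checked against each other. The sketch below parses a few "name=value" counter lines copied from the log and tests some relationships that should hold for this job: with no combiner, every map output record reaches the reducer, and the reducer writes one record per key group. CounterCheckSketch and its helper names are hypothetical. The 6-bytes-per-record figure is an assumption about serialized sizes (a one-character Text key as 2 bytes plus a 4-byte IntWritable), not something the log states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterCheckSketch {
    // A few counter lines copied from the job log above.
    static final String SAMPLE_LOG = String.join("\n",
        "Job Counters",
        "Launched map tasks=3",
        "Map-Reduce Framework",
        "Map input records=8",
        "Map output records=137",
        "Map output bytes=822",
        "Combine input records=0",
        "Reduce input groups=25",
        "Reduce input records=137",
        "Reduce output records=25",
        "Shuffled Maps =3");

    // Parse "name=value" lines; group headers without '=' are skipped.
    static Map<String, Long> parseCounters(String logText) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String rawLine : logText.split("\n")) {
            String line = rawLine.trim();
            int eq = line.lastIndexOf('=');
            if (eq <= 0) {
                continue;
            }
            try {
                long value = Long.parseLong(line.substring(eq + 1).trim());
                counters.put(line.substring(0, eq).trim(), value);
            } catch (NumberFormatException e) {
                // Not a numeric counter line; ignore it.
            }
        }
        return counters;
    }

    static long get(Map<String, Long> counters, String name) {
        Long value = counters.get(name);
        if (value == null) {
            throw new IllegalArgumentException("missing counter: " + name);
        }
        return value;
    }

    // Assumed serialized size per (one-letter Text, IntWritable) record: 2 + 4 bytes.
    static long expectedMapOutputBytes(long records) {
        return records * 6;
    }

    static boolean isConsistent(Map<String, Long> c) {
        return get(c, "Combine input records") == 0
            && get(c, "Map output records") == get(c, "Reduce input records")
            && get(c, "Reduce input groups") == get(c, "Reduce output records")
            && get(c, "Launched map tasks") == get(c, "Shuffled Maps")
            && get(c, "Map output bytes") == expectedMapOutputBytes(get(c, "Map output records"));
    }

    public static void main(String[] args) {
        System.out.println(isConsistent(parseCounters(SAMPLE_LOG)));
    }
}
```

The same numbers show where a combiner could help: all 137 map output records cross the shuffle, although only 25 distinct keys exist.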
In the next post, we use a combiner to optimize this MapReduce job.
A MapReduce Program Example: Details Make the Difference (3): Combiner
