A MapReduce Program Example, Details Determine Success or Failure (Part 2): Observing the Logs and Counters
Writing a MapReduce program: http://blog.itpub.net/30066956/viewspace-2107549/
Below is a MapReduce program that counts the number of occurrences of each single character a-z in the input files.
package wordcount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;

public class MyWordCountJob extends Configured implements Tool {
    Logger log = Logger.getLogger(MyWordCountJob.class);

    public static class MyWordCountMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        Logger log = Logger.getLogger(MyWordCountJob.class);

        Text mapKey = new Text();
        IntWritable mapValue = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Lowercase the line and emit (character, 1) for every a-z character.
            for (char c : value.toString().toLowerCase().toCharArray()) {
                if (c >= 'a' && c <= 'z') {
                    mapKey.set(String.valueOf(c));
                    context.write(mapKey, mapValue);
                }
            }
        }
    }

    public static class MyWordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        IntWritable rvalue = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the 1s for each character and emit (character, total).
            int n = 0;
            for (IntWritable value : values) {
                n += value.get();
            }
            rvalue.set(n);
            context.write(key, rvalue);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Validate the parameters: an input path and an output path are required.
        if (args.length != 2) {
            return -1;
        }

        Job job = Job.getInstance(getConf(), "MyWordCountJob");
        job.setJarByClass(MyWordCountJob.class);

        Path inPath = new Path(args[0]);
        Path outPath = new Path(args[1]);

        // Remove any previous output so the job can be rerun.
        outPath.getFileSystem(getConf()).delete(outPath, true);
        TextInputFormat.setInputPaths(job, inPath);
        TextOutputFormat.setOutputPath(job, outPath);

        job.setMapperClass(MyWordCountJob.MyWordCountMapper.class);
        job.setReducerClass(MyWordCountJob.MyWordCountReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) {
        int result = 0;
        try {
            result = ToolRunner.run(new Configuration(), new MyWordCountJob(), args);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(result);
    }
}
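The mapper above carries the whole algorithm in its inner loop: lowercase the line, keep only 'a' through 'z', and emit (character, 1). That logic can be sanity-checked without a cluster; the following is a minimal plain-Java restatement (the class and method names here are ours for illustration, and no Hadoop types are involved):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapperLoopDemo {
    // Same filtering as MyWordCountMapper.map(): fold case, keep 'a'..'z' only.
    public static Map<Character, Integer> countLetters(String line) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : line.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Punctuation, digits, and spaces are dropped; case is folded.
        System.out.println(countLetters("Hello, World!"));
        // → {h=1, e=1, l=3, o=2, w=1, r=1, d=1}
    }
}
```

Note that the real mapper emits one (key, value) pair per character occurrence rather than pre-aggregating; that difference is exactly what the counters below make visible.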
Input files:

[train@sandbox MyWordCount]$ hdfs dfs -ls mrdemo
Found 3 items
-rw-r--r--   3 train hdfs  34 2016-05-11 01:41 mrdemo/demoinput1.txt
-rw-r--r--   3 train hdfs  42 2016-05-11 01:41 mrdemo/demoinput2.txt
-rw-r--r--   3 train hdfs  81 2016-05-11 01:41 mrdemo/demoinput3.txt
[train@sandbox MyWordCount]$ hdfs dfs -cat mrdemo/*input*.txt
hello world
how are you
i am hero
what is your name
where are you come from
abcdefghijklmnopqrsturwxyz
abcdefghijklmnopqrsturwxyz
abcdefghijklmnopqrsturwxyz
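Before reading the job's counters, it is worth tallying this sample input by hand. A small plain-Java check (the class name is ours; the input lines are copied verbatim from the listing above) confirms the totals that the counters and the result file will show:

```java
import java.util.Map;
import java.util.TreeMap;

public class InputTally {
    // The eight input lines exactly as shown by `hdfs dfs -cat` above.
    public static final String[] LINES = {
        "hello world", "how are you", "i am hero",
        "what is your name", "where are you come from",
        "abcdefghijklmnopqrsturwxyz",
        "abcdefghijklmnopqrsturwxyz",
        "abcdefghijklmnopqrsturwxyz"
    };

    // Count each a-z character across all lines, like the full job does.
    public static Map<Character, Integer> tally(String[] lines) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (char c : line.toLowerCase().toCharArray()) {
                if (c >= 'a' && c <= 'z') {
                    counts.merge(c, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = tally(LINES);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println("total letters = " + total);          // 137
        System.out.println("distinct letters = " + counts.size()); // 25
    }
}
```

Two numbers are worth remembering: the input contains 137 letters in total, which is where the "Map output records=137" counter below comes from, and only 25 distinct letters. The alphabet lines are actually misspelled as "...qrstur wxyz" (no 'v', and a second 'r'), which is why the log reports Reduce input groups=25 rather than 26 and the result file has no 'v' row.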
Running the MR job

First, look at the result file. As expected, the job computed the count of each character (ps: this is not the main point).
a 8
b 3
c 4
d 4
e 11
f 4
g 3
h 8
i 5
j 3
k 3
l 6
m 7
n 4
o 12
p 3
q 3
r 13
s 4
t 4
u 6
w 7
x 3
y 6
z 3
Now look at the execution log, paying attention to the highlighted lines (this is the main point).
[train@sandbox MyWordCount]$ hadoop jar mywordcount.jar mrdemo/ mrdemo/output
16/05/11 04:00:45 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/192.168.252.131:8050
16/05/11 04:00:46 INFO input.FileInputFormat: Total input paths to process : 3
16/05/11 04:00:46 INFO mapreduce.JobSubmitter: number of splits:3
16/05/11 04:00:46 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
16/05/11 04:00:46 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
16/05/11 04:00:46 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
16/05/11 04:00:46 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
16/05/11 04:00:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1462517728035_0048
16/05/11 04:00:47 INFO impl.YarnClientImpl: Submitted application application_1462517728035_0048 to ResourceManager at sandbox.hortonworks.com/192.168.252.131:8050
16/05/11 04:00:47 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1462517728035_0048/
16/05/11 04:00:47 INFO mapreduce.Job: Running job: job_1462517728035_0048
16/05/11 04:00:55 INFO mapreduce.Job: Job job_1462517728035_0048 running in uber mode : false
16/05/11 04:00:55 INFO mapreduce.Job:  map 0% reduce 0%
16/05/11 04:01:10 INFO mapreduce.Job:  map 33% reduce 0%
16/05/11 04:01:11 INFO mapreduce.Job:  map 100% reduce 0%
16/05/11 04:01:19 INFO mapreduce.Job:  map 100% reduce 100%
16/05/11 04:01:19 INFO mapreduce.Job: Job job_1462517728035_0048 completed successfully
16/05/11 04:01:19 INFO mapreduce.Job: Counters: 43
        File System Counters
                FILE: Number of bytes read=1102
                FILE: Number of bytes written=339257
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=556
                HDFS: Number of bytes written=103
                HDFS: Number of read operations=12
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=3
                Launched reduce tasks=1
                Data-local map tasks=3
                Total time spent by all maps in occupied slots (ms)=314904
                Total time spent by all reduces in occupied slots (ms)=34648
        Map-Reduce Framework
                Map input records=8
                Map output records=137
                Map output bytes=822
                Map output materialized bytes=1114
                Input split bytes=399
                Combine input records=0
                Combine output records=0
                Reduce input groups=25
                Reduce shuffle bytes=1114
                Reduce input records=137
                Reduce output records=25
                Spilled Records=274
                Shuffled Maps =3
                Failed Shuffles=0
                Merged Map outputs=3
                GC time elapsed (ms)=241
                CPU time spent (ms)=3340
                Physical memory (bytes) snapshot=1106452480
                Virtual memory (bytes) snapshot=3980922880
                Total committed heap usage (bytes)=884604928
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=157
        File Output Format Counters
                Bytes Written=103
From this log we can read off: our job's id is job_1462517728035_0048, and the three input files produced 3 splits, hence 3 mapper tasks and 1 reducer task. The map output record count is 137, and the reduce input record count is also 137. In other words, those 137 records were all sent over the network to the reducer task.
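Those 137 shuffled records are exactly what a combiner would shrink. Since the log shows one split per file, each mapper reads one file, and a sum combiner would emit at most one record per distinct letter per mapper. A rough plain-Java estimate (this is a simulation of the idea only, not Hadoop's combiner; the class and method names are ours, and the file contents are copied from the listing above):

```java
import java.util.TreeSet;

public class CombinerEstimate {
    // Records a sum combiner would emit for one mapper
    // = distinct a-z letters appearing in that mapper's file.
    public static int distinctLetters(String fileText) {
        TreeSet<Character> seen = new TreeSet<>();
        for (char c : fileText.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                seen.add(c);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String file1 = "hello world\nhow are you\ni am hero";
        String file2 = "what is your name\nwhere are you come from";
        String file3 = "abcdefghijklmnopqrsturwxyz\n"
                     + "abcdefghijklmnopqrsturwxyz\n"
                     + "abcdefghijklmnopqrsturwxyz";
        int shuffled = distinctLetters(file1) + distinctLetters(file2) + distinctLetters(file3);
        // Estimated shuffle volume with a combiner: 12 + 15 + 25 = 52 records instead of 137.
        System.out.println(shuffled);
    }
}
```

Wiring this in is a single call, job.setCombinerClass(MyWordCountReducer.class), which is legal here because summing is associative and commutative. The real combiner runs per spill rather than per file, so the actual counter values after adding it may differ slightly from this estimate.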
In the next article, we will use a combiner to optimize this MapReduce job.
A MapReduce Program Example, Details Determine Success or Failure (Part 3): Combiner
From "ITPUB Blog". Link: http://blog.itpub.net/30066956/viewspace-2107875/. If you repost, please credit the source; otherwise legal liability may be pursued.