Hadoop MapReduce: wordcount (word-frequency counting)
1. Create test.log
2. Create the HDFS directory and upload the file
3. List the example programs bundled in the official jar; we pick wordcount
4. Run wordcount
# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /testdir /out1
# official examples jar              program     input dir  output dir (must not exist yet)
5. Verify the wordcount (word-frequency) result
[root@sht-sgmhadoopnn-01 mapreduce]# more /tmp/test.log
1
2
3
a
b
a
v
a a a
abc
我是誰
%……
%
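The post never shows how test.log was created. A minimal sketch that reproduces the same file (any editor works just as well):

# Recreate the sample input shown above (content taken verbatim from the listing)
cat > /tmp/test.log <<'EOF'
1
2
3
a
b
a
v
a a a
abc
我是誰
%……
%
EOF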
[root@sht-sgmhadoopnn-01 ~]# hadoop fs -mkdir /testdir
16/02/28 19:40:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@sht-sgmhadoopnn-01 ~]# hadoop fs -put /tmp/test.log /testdir/
16/02/28 19:40:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
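Optionally, confirm the upload landed before submitting the job (these two commands are not in the original session, but both are standard HDFS shell commands):

# List the target directory and print the uploaded file back
hadoop fs -ls /testdir
hadoop fs -cat /testdir/test.log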
[root@sht-sgmhadoopnn-01 ~]# cd /hadoop/hadoop-2.7.2/share/hadoop/mapreduce
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /testdir /out1
# official examples jar              program     input dir  output dir (must not exist yet)
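The output directory must not already exist: FileOutputFormat fails the job with a FileAlreadyExistsException if it does. So if you rerun the example against the same paths, remove the old output first:

# Delete the previous output directory before rerunning
hadoop fs -rm -r /out1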
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /testdir /out1
16/02/28 19:40:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/28 19:40:53 INFO input.FileInputFormat: Total input paths to process : 1
16/02/28 19:40:53 INFO mapreduce.JobSubmitter: number of splits:1
16/02/28 19:40:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456590271264_0002
16/02/28 19:40:54 INFO impl.YarnClientImpl: Submitted application application_1456590271264_0002
16/02/28 19:40:54 INFO mapreduce.Job: The url to track the job: http://sht-sgmhadoopnn-01:8088/proxy/application_1456590271264_0002/
16/02/28 19:40:54 INFO mapreduce.Job: Running job: job_1456590271264_0002
16/02/28 19:41:04 INFO mapreduce.Job: Job job_1456590271264_0002 running in uber mode : false
16/02/28 19:41:04 INFO mapreduce.Job:  map 0% reduce 0%
16/02/28 19:41:12 INFO mapreduce.Job:  map 100% reduce 0%
16/02/28 19:41:21 INFO mapreduce.Job:  map 100% reduce 100%
16/02/28 19:41:22 INFO mapreduce.Job: Job job_1456590271264_0002 completed successfully
16/02/28 19:41:22 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=102
                FILE: Number of bytes written=244621
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=142
                HDFS: Number of bytes written=56
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=5537
                Total time spent by all reduces in occupied slots (ms)=6555
                Total time spent by all map tasks (ms)=5537
                Total time spent by all reduce tasks (ms)=6555
                Total vcore-milliseconds taken by all map tasks=5537
                Total vcore-milliseconds taken by all reduce tasks=6555
                Total megabyte-milliseconds taken by all map tasks=5669888
                Total megabyte-milliseconds taken by all reduce tasks=6712320
        Map-Reduce Framework
                Map input records=12
                Map output records=14
                Map output bytes=100
                Map output materialized bytes=102
                Input split bytes=98
                Combine input records=14
                Combine output records=10
                Reduce input groups=10
                Reduce shuffle bytes=102
                Reduce input records=10
                Reduce output records=10
                Spilled Records=20
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=79
                CPU time spent (ms)=2560
                Physical memory (bytes) snapshot=445992960
                Virtual memory (bytes) snapshot=1775263744
                Total committed heap usage (bytes)=306184192
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=44
        File Output Format Counters
                Bytes Written=56
You have mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 mapreduce]#
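The framework counters line up with the input: the 12 lines of test.log give Map input records=12, the 14 whitespace-separated tokens give Map output records=14, and the combiner collapses them to the 10 distinct words (Combine output records=10, which is also Reduce output records=10). If you want to query a single counter after the job finishes, something like the following should work (a sketch; the job id is from the run above, and the group/counter names are the standard TaskCounter ones, which the job history server must still hold):

# Pull one counter for the finished job via the mapred CLI
mapred job -counter job_1456590271264_0002 org.apache.hadoop.mapreduce.TaskCounter MAP_INPUT_RECORDS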
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop fs -ls /out1
16/02/28 19:43:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   3 root supergroup          0 2016-02-28 19:41 /out1/_SUCCESS
-rw-r--r--   3 root supergroup         56 2016-02-28 19:41 /out1/part-r-00000
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop fs -text /out1/part-r-00000
16/02/28 19:43:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
%	1
%……	1
1	1
2	1
3	1
a	5
abc	1
b	1
v	1
我是誰	1
You have mail in /var/spool/mail/root
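As a cross-check that needs no Hadoop at all, the same counts can be reproduced locally, assuming the example's whitespace tokenization (it splits on spaces and tabs via StringTokenizer):

# Split the input on whitespace, one token per line, then count duplicates
tr -s ' \t' '\n' < /tmp/test.log | sed '/^$/d' | sort | uniq -c

The "a" row should show 5 (two standalone lines plus the three tokens of "a a a"), matching part-r-00000 above.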
From the ITPUB blog: http://blog.itpub.net/30089851/viewspace-2015610/. Please credit the source when reprinting.