1. A job = map + reduce.
2. The output of map is the input of reduce.
3. All inputs and outputs are <Key, Value> pairs; there are 4 pairs in total.
4. K2 = K3; V3 is a collection, and each element of that collection is a V2.
5. All data types must be Hadoop's own data types (see the sketch after this list):
int    ---> IntWritable
long   ---> LongWritable
String ---> Text
null   ---> NullWritable
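A minimal runnable sketch of these wrappers (the class name WritableDemo and the sample values are illustrative, not from the original post):
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;

public class WritableDemo {
    public static void main(String[] args) {
        IntWritable one = new IntWritable(1);        // int    ---> IntWritable
        LongWritable offset = new LongWritable(0L);  // long   ---> LongWritable
        Text word = new Text("hello");               // String ---> Text
        NullWritable none = NullWritable.get();      // null   ---> NullWritable (a singleton, hence get())
        System.out.println(word + "\t" + one.get() + "\t" + offset.get() + "\t" + none);
    }
}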
Submitter driver class:
package com.etc;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.File;
import java.io.IOException;

public class JobSubmitter {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(JobSubmitter.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReduce.class);
        // map output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // reduce output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // the output directory must not already exist, so delete it first
        File file = new File("F:\\wordcountfengze");
        if (file.exists()) {
            FileUtils.deleteDirectory(file);
        }
        FileInputFormat.setInputPaths(job, new Path("F:\\wordcountwangcc"));
        FileOutputFormat.setOutputPath(job, new Path("F:\\wordcountfengze"));
        job.setNumReduceTasks(1);
        boolean success = job.waitForCompletion(true);
        System.out.println(success);
    }
}
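The java.io.File deletion above only works while the job runs against the local file system. A hedged alternative sketch, going through Hadoop's FileSystem API so the same cleanup also works for HDFS paths (the class and method names here are illustrative, not from the original post):
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputCleaner {
    // delete the output directory if it exists, via the file system the path belongs to
    public static void deleteIfExists(Configuration conf, String dir) throws IOException {
        Path output = new Path(dir);
        FileSystem fs = output.getFileSystem(conf);
        if (fs.exists(output)) {
            fs.delete(output, true); // true = delete recursively
        }
    }
}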
Mapper class 1:
package com.etc;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * KEYIN:    the type of the key the map task reads, the byte offset of a line (Long).
 * VALUEIN:  the type of the value the map task reads, the content of a line (String).
 * KEYOUT:   the key type of the kv pairs the user-defined map method returns; in the
 *           wordcount logic we return the word (String).
 * VALUEOUT: the value type of the kv pairs the user-defined map method returns; in the
 *           wordcount logic we return an integer (Integer).
 * However, in MapReduce the data produced by map must be transferred to reduce, which requires
 * serialization and deserialization, and the JDK's native serialization produces rather redundant
 * data, making transfers during a MapReduce run inefficient.
 * Hadoop therefore ships its own serialization mechanism, so every type transferred in MapReduce
 * must implement Hadoop's serialization interface.
 * For the common JDK types Long, String, Integer, Float, etc., Hadoop provides wrappers that
 * implement its serialization interface: LongWritable, Text, IntWritable, FloatWritable.
 */
public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // split the line into words
        String line = value.toString();
        String[] words = line.split("\t"); // split on tab
        List<String> listStri = new ArrayList<String>(Arrays.asList(words)); // String array to ArrayList
        // walk the list and drop blank tokens; decrement i after a removal
        // so the element shifted into slot i is not skipped
        for (int i = 0; i < listStri.size(); i++) {
            if (listStri.get(i).trim().isEmpty()) {
                listStri.remove(i);
                i--;
            }
        }
        for (String word : listStri) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
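Since the cleanup pass above exists only to drop blank tokens, a simpler variant (a sketch; the class name is illustrative) splits on the whitespace regex "\\s+" instead, which produces no blank tokens in the first place:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WhitespaceWordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Text word = new Text();
    private final IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().trim().split("\\s+")) {
            if (!token.isEmpty()) {       // trim() of an empty line still yields ""
                word.set(token);
                context.write(word, one); // reuse Writables to cut object churn
            }
        }
    }
}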
Mapper class 2:
package com.etc;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] split = line.split(",");
        for (String s : split) {
            // key point: every occurrence is emitted with a count of exactly 1
            context.write(new Text(s), new IntWritable(1));
        }
    }
}
Reducer class:
package com.etc;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;

// generics: <k3, v3, k4, v4> = <Text, IntWritable, Text, IntWritable>
public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // sum all the 1s the mappers emitted for this word
        int count = 0;
        for (IntWritable value : values) {
            count = count + value.get();
        }
        context.write(key, new IntWritable(count)); // key is already a Text, no need to rewrap it
    }
}
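An optional tweak the post does not make: because word counting is associative and commutative, the same reducer class can also run as a map-side combiner, pre-aggregating the (word, 1) pairs before the shuffle. In JobSubmitter this would be a single extra line, a sketch:
// assumption: placed next to setReducerClass in JobSubmitter above
job.setCombinerClass(WordCountReduce.class);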
Dynamic counters:
The Context object's getCounter method takes two String parameters, the group name and the counter name:
public Counter getCounter(String groupName, String counterName)
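A minimal sketch of a dynamic counter inside a mapper (the group and counter names "WordCount"/"BlankTokens" are made up for illustration): every blank token bumps the counter, which Hadoop aggregates across all tasks and prints in the job summary.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split(",")) {
            if (token.trim().isEmpty()) {
                // dynamic counter: group "WordCount", counter "BlankTokens"
                context.getCounter("WordCount", "BlankTokens").increment(1);
            } else {
                context.write(new Text(token), new IntWritable(1));
            }
        }
    }
}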