Big Data Study Notes (18) - MRUnit

Posted by 狂暴棕熊 on 2018-01-05

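MapReduce jobs run on a cluster, which makes them hard to debug. You can fall back on log output, but that is slow and makes problems hard to pin down, because every attempt means repackaging the jar, uploading it, and running it again.
That is why local, step-by-step debugging matters so much, and the way to get it is MRUnit.
Add the MRUnit dependency in Maven (the hadoop2 classifier selects the artifact built against Hadoop 2):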

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier>
    <scope>test</scope>
</dependency>

Using MRUnit is much like using JUnit: you just write a test class. Here is an example you can copy and adapt:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

/**
 * Created by yang.liu on 2018/1/5.
 */
public class MRUnitTest {
    MapDriver<LongWritable, Text, Text, LongWritable> mapDriver;
    ReduceDriver<Text, LongWritable, Text, LongWritable> reduceDriver;
    MapReduceDriver<LongWritable, Text, Text, LongWritable, Text, LongWritable> mapReduceDriver;

    @Before
    public void setUp(){
        WordCount.MyMapper mapper = new WordCount.MyMapper();
        mapDriver = MapDriver.newMapDriver(mapper);
        WordCount.MyReducer reducer = new WordCount.MyReducer();
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper,reducer);
    }

    @Test
    public void testMapper() throws IOException {
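        // Assumes MyMapper emits (word, 1) for each space-separated word;
        // adjust the expected pairs to whatever your mapper actually writes.
        mapDriver.withInput(new LongWritable(), new Text("input data"))
                 .withOutput(new Text("input"), new LongWritable(1))
                 .withOutput(new Text("data"), new LongWritable(1));
        mapDriver.runTest();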
    }
}

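Note that the type parameters of MapDriver, ReduceDriver and MapReduceDriver must match those of your own Mapper and Reducer; otherwise the test class will not compile.
Once that is in place, you can set breakpoints and debug.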
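The article's WordCount class is not shown, so the sketch below assumes a plain word-count job and uses hypothetical stand-ins (TokenMapper, SumReducer, WordCountSketchTest). It shows how the driver generics are copied from the job's classes: Mapper<K1, V1, K2, V2> becomes MapDriver<K1, V1, K2, V2>, Reducer<K2, V2, K3, V3> becomes ReduceDriver<K2, V2, K3, V3>, and the full pipeline is MapReduceDriver<K1, V1, K2, V2, K3, V3>. It also tests the reducer and the whole map-shuffle-reduce path, declaring expected pairs with withOutput before runTest().

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.StringTokenizer;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountSketchTest {

    // Hypothetical mapper: Mapper<LongWritable, Text, Text, LongWritable>,
    // so its driver is MapDriver<LongWritable, Text, Text, LongWritable>.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every whitespace-separated token in the line.
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                context.write(new Text(tokens.nextToken()), ONE);
            }
        }
    }

    // Hypothetical reducer: Reducer<Text, LongWritable, Text, LongWritable>,
    // so its driver is ReduceDriver<Text, LongWritable, Text, LongWritable>.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            // Add up all counts collected for this word.
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    private MapDriver<LongWritable, Text, Text, LongWritable> mapDriver;
    private ReduceDriver<Text, LongWritable, Text, LongWritable> reduceDriver;
    private MapReduceDriver<LongWritable, Text, Text, LongWritable, Text, LongWritable> mapReduceDriver;

    @Before
    public void setUp() {
        TokenMapper mapper = new TokenMapper();
        SumReducer reducer = new SumReducer();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
    }

    @Test
    public void testMapper() throws IOException {
        // Map output keeps the order in which words appear in the line.
        mapDriver.withInput(new LongWritable(0), new Text("a b a"))
                 .withOutput(new Text("a"), new LongWritable(1))
                 .withOutput(new Text("b"), new LongWritable(1))
                 .withOutput(new Text("a"), new LongWritable(1))
                 .runTest();
    }

    @Test
    public void testReducer() throws IOException {
        // Feed one key with its grouped values directly to the reducer.
        reduceDriver.withInput(new Text("a"),
                        Arrays.asList(new LongWritable(1), new LongWritable(1)))
                    .withOutput(new Text("a"), new LongWritable(2))
                    .runTest();
    }

    @Test
    public void testMapReduce() throws IOException {
        // The driver sorts and groups map output by key before reducing,
        // so "a" reaches the reducer ahead of "b".
        mapReduceDriver.withInput(new LongWritable(0), new Text("a b a"))
                       .withOutput(new Text("a"), new LongWritable(2))
                       .withOutput(new Text("b"), new LongWritable(1))
                       .runTest();
    }
}
```

If a produced pair differs from the declared ones, runTest() fails the JUnit test, and a breakpoint inside map() or reduce() is hit when the test runs in debug mode.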
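PS: MapReduce programs are essentially data transformations, so unit tests that feed in known input and check the output are especially effective when developing them.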
