MapReduce: Custom OutputFormat

Posted by 孫晨c on 2020-08-05

Original URL: https://www.cnblogs.com/sunbr/p/13441144.html

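OutputFormat implementation classes

OutputFormat is the base type for MapReduce output: every class that writes MapReduce output extends it (it is an abstract class in the org.apache.hadoop.mapreduce API and an interface in the older org.apache.hadoop.mapred API). The most common implementations are described below.

Text output: TextOutputFormat
TextOutputFormat is the default output format. It writes each record as one line of text. Keys and values may be of any type, because TextOutputFormat calls toString() on them to convert them to strings.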
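As a rough illustration of the toString() conversion described above, the following plain-Java sketch builds a text line from an arbitrary key and value. TextLineSketch and formatLine are hypothetical names, not Hadoop classes; the tab separator matches TextOutputFormat's default, and the special handling of null and NullWritable is deliberately left out.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Hadoop): mimics how a text line can be
// built from any key and value by converting each one with toString().
class TextLineSketch {

    // TextOutputFormat separates key and value with a tab by default.
    static final String SEPARATOR = "\t";

    static String formatLine(Object key, Object value) {
        // String.valueOf(Object) calls toString() on non-null objects.
        return String.valueOf(key) + SEPARATOR + String.valueOf(value) + "\n";
    }

    public static void main(String[] args) {
        // A String key with an Integer value.
        System.out.print(formatLine("hadoop", 3));
        // A Long key with a List value; both are converted the same way.
        System.out.print(formatLine(42L, Arrays.asList("a", "b")));
    }
}
```

Because only toString() is involved, any type with a sensible string form can be used as a key or value with a text-based output.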
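SequenceFileOutputFormat
SequenceFileOutputFormat is a good choice when the output will serve as the input of a subsequent MapReduce job, because its format is compact and compresses easily.
Custom OutputFormat
Implement your own output to suit your requirements.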

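When and how to use a custom OutputFormat

Use cases

A custom OutputFormat lets you control both the path and the format of the final output files.
For example, when one MapReduce program must write two kinds of results to different directories depending on the data, a custom OutputFormat can meet that need.
Steps for a custom OutputFormat
(1) Define a class that extends FileOutputFormat.
(2) Implement a RecordWriter, in particular its write() method, which writes the output data, and return it from getRecordWriter().

Custom OutputFormat: worked example

Requirement
Filter the input log: lines for websites that contain atguigu go to e:/atguigu.log, and lines for websites that do not contain atguigu go to e:/other.log.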
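The routing rule in this requirement can be sketched in plain Java before any Hadoop classes are involved. This is only a minimal sketch: LogRouterSketch is a hypothetical class, plain OutputStreams stand in for the file streams, the two sample lines are made up, and UTF-8 encoding is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the routing a custom RecordWriter would perform.
// Plain OutputStreams replace the file system streams so it runs without Hadoop.
class LogRouterSketch {

    private final OutputStream atguiguOut;
    private final OutputStream otherOut;
    private long atguiguCount = 0;
    private long otherCount = 0;

    LogRouterSketch(OutputStream atguiguOut, OutputStream otherOut) {
        this.atguiguOut = atguiguOut;
        this.otherOut = otherOut;
    }

    // Route one log line to the matching stream and count it.
    void write(String line) throws IOException {
        byte[] bytes = (line + "\r\n").getBytes(StandardCharsets.UTF_8);
        if (line.contains("atguigu")) {
            atguiguOut.write(bytes);
            atguiguCount++;
        } else {
            otherOut.write(bytes);
            otherCount++;
        }
    }

    long getAtguiguCount() { return atguiguCount; }

    long getOtherCount() { return otherCount; }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream atguigu = new ByteArrayOutputStream();
        ByteArrayOutputStream other = new ByteArrayOutputStream();
        LogRouterSketch router = new LogRouterSketch(atguigu, other);

        // Made-up sample lines, not taken from the article's input file.
        router.write("http://www.atguigu.com");
        router.write("http://www.example.com");

        System.out.println("atguigu lines: " + router.getAtguiguCount());
        System.out.println("other lines: " + router.getOtherCount());
    }
}
```

Keeping the rule in one write() method mirrors where the same decision lives in a RecordWriter, so the Hadoop version only has to swap the streams and counters for their framework equivalents.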

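Input data
[Image: input data]
When is a Reduce phase needed?
① When records must be merged
② When the data must be sorted

Neither applies here, so this example needs no Reduce phase. With no Reduce phase, map output goes straight to the RecordWriter without a shuffle, so the key and value types do not need to implement serialization.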

CustomOFMapper.java

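import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CustomOFMapper extends Mapper<LongWritable, Text, String, NullWritable> {

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {

		String content = value.toString();
		// No value is needed, but the null keyword cannot be used here; pass the NullWritable singleton instead
		context.write(content + "\r\n", NullWritable.get());
	}

}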

MyOutPutFormat.java

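import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyOutPutFormat extends FileOutputFormat<String, NullWritable> {

	@Override
	public RecordWriter<String, NullWritable> getRecordWriter(TaskAttemptContext job)
			throws IOException, InterruptedException {
		// Pass the job context along so the RecordWriter can read the configuration
		return new MyRecordWriter(job);
	}

}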

MyRecordWriter.java

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class MyRecordWriter extends RecordWriter<String, NullWritable> {

	private Path atguiguPath = new Path("e:/atguigu.log");
	private Path otherPath = new Path("e:/other.log");

	private FSDataOutputStream atguiguOS;
	private FSDataOutputStream otherOS;

	private FileSystem fs;

	private TaskAttemptContext context;

	public MyRecordWriter(TaskAttemptContext job) throws IOException {

		context = job;

		Configuration conf = job.getConfiguration();

		fs = FileSystem.get(conf);

		// Open one output stream per target file
		atguiguOS = fs.create(atguiguPath);
		otherOS = fs.create(otherPath);
	}

	// Write the key-value pair to the matching file
	@Override
	public void write(String key, NullWritable value) throws IOException, InterruptedException {

		if (key.contains("atguigu")) {
			atguiguOS.write(key.getBytes()); // goes to atguigu.log
			// Count the records whose key contains the string "atguigu"
			context.getCounter("MyCounter", "atguiguCounter").increment(1);
		} else {
			otherOS.write(key.getBytes()); // goes to other.log
			context.getCounter("MyCounter", "otherCounter").increment(1);
		}
	}

	// Close the streams
	@Override
	public void close(TaskAttemptContext context) throws IOException, InterruptedException {

		if (atguiguOS != null) {
			IOUtils.closeStream(atguiguOS);
		}

		if (otherOS != null) {
			IOUtils.closeStream(otherOS);
		}

		if (fs != null) {
			fs.close();
		}

	}
}

CustomOFDriver.java

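import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CustomOFDriver {

	public static void main(String[] args) throws Exception {

		Path inputPath = new Path("e:/mrinput/outputformat");
		Path outputPath = new Path("e:/mroutput/outputformat");

		// Configuration for the whole Job
		Configuration conf = new Configuration();
		// Make sure the output directory does not exist
		FileSystem fs = FileSystem.get(conf);

		if (fs.exists(outputPath)) {
			fs.delete(outputPath, true);
		}

		// ① Create the Job
		Job job = Job.getInstance(conf);

		job.setJarByClass(CustomOFDriver.class);

		// ② Configure the Job
		// Set the Mapper class the Job runs (there is no Reducer in this example)
		job.setMapperClass(CustomOFMapper.class);

		// Set the input and output directories
		FileInputFormat.setInputPaths(job, inputPath);
		FileOutputFormat.setOutputPath(job, outputPath);

		// Key point: use the custom output format
		job.setOutputFormatClass(MyOutPutFormat.class);

		// Disable the Reduce phase by setting the number of reduce tasks to 0 (the default is 1)
		job.setNumReduceTasks(0);

		// ③ Run the Job
		job.waitForCompletion(true);

	}
}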

Output files:
[Image: output files]
