Flume-ng HDFS Sink Internals

Posted by boylook on 2013-09-15

The main work of the HDFS sink happens in its process() method:

// Loop up to batchSize times, or until the Channel is empty
for (txnEventCount = 0; txnEventCount < batchSize; txnEventCount++) {
  // take() goes through the concrete BasicTransactionSemantics implementation
  Event event = channel.take();
  if (event == null) {
    break;
  }
  ......
  // sfWriters is an LRU cache of open file handlers; the maximum number of
  // open files is controlled by hdfs.maxOpenFiles (see the sketch after this listing)
  BucketWriter bucketWriter = sfWriters.get(lookupPath);
  // If there is no cached writer for this path, build one
  if (bucketWriter == null) {
    // HDFSWriterFactory builds an HDFSWriter for the configured file type,
    // controlled by hdfs.fileType; e.g. HDFSDataStream
    HDFSWriter hdfsWriter = writerFactory.getWriter(fileType);
    // idleCallback removes the entry from the LRU cache after the
    // bucketWriter has been flushed and closed on idle timeout
    bucketWriter = new BucketWriter(rollInterval, rollSize, rollCount,
        batchSize, context, realPath, realName, inUsePrefix, inUseSuffix,
        suffix, codeC, compType, hdfsWriter, timedRollerPool,
        proxyTicket, sinkCounter, idleTimeout, idleCallback,
        lookupPath, callTimeout, callTimeoutPool);
    sfWriters.put(lookupPath, bucketWriter);
  }
  ......
  // Track the buckets touched within this transaction
  if (!writers.contains(bucketWriter)) {
    writers.add(bucketWriter);
  }
  // Write the event to HDFS
  bucketWriter.append(event); // internally this expands to:
    open(); // if the underlying file system supports append, open through the
            // append interface; otherwise through create
    // Decide whether to roll the file: the number of replicas actually written
    // is compared with the target replica count, and if the check is not
    // satisfied doRotate is set to false
    if (doRotate) {
      close();
      open();
    }
    HDFSWriter.append(event);
    // Once batchSize events have been appended, flush once
    if (batchCounter == batchSize) {
      flush(); -> doFlush() -> HDFSWriter.sync() -> FSDataOutputStream.flush/sync
    }
} // end of the batch take loop

// Flush every bucket before committing the transaction
for (BucketWriter bucketWriter : writers) {
  bucketWriter.flush();
}

transaction.commit();
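
For reference, here is a minimal sketch of how such an LRU cache of file handles can be built on java.util.LinkedHashMap. It only illustrates the eviction behaviour described above, it is not the actual Flume class; the Writer interface and the maxOpenFiles field are stand-ins for BucketWriter and hdfs.maxOpenFiles:

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Access-ordered LRU cache of open file handles, in the spirit of sfWriters
public class WriterCache extends LinkedHashMap<String, WriterCache.Writer> {

    // Stand-in for BucketWriter: the only thing eviction needs is close()
    public interface Writer {
        void close() throws IOException;
    }

    private final int maxOpenFiles;

    public WriterCache(int maxOpenFiles) {
        // accessOrder = true makes get() move an entry to the tail,
        // so the head of the map is always the least recently used writer
        super(16, 0.75f, true);
        this.maxOpenFiles = maxOpenFiles;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Writer> eldest) {
        if (size() <= maxOpenFiles) {
            return false;
        }
        try {
            // Evicting an entry means closing its file, otherwise the handle leaks
            eldest.getValue().close();
        } catch (IOException e) {
            // in the real sink this would be logged; the entry is dropped either way
        }
        return true;
    }
}

Evicting the eldest entry on overflow is how the sink stays at or below the configured maximum number of open files on HDFS at any time.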

Note that no matter whether BucketWriter is performing an append, a sync, or a rename, the operation is submitted through callWithTimeout to a background thread pool and executed asynchronously; the size of that pool is set by hdfs.threadsPoolSize.
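
The pattern behind callWithTimeout is a plain ExecutorService plus Future.get with a timeout. The sketch below illustrates it under that assumption; the class name, the pool size of 10 and the callTimeoutMs constant are placeholders for the configured hdfs.threadsPoolSize and hdfs.callTimeout, not the exact Flume signatures:

import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedHdfsCall {

    // Pool shared by the sink's HDFS operations; its size plays the role of hdfs.threadsPoolSize
    private final ExecutorService callTimeoutPool = Executors.newFixedThreadPool(10);

    // Upper bound for a single HDFS call, in the role of hdfs.callTimeout
    private final long callTimeoutMs = 10000L;

    // Submit a blocking HDFS operation (open/append/sync/rename) to the pool
    // and wait at most callTimeoutMs for it to complete
    public <T> T callWithTimeout(Callable<T> op) throws IOException {
        Future<T> future = callTimeoutPool.submit(op);
        try {
            return future.get(callTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck HDFS call
            throw new IOException("HDFS operation timed out", e);
        } catch (Exception e) {
            throw new IOException("HDFS operation failed", e);
        }
    }
}

Because every open/append/sync/rename goes through a wrapper like this, a slow or hung NameNode or DataNode surfaces as a call timeout instead of blocking the sink's processing thread indefinitely.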


