Xceivers Settings for HDFS and HBase in Hadoop

Posted by yezhibin on 2014-01-14

I have recently been studying the Impala documentation, and the section on performance tuning for partitioned Parquet tables mentions the Xceivers setting. I have therefore organized and translated the English material on this parameter as follows:

Introduction

The dfs.datanode.max.xcievers parameter (note the historical misspelling in the property name) has a direct effect on clients: it defines an upper bound on the number of server-side threads or, more precisely, on the sockets used for data connections. Setting it too low means the cluster cannot make full use of its resources as it grows. The sections below explain how the client and server sides work together, and how to size this parameter.
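The parameter is set in hdfs-site.xml on each DataNode and takes effect after a DataNode restart. A minimal sketch of the entry, using the 4096 value recommended below (in newer Hadoop versions the same setting is known as dfs.datanode.max.transfer.threads):

  <!-- hdfs-site.xml on every DataNode; the name keeps Hadoop's historical misspelling -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>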

The problem

When the parameter is set too low, too few resources are available to HBase, and the connections between servers and clients can fail with IOExceptions. For example, a RegionServer may log errors such as:

20xx-xx-xx 19:55:52,451 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
20xx-xx-xx 19:55:52,451 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_-5467014108758633036_595771
20xx-xx-xx 19:55:58,455 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
20xx-xx-xx 19:55:58,455 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_-5467014108758633036_595771 bad datanode[0]
20xx-xx-xx 19:55:58,482 FATAL org.apache.hadoop.hbase.regionserver.Flusher: Replay of hlog required. Forcing server shutdown

The corresponding DataNode log then contains a matching message:
ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.10.10.53:50010,storageID=DS-1570581820-10.10.10.53-50010-1224117842339,infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
The usual advice is to raise the value from the default of 256 to 4096. Still, we want to understand how the mechanism works and what HBase actually does with these resources; we cannot simply set the value arbitrarily high, for the following reasons:

1. Each thread needs its own stack, which costs memory; by default each thread stack is 1 MB. In other words, at a setting of 4096 you need up to 4 GB of memory just to hold the stacks (see the arithmetic after this list). That cuts into what is available for the memstores, the block cache and the rest of the JVM, and can end in an OutOfMemoryException. So the value must not be set too high.

2. Too many threads also drive up the CPU load, causing many context switches just to manage the parallel work, which steals resources from the work that actually needs to run. So the thread count has to stay reasonable.
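The memory arithmetic from point 1 is straightforward (1 MB is the typical default per-thread stack size, controlled by the standard JVM -Xss flag):

  4096 threads × 1 MB/thread = 4096 MB ≈ 4 GB of stack memory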

HDFS file system details


      On the client side, the HDFS library provides the entry point: the Hadoop FileSystem class has several implementations, one of which is DistributedFileSystem, whose DFSClient class handles all interaction with the remote servers. When a client such as HBase opens a file, it calls the open() or create() methods of the FileSystem class:

  public DFSInputStream open(String src) throws IOException
  public FSDataOutputStream create(Path f) throws IOException

       The returned stream instances need a server-side socket and thread to read and write block data. The DFSOutputStream and DFSInputStream classes handle all interaction with the NameNode, figuring out where the copies of the blocks reside, as well as the per-block data communication with each DataNode.
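For reference, here is a minimal client-side sketch built on the public FileSystem API (the file path and local setup are assumptions for illustration only):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsStreamDemo {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
      FileSystem fs = FileSystem.get(conf);     // a DistributedFileSystem when fs.defaultFS is hdfs://
      Path path = new Path("/tmp/xceiver-demo.txt"); // hypothetical path

      try (FSDataOutputStream out = fs.create(path)) {
        out.writeUTF("hello"); // while open, server-side threads are busy with the write pipeline
      }
      try (FSDataInputStream in = fs.open(path)) {
        System.out.println(in.readUTF()); // reading occupies a DataXceiver on the serving DataNode
      }
    }
  }

Each stream, while open, ties up the server-side resources discussed next, which is why the number of concurrently accessed files matters.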

       On the server side, the DataNode's DataXceiverServer is the class that actually reads the configuration value above and throws an exception when the limit is exceeded. When the DataNode starts, it creates a thread group and launches the DataXceiverServer like so:


  this.threadGroup = new ThreadGroup("dataXceiverServer");
  this.dataXceiverServer = new Daemon(threadGroup,
      new DataXceiverServer(ss, conf, this));
  this.threadGroup.setDaemon(true); // auto destroy when empty

The DataXceiverServer thread is itself a member of the thread group, and the DataNode has an internal helper that retrieves the number of active threads in that group:

  /** Number of concurrent xceivers per node. */
  int getXceiverCount() {
    return threadGroup == null ? 0 : threadGroup.activeCount();
  }
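Note that ThreadGroup.activeCount() counts every live thread in the group, including the DataXceiverServer daemon itself, which matters when interpreting the counts below. A self-contained sketch of that behavior (the thread names and sleep times are arbitrary):

  public class ThreadGroupCountDemo {
    public static void main(String[] args) throws Exception {
      ThreadGroup group = new ThreadGroup("dataXceiverServer-demo");
      Runnable sleeper = () -> {
        try { Thread.sleep(2000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
      };
      Thread server = new Thread(group, sleeper, "server"); // stand-in for the DataXceiverServer daemon
      server.setDaemon(true);
      server.start();
      for (int i = 0; i < 2; i++) {                         // stand-ins for per-connection DataXceivers
        new Thread(group, sleeper, "xceiver-" + i).start();
      }
      Thread.sleep(100); // give the threads time to start
      System.out.println(group.activeCount()); // prints 3: two workers plus the server thread
    }
  }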

       When a client connects to read or write a block, a thread is spawned during the initial handshake and registered in the thread group above, so every active read or write is tracked on the server side. If the number of threads in the group exceeds the configured maximum, the DataNode throws the exception recorded in its logs:

  if (curXceiverCount > dataXceiverServer.maxXceiverCount) {
    throw new IOException("xceiverCount " + curXceiverCount
                          + " exceeds the limit of concurrent xcievers "
                          + dataXceiverServer.maxXceiverCount);
  }


Client-side behavior


        To see how client reads and writes map to server-side threads, we rely on the debug messages in the DataXceiver class:

  LOG.debug("Number of active connections is: " + datanode.getXceiverCount());
  …
  LOG.debug(datanode.dnRegistration + ":Number of active connections is: " + datanode.getXceiverCount());
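These messages only appear when DEBUG logging is enabled for the DataNode class. A sketch of the usual way to do that, via a line in the DataNode's log4j.properties (the exact file location depends on your installation):

  log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode=DEBUG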

The following screenshot shows the state of the RegionServer:

[Figure: HBase RegionServer web UI status page]

The most important piece of information there is storefiles=22. HBase has that many files to handle, plus the write-ahead log, so we should expect to see at least 22 active connections. Start HBase and check the DataNode and RegionServer log output:

Command line:

$ bin/start-hbase.sh

DataNode Log:

2012-03-05 13:01:35,309 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 1
2012-03-05 13:01:35,315 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 2
12/03/05 13:01:35 INFO regionserver.MemStoreFlusher: globalMemStoreLimit=396.7m, globalMemStoreLimitLowMark=347.1m, maxHeap=991.7m
12/03/05 13:01:39 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 60030
2012-03-05 13:01:40,003 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 1
12/03/05 13:01:40 INFO regionserver.HRegionServer: Received request to open region: -ROOT-,,0.70236052
2012-03-05 13:01:40,882 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 3
2012-03-05 13:01:40,884 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 4
2012-03-05 13:01:40,888 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 3

12/03/05 13:01:40 INFO regionserver.HRegion: Onlined -ROOT-,,0.70236052; next sequenceid=63083
2012-03-05 13:01:40,982 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 3
2012-03-05 13:01:40,983 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 4

12/03/05 13:01:41 INFO regionserver.HRegionServer: Received request to open region: .META.,,1.1028785192
2012-03-05 13:01:41,026 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 3
2012-03-05 13:01:41,027 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 4

12/03/05 13:01:41 INFO regionserver.HRegion: Onlined .META.,,1.1028785192; next sequenceid=63082
2012-03-05 13:01:41,109 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 3
2012-03-05 13:01:41,114 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 4
2012-03-05 13:01:41,117 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 5
12/03/05 13:01:41 INFO regionserver.HRegionServer: Received request to open 16 region(s)
12/03/05 13:01:41 INFO regionserver.HRegionServer: Received request to open region: usertable,,1330944810191.62a312d67981c86c42b6bc02e6ec7e3f.
12/03/05 13:01:41 INFO regionserver.HRegionServer: Received request to open region: usertable,user1120311784,1330944810191.90d287473fe223f0ddc137020efda25d.

2012-03-05 13:01:41,246 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 6
2012-03-05 13:01:41,248 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 7

2012-03-05 13:01:41,257 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 10
2012-03-05 13:01:41,257 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 9

12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,user1120311784,1330944810191.90d287473fe223f0ddc137020efda25d.; next sequenceid=62917
12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,,1330944810191.62a312d67981c86c42b6bc02e6ec7e3f.; next sequenceid=62916

12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,user1361265841,1330944811370.80663fcf291e3ce00080599964f406ba.; next sequenceid=62919
2012-03-05 13:01:41,474 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 6
2012-03-05 13:01:41,491 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 7
2012-03-05 13:01:41,495 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 8
2012-03-05 13:01:41,508 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 7

12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,user1964968041,1330944848231.dd89596e9129e1caa7e07f8a491c9734.; next sequenceid=62920
2012-03-05 13:01:41,618 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 6
2012-03-05 13:01:41,621 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 7

2012-03-05 13:01:41,829 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 7
12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,user515290649,1330944849739.d23924dc9e9d5891f332c337977af83d.; next sequenceid=62926
2012-03-05 13:01:41,832 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 6
2012-03-05 13:01:41,838 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 7
12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,user757669512,1330944850808.cd0d6f16d8ae9cf0c9277f5d6c6c6b9f.; next sequenceid=62929

2012-03-05 14:01:39,711 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 4
2012-03-05 22:48:41,945 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 4
12/03/05 22:48:41 INFO regionserver.HRegion: Onlined usertable,user757669512,1330944850808.cd0d6f16d8ae9cf0c9277f5d6c6c6b9f.; next sequenceid=62929
2012-03-05 22:48:41,963 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 4

The logs above show that the number of active connections never climbs to the expected 22; it barely reaches 10. Why? To understand this, we have to look at how HDFS files map to server-side DataXceiver instances.

Digging deeper into Hadoop

      DFSInputStream and DFSOutputStream are essentially ordinary streams: the client sees standard Java interfaces, while internally the traffic is routed to the DataNode currently selected to serve a copy of the block. Connections are opened and closed as needed; as a client reads through an HDFS file, the library classes switch transparently from DataNode to DataNode, opening and closing connections along the way.

      DFSInputStream holds an instance of the DFSClient.BlockReader class, which opens the connection to a DataNode. Each read() call triggers blockSeekTo(), which opens the connection if needed, and once a block has been fully read the connection is closed again. DFSOutputStream has a similar helper that tracks the connection to the server, initiated via nextBlockOutputStream().

      Both reading and writing a block require a thread that holds the socket, so depending on what the client is doing, you will see the number of connections fluctuate around the number of HDFS files currently being accessed.
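A minimal sketch of that lifecycle from the client's point of view (the path is hypothetical; all of the connection juggling described above happens inside the library):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BlockReadDemo {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      byte[] buf = new byte[8192];
      try (FSDataInputStream in = fs.open(new Path("/hbase/some-hfile"))) { // hypothetical path
        int n;
        // Each read connects to the DataNode serving the current block
        // (blockSeekTo() internally); crossing a block boundary closes that
        // connection and opens one to the DataNode holding the next block.
        while ((n = in.read(buf)) != -1) {
          // process n bytes ...
        }
      } // close() releases the last connection, freeing the server-side thread
    }
  }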

Back to HBase above: the reason you do not see 22 connections is that the only data needed right after a file is opened is its block information. HBase reads the part of each HFile that carries these vital details and then closes the connection again, which means the server-side resources are released quickly. The remaining four connections are harder to pin down from the logs alone; a JStack dump of all the DataNode's threads shows entries like:


"DataXceiver for client /127.0.0.1:64281 [sending block blk_5532741233443227208_4201]" daemon prio=5 tid=7fb96481d000 nid=0x1178b4000 runnable [1178b3000]
   java.lang.Thread.State: RUNNABLE
   …

"DataXceiver for client /127.0.0.1:64172 [receiving block blk_-2005512129579433420_4199 client=DFSClient_hb_rs_10.0.0.29,60020,1330984111693_1330984118810]" daemon prio=5 tid=7fb966109000 nid=0x1169cb000 runnable [1169ca000]
   java.lang.Thread.State: RUNNABLE
   …

      These are the only DataXceiver entries, so the thread-group count is somewhat misleading: the DataXceiverServer daemon thread itself is counted as well. Together with the two active connections, that makes three active threads. The log reports four because it also counts the active thread that is just about to finish logging, so the real number is three, matching the three threads we saw at startup.
Internal helper classes such as the PacketResponder occupy yet another thread in the group; its JStack output looks like this:


"PacketResponder 0 for Block blk_-2005512129579433420_4199" daemon prio=5 tid=7fb96384d000 nid=0x116ace000 in Object.wait() [116acd000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder \
       .lastDataNodeRun(BlockReceiver.java:779)
     - locked (a org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder)
     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:870)
     at java.lang.Thread.run(Thread.java:680)

        This thread is currently in TIMED_WAITING state, which is why it does not show up in the counts of active threads in the log. If the client were sending data, the active count would rise again immediately. Also note that this thread does not need an extra connection or socket: the PacketResponder is purely a server-side thread that receives block data and, in the write pipeline, streams it on to the next DataNode.
The Hadoop fsck command can report the files that are currently open for writing:


$ hadoop fsck /hbase -openforwrite
FSCK started by larsgeorge from /10.0.0.29 for path /hbase at Mon Mar 05 22:59:47 CET 2012
……/hbase/.logs/10.0.0.29,60020,1330984111693/10.0.0.29%3A60020.1330984118842 0 bytes, 1 block(s), OPENFORWRITE: ………………………………..Status: HEALTHY
 Total size:     2088783626 B
 Total dirs:     54
 Total files:    45
 …

Such a file does not immediately occupy a server-side thread; a thread only becomes busy once a block ID has been allocated and the block is opened for writing. The same command can also display the actual files and their block IDs:

$ hadoop fsck /hbase -files -blocks
FSCK started by larsgeorge from /10.0.0.29 for path /hbase at Tue Mar 06 10:39:50 CET 2012


/hbase/.META./1028785192/.tmp


/hbase/.META./1028785192/info
/hbase/.META./1028785192/info/4027596949915293355 36517 bytes, 1 block(s):  OK
0. blk_5532741233443227208_4201 len=36517 repl=1


Status: HEALTHY
 Total size:     2088788703 B
 Total dirs:     54
 Total files:     45 (Files currently being written: 1)
 Total blocks (validated):     64 (avg. block size 32637323 B) (Total open file blocks (not validated): 1)
 Minimally replicated blocks:     64 (100.0 %)
 …

Two things stand out in this output.
First, exactly one file was open for writing when the command ran, matching the -openforwrite result above.
Second, the block list matches the thread names seen earlier: for example, block blk_5532741233443227208_4201 was being sent from the server to a client, and it belongs to the HBase .META. table shown in the fsck output. The combination of JStack and fsck can thus serve as a replacement for lsof when checking which files are open.
JStack also reported a DataXceiver thread (with its PacketResponder) whose block does not appear in the fsck block list. That block was not yet finished and therefore not available for reporting; fsck only reports completed blocks.
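For example, assuming the DataNode runs as a single java process whose command line matches "DataNode" (an assumption; adjust the process lookup for your environment), the active xceiver threads can be listed with:

$ jstack $(pgrep -f DataNode) | grep "DataXceiver for client"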

Back to HBase

Opening all the regions does not use as many server-side resources as you might expect. Scanning the entire HBase table, however, forces HBase to read all the blocks of all the HFiles:
HBase Shell:

hbase(main):003:0> scan 'usertable'

1000000 row(s) in 1460.3120 seconds

DataNode Log:

2012-03-05 14:42:20,580 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 6
2012-03-05 14:43:23,293 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 7
2012-03-05 14:43:23,299 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 8

2012-03-05 14:49:24,332 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 11
2012-03-05 14:49:24,332 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 10
2012-03-05 14:49:59,987 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 11
2012-03-05 14:51:12,603 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 12
2012-03-05 14:51:12,605 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 11
2012-03-05 14:51:46,473 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 12

2012-03-05 14:56:59,420 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 15
2012-03-05 14:57:31,722 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 16
2012-03-05 14:58:24,909 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 17
2012-03-05 14:58:24,910 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 16

2012-03-05 15:04:17,688 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 21
2012-03-05 15:04:17,689 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 22
2012-03-05 15:04:54,545 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 21
2012-03-05 15:05:55,901 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 22
2012-03-05 15:05:55,901 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 21

The number of active connections now does reach the elusive 22.

What does it all mean?

        So, how many xceivers do you need? If you only use HBase, simply monitor the number of storefiles (see the RegionServer metrics), add some percentage for the intermediate files created along the way, and add the write-ahead log files.
The example output above was produced on a single-node setup; on a cluster, divide the total number of storefiles by the number of DataNodes. If, for example, 1,000 storefiles are spread across 10 DataNodes, the default of 256 xceiver threads per node is fine (see the arithmetic below).
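The sizing arithmetic for that example:

  1000 storefiles / 10 DataNodes = 100 xceivers per node on average, well below the default of 256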

        The worst case is the number of all concurrently active readers and writers, which is hard to determine ahead of time, so you may well want to build in a decent reserve. Since the write process needs an extra, though shorter-lived, thread for the PacketResponder, you have to account for that as well. A reasonable, if rather simplistic, formula is therefore:

  number of xceivers = (active writers × 2 + active readers) / number of DataNodes

 

This yields the value to set for dfs.datanode.max.xcievers on each node.
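A worked example with made-up counts: with 100 active writers and 300 active readers spread across 10 DataNodes,

  (100 × 2 + 300) / 10 = 50 xceivers per node

would be the lower bound, comfortably below the recommended 4096.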

For a pure HBase setup we can evaluate this formula using HBase-level quantities. Because some of its parameters, such as the numbers of concurrently active readers and writers, are hard to estimate up front, we substitute their maximums instead. Finally, a reserve of 20% is added on top, giving the value to configure (see the sketch below).
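A sketch of the resulting final formula, applying the 20% reserve to the simple formula above with maximums substituted for the active counts (the decomposition of the maximums into storefile and WAL counts follows the monitoring advice earlier and is an assumption, not necessarily the original author's exact terms):

  dfs.datanode.max.xcievers = 1.2 × (maximum writers × 2 + maximum readers) / number of DataNodes

where, for pure HBase, the maximum writers can be approximated by the storefiles being flushed or compacted plus the WAL files, and the maximum readers by the total number of storefiles.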


Source: ITPUB blog, http://blog.itpub.net/354732/viewspace-1070376/. When reposting, please credit the source; otherwise legal liability may be pursued.
