java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond
1. The problem:
2015-08-27 10:23:04,392 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:05,393 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:06,393 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:15,483 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /Data/cloudera/hadoop/dfs/name/current/edits_inprogress_0000000000029706041 -> /Data/cloudera/hadoop/dfs/name/current/edits_0000000000029706041-0000000000029706137 2015-08-27 10:23:15,484 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 29706138 2015-08-27 10:23:16,485 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:17,486 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:18,486 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:19,487 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:20,488 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:21,486 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485] 2015-08-27 10:23:21,489 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:22,486 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7002 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485] 2015-08-27 10:23:22,490 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:23,337 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds 2015-08-27 10:23:23,487 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8003 ms (timeout=20000 ms) for a response for startLogSegment(29706138). 
Succeeded so far: [10.171.82.67:8485] 2015-08-27 10:23:23,491 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:24,489 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9005 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485] 2015-08-27 10:23:24,492 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:25,490 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10006 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485] 2015-08-27 10:23:25,493 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: yhbd02/10.171.81.96:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-08-27 10:23:26,491 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11007 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485]. Exceptions so far: [10.171.81.96:8485: Call From yhbd01/10.171.82.67 to yhbd02:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: ] 2015-08-27 10:23:27,492 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12008 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485]. Exceptions so far: [10.171.81.96:8485: Call From yhbd01/10.171.82.67 to yhbd02:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: ] 2015-08-27 10:23:28,493 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13009 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485]. Exceptions so far: [10.171.81.96:8485: Call From yhbd01/10.171.82.67 to yhbd02:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: ] 2015-08-27 10:23:29,494 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14010 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485]. Exceptions so far: [10.171.81.96:8485: Call From yhbd01/10.171.82.67 to yhbd02:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: ] 2015-08-27 10:23:30,495 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15011 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485]. Exceptions so far: [10.171.81.96:8485: Call From yhbd01/10.171.82.67 to yhbd02:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: ] 2015-08-27 10:23:31,496 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16012 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485]. 
Exceptions so far: [10.171.81.96:8485: Call From yhbd01/10.171.82.67 to yhbd02:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: ] 2015-08-27 10:23:32,497 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17013 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485]. Exceptions so far: [10.171.81.96:8485: Call From yhbd01/10.171.82.67 to yhbd02:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: ] 2015-08-27 10:23:33,499 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18015 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485]. Exceptions so far: [10.171.81.96:8485: Call From yhbd01/10.171.82.67 to yhbd02:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: ] 2015-08-27 10:23:34,500 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for startLogSegment(29706138). Succeeded so far: [10.171.82.67:8485]. Exceptions so far: [10.171.81.96:8485: Call From yhbd01/10.171.82.67 to yhbd02:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: ] 2015-08-27 10:23:35,484 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: starting log segment 29706138 failed for required journal (JournalAndStream(mgr=QJM to [10.171.82.67:8485, 10.171.81.96:8485, 10.170.255.96:8485], stream=null)) java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond. at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.startLogSegment(QuorumJournalManager.java:403) at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:107) at org.apache.hadoop.hdfs.server.namenode.JournalSet$3.apply(JournalSet.java:222) at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393) at org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:219) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:1181) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1150) at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1231) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6119) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:908) at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139) at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:11214) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at 
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
2015-08-27 10:23:35,486 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2015-08-27 10:23:35,490 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at yhbd01/10.171.82.67
************************************************************/
2015-08-27 10:28:48,352 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = yhbd01/10.171.82.67
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.5.0-cdh5.3.3
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 350748672 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (os_linux.cpp:2809), pid=16666, tid=139882867275520
#
# JRE version: (7.0_79-b14) (build )
# Java VM: OpenJDK 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 2.5.5
# Distribution: Built on CentOS release 6.6 (Final) (Wed Apr 15 00:02:21 UTC 2015)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
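The native-memory failure above is the operating system refusing to commit memory, not a Java heap exhaustion, so the JVM's own suggestions (more RAM/swap, smaller heap, fewer threads) point in the right direction. A few generic Linux checks in that spirit (standard commands, not taken from the original post):

free -m                              # how much physical RAM and swap is actually free
swapon -s                            # whether any swap device is configured at all
cat /proc/sys/vm/overcommit_memory   # 2 = strict overcommit accounting, a common cause of malloc commit failures
ulimit -c unlimited                  # enable core dumps before restarting, as the JVM message itself suggests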
<property>
  <name>mapreduce.reduce.shuffle.memory.limit.percent</name>
  <value>0.13</value>
</property>
#set mapred.min.split.size.per.rack=300000000;
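A brief reading of these two knobs, hedged because it relies on the stock MapReduce defaults rather than anything stated above:

# mapreduce.reduce.shuffle.memory.limit.percent = 0.13
#   caps how much of the reducer's in-memory shuffle buffer a single map output may occupy
#   (the default is 0.25); with the default mapreduce.reduce.shuffle.input.buffer.percent of 0.70,
#   one in-memory fetch is limited to roughly 0.70 * 0.13 ≈ 9% of the reducer heap, so larger
#   segments spill to disk instead of piling up on the heap
# set mapred.min.split.size.per.rack=300000000;
#   (Hive) asks for combined splits of at least ~300 MB per rack, which reduces the number of map tasks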
Follow-up fixes:
1) Increase the JournalNode write timeout
dfs.qjournal.write-txns.timeout.ms
2) Tune the NameNode's Java options so that full GC is triggered earlier; each full GC then has less to collect and takes less time.
3) By default the NameNode's full GC uses the parallel collector, which is entirely stop-the-world; switch it to CMS. Adjust the NameNode startup parameters (a sketch of how to apply them follows the flag list):
-XX:+UseCompressedOops
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75
-XX:SoftRefLRUPolicyMSPerMB=0
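These flags take effect through the NameNode's JVM options. A minimal sketch of how they might be applied on a plain Apache install is below; hadoop-env.sh and HADOOP_NAMENODE_OPTS are standard, but the -Xms/-Xmx values are placeholders of mine, and on CDH the same string would instead go into the NameNode's Java options field in Cloudera Manager:

# hadoop-env.sh -- sketch only; size -Xms/-Xmx to your actual metadata footprint
export HADOOP_NAMENODE_OPTS="-Xms16g -Xmx16g \
  -XX:+UseCompressedOops \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled \
  -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 \
  -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC \
  -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 \
  -XX:SoftRefLRUPolicyMSPerMB=0 \
  $HADOOP_NAMENODE_OPTS"

With -XX:+UseCMSInitiatingOccupancyOnly and -XX:CMSInitiatingOccupancyFraction=75, the concurrent collection starts once the old generation is 75% full, which is what "trigger full GC earlier" in point 2) amounts to in practice.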
The hdfs-site.xml settings raising the JournalNode RPC timeouts (fix 1 above):
<property>
  <name>dfs.qjournal.start-segment.timeout.ms</name>
  <value>600000000</value>
</property>
<property>
  <name>dfs.qjournal.prepare-recovery.timeout.ms</name>
  <value>600000000</value>
</property>
<property>
  <name>dfs.qjournal.accept-recovery.timeout.ms</name>
  <value>600000000</value>
</property>
<property>
  <name>dfs.qjournal.finalize-segment.timeout.ms</name>
  <value>600000000</value>
</property>
<property>
  <name>dfs.qjournal.select-input-streams.timeout.ms</name>
  <value>600000000</value>
</property>
<property>
  <name>dfs.qjournal.get-journal-state.timeout.ms</name>
  <value>600000000</value>
</property>
<property>
  <name>dfs.qjournal.new-epoch.timeout.ms</name>
  <value>600000000</value>
</property>
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>600000000</value>
</property>
Too many NameNode full GCs; each GC pauses the system for 3-4 minutes
Because the NameNode keeps all of its metadata in memory, pressure on the old generation is high, and every full GC freezes the system for several minutes. The fixes are:
(1) Use the CMS GC algorithm.
(2) Change the young:old generation ratio to 1:2. With 1:4, large objects are promoted straight into the old generation during young GC, old-generation memory grows quickly, and full GCs become even more frequent. (A sketch of the corresponding JVM option, and a way to observe the effect, follows.)
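A sketch of what (2) looks like as a JVM option, plus a generic way to watch the old generation on the live NameNode; -XX:NewRatio=2 is my reading of "young:old = 1:2", and jstat/jps are standard JDK tools rather than anything from the original post:

# young:old = 1:2 expressed as a HotSpot flag (old generation sized at 2x the young generation)
-XX:NewRatio=2

# watch old-gen occupancy (O) and full-GC count/time (FGC/FGCT) once per second
jstat -gcutil $(jps | grep -w NameNode | awk '{print $1}') 1000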
The overall HA edit-log and checkpoint flow can be summarized as follows:
(1) Once HA is configured, all client update operations are written to the shared edits directory on the JournalNodes, which is configured as follows:
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://XXXX/mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/export1/hadoop2x/dfs/journal</value>
</property>
(3) The StandbyCheckpointer class in the Standby NameNode periodically checks whether the checkpoint conditions are met (see the config sketch after this list); if they are, it merges the fsimage and edits files;
(4) After merging, the StandbyCheckpointer uploads the new fsimage to the corresponding directory on the Active NameNode;
(5) When the Active NameNode receives the latest fsimage, it cleans up the old fsimage and edits files;
(6) Through these steps, fsimage and edits are merged. Thanks to the HA mechanism, both the Standby and the Active NameNode end up holding the latest fsimage and edits (in Hadoop 1.x, the SecondaryNameNode's fsimage and edits were not the latest).
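For step (3), the "conditions" the StandbyCheckpointer evaluates are controlled by two hdfs-site.xml properties. The values shown are the stock Hadoop 2.x defaults, given here only for illustration:

<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>   <!-- checkpoint at least once every 3600 seconds -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>   <!-- or once 1,000,000 un-checkpointed transactions have accumulated -->
</property>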
Adjusting the JournalNode write timeout
2015-09-02 00:13:03,408 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 31631554 2015-09-02 00:13:04,166 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds 2015-09-02 00:13:04,166 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2015-09-02 00:13:34,166 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds 2015-09-02 00:13:34,167 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2015-09-02 00:14:04,166 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds 2015-09-02 00:14:04,167 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2015-09-02 00:14:34,166 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds 2015-09-02 00:14:34,167 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2015-09-02 00:15:04,049 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.171.81.96 2015-09-02 00:15:04,050 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2015-09-02 00:15:04,050 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 31631554 2015-09-02 00:15:04,050 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 76 Total time for transactions(ms): 6 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 4 54 2015-09-02 00:15:04,121 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 76 Total time for transactions(ms): 6 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 73 56 2015-09-02 00:15:04,166 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds 2015-09-02 00:15:04,167 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /Data/cloudera/hadoop/dfs/name/current/edits_inprogress_0000000000031631554 -> /Data/cloudera/hadoop/dfs/name/current/edits_0000000000031631554-0000000000031631629 2015-09-02 00:15:04,167 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 31631630 2015-09-02 00:16:28,744 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 30 2015-09-02 00:16:28,944 WARN org.apache.hadoop.ipc.Server: IPC Server handler 6 on 8020, call org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog from 10.171.81.96:15527 Call#19 Retry#0: output error 2015-09-02 00:16:28,944 WARN org.apache.hadoop.ipc.Server: IPC Server handler 7 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 10.172.7.24:36833 Call#486892 Retry#0: output error 2015-09-02 00:16:28,944 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 8020 caught an exception java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:474) at 
org.apache.hadoop.ipc.Server.channelWrite(Server.java:2534) at org.apache.hadoop.ipc.Server.access$1900(Server.java:130) at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:965) at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1030) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2068) 2015-09-02 00:16:28,944 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 8020: skipped org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from 10.171.82.67:48509 Call#1099647 Retry#0 2015-09-02 00:16:28,944 WARN org.apache.hadoop.ipc.Server: IPC Server handler 5 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 10.165.115.190:56256 Call#464826 Retry#0: output error 2015-09-02 00:16:28,945 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 8020 caught an exception java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:474) at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2534) at org.apache.hadoop.ipc.Server.access$1900(Server.java:130) at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:965) at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1030) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2068) 2015-09-02 00:16:28,944 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 8020 caught an exception java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:474) at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2534) at org.apache.hadoop.ipc.Server.access$1900(Server.java:130) at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:965) at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1030) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2068) 2015-09-02 00:16:28,946 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8020: skipped org.apache.hadoop.ha.HAServiceProtocol.transitionToStandby from 10.171.81.96:15538 Call#1238456 Retry#0 2015-09-02 00:16:28,946 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 8020: skipped org.apache.hadoop.ha.HAServiceProtocol.transitionToStandby from 10.171.81.96:15554 Call#1238459 Retry#0 2015-09-02 00:16:28,944 WARN org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 10.170.255.188:58665 Call#495767 Retry#0: output error 2015-09-02 00:16:28,946 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020 caught an exception java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:474) at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2534) at org.apache.hadoop.ipc.Server.access$1900(Server.java:130) at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:965) at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1030) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2068) 2015-09-02 00:16:28,947 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state 2015-09-02 00:16:28,946 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020: skipped org.apache.hadoop.ha.HAServiceProtocol.transitionToStandby from 10.171.81.96:15561 
Call#1238462 Retry#0 2015-09-02 00:16:28,946 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 84780 millisecond(s). 2015-09-02 00:16:28,944 WARN org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 10.170.255.96:15952 Call#452032 Retry#0: output error 2015-09-02 00:16:28,948 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8020 caught an exception
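A CacheReplicationMonitor scan of zero directives taking 84780 ms, followed by ClosedChannelException on heartbeat and rollEditLog responses, is consistent with a long stop-the-world pause rather than real work. One hedged way to confirm it is to enable GC logging on the NameNode; these are standard HotSpot flags and the log path is a placeholder, not something from the original post:

# append to HADOOP_NAMENODE_OPTS
-Xloggc:/var/log/hadoop-hdfs/namenode-gc.log \
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime

If the gap in the NameNode log lines up with a "Total time for which application threads were stopped" entry of the same length, the pause was GC (or another safepoint), not the network.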
The related yarn-site.xml and mapred-site.xml memory settings:
yarn-site.xml

<property>
  <description>The minimum allocation for every container request at the RM, in MBs.
  Memory requests lower than this won't take effect, and the specified value will get
  allocated at minimum.</description>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1500</value>
</property>
<property>
  <description>The maximum allocation for every container request at the RM, in MBs.
  Memory requests higher than this won't take effect, and will get capped to this value.</description>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>

mapred-site.xml

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1500</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024M</value>
  <description>Larger heap-size for child jvms of maps.</description>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2100</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1900M</value>
  <description>Larger heap-size for child jvms of reduces.</description>
</property>
<property>
  <name>mapreduce.job.counters.limit</name>
  <value>12000</value>
</property>
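How these numbers fit together (my reading, not spelled out in the original post): the *.memory.mb values are the YARN container sizes, and the -Xmx values in *.java.opts must leave headroom inside the container, or the NodeManager kills the task for exceeding its memory limit:

# map task:    container 1500 MB, heap -Xmx1024M  ->  ~476 MB of headroom for JVM overhead
# reduce task: container 2100 MB, heap -Xmx1900M  ->  ~200 MB of headroom, which is fairly tight
# both requests are also bounded by yarn.scheduler.minimum/maximum-allocation-mb (1500 / 8192 above)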
Finally, to sum up: the active NameNode aborted because it could not get a response from a quorum of JournalNodes within the 20000 ms timeout. The contributing factors visible in the logs were a JournalNode refusing connections, long NameNode full-GC pauses, and an OS-level native-memory shortage on restart. The fixes were to raise the dfs.qjournal.*.timeout.ms settings, move the NameNode to CMS GC with better generation sizing, and tighten the YARN/MapReduce memory settings.
Reposted from the ITPUB blog: http://blog.itpub.net/30089851/viewspace-2122226/. Please credit the source when reproducing; otherwise legal liability may be pursued.