MongoDB replica set: secondary consoles report error 10061 "No connection could be made because the target machine actively refused it"

Posted by 清風艾艾 on 2018-01-08

    On 2018-01-05, a MongoDB replica set reported error 10061 ("No connection could be made because the target machine actively refused it"). The troubleshooting process is summarized below.

Environment:
    Operating system: Windows Server 2008 R2
    Database version: MongoDB 3.2.10
--------------------------------------------------------------------------------------------------------------------------------------------
    First, examine the console logs of the three cluster nodes.

1. Console logs from the three cluster servers

192.168.72.33

2018-01-05T09:46:24.281+0800 I STORAGE [initandlisten] Placing a marker at optime Jan 05 05:16:28:3e9
2018-01-05T09:46:24.432+0800 I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2018-01-05T09:46:24.432+0800 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory 'd:/mongodata/rs0-2/diagnostic.data'
2018-01-05T09:46:24.443+0800 I NETWORK [initandlisten] waiting for connections on port 27013
2018-01-05T09:46:25.485+0800 W NETWORK [ReplicationExecutor] Failed to connect to 192.168.72.31:27011, reason: errno:10061 No connection could be made because the target machine actively refused it.
2018-01-05T09:46:25.533+0800 I REPL [ReplicationExecutor] New replica set config in use: { _id: "rs0", version: 8, protocolVersion: 1, members: [ { _id: 0, host: "mongodb-rs0-0:27011", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 100.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "mongodb-rs0-1:27012", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "mongodb-rs0-2:27013", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('59365592734d0747ee26e2a6') } }
2018-01-05T09:46:25.534+0800 I REPL [ReplicationExecutor] This node is mongodb-rs0-2:27013 in the config
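The replica set config in the log also explains the cluster's topology: member 0 (mongodb-rs0-0, i.e. 192.168.72.31) carries priority: 100.0 while the other two members have priority: 1.0, so it is strongly preferred as primary in any election. On a healthy member this can be confirmed from the mongo shell; a minimal sketch (host names are the ones from the config above):

```javascript
// Run in the mongo shell connected to any reachable member.
// Members with a higher `priority` are preferred in elections, so
// mongodb-rs0-0 (priority 100) should be primary whenever it is healthy.
var conf = rs.conf();
conf.members.forEach(function (m) {
    print(m.host + "  priority=" + m.priority);
});
// rs.status() shows each member's live state (PRIMARY / SECONDARY / unreachable).
rs.status().members.forEach(function (m) {
    print(m.name + "  " + m.stateStr);
});
```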


192.168.72.32

2018-01-05T09:46:17.064+0800 I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker 
2018-01-05T09:46:17.064+0800 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory 'd:/mongodata/rs0-1/diagnostic.data' 
2018-01-05T09:46:17.076+0800 I NETWORK [initandlisten] waiting for connections on port 27012 
2018-01-05T09:46:18.102+0800 W NETWORK [ReplicationExecutor] Failed to connect to 192.168.72.31:27011, reason: errno:10061 No connection could be made because the target machine actively refused it.
2018-01-05T09:46:19.149+0800 W NETWORK [ReplicationExecutor] Failed to connect to 192.168.72.33:27013, reason: errno:10061 No connection could be made because the target machine actively refused it.
2018-01-05T09:46:19.150+0800 I REPL [ReplicationExecutor] New replica set config in use: { _id: "rs0", version: 8, protocolVersion: 1, members: [ { _id: 0, host: "mongodb-rs0-0:27011", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 100.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "mongodb-rs0-1:27012", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "mongodb-rs0-2:27013", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('59365592734d0747ee26e2a6') } }
2018-01-05T09:46:19.150+0800 I REPL [ReplicationExecutor] This node is mongodb-rs0-1:27012 in the config


192.168.72.31

2018-01-05T15:56:42.999+0800 I STORAGE [initandlisten] Placing a marker at optime Jan 05 05:12:59:b4a 
2018-01-05T15:56:43.000+0800 I STORAGE [initandlisten] Placing a marker at optime Jan 05 05:13:08:8df 
2018-01-05T15:56:43.000+0800 I STORAGE [initandlisten] Placing a marker at optime Jan 05 05:14:05:329 
2018-01-05T15:56:43.001+0800 I STORAGE [initandlisten] Placing a marker at optime Jan 05 05:15:30:25f 
2018-01-05T15:56:43.002+0800 I STORAGE [initandlisten] Placing a marker at optime Jan 05 05:15:39:4b1

    From the logs above we inferred that the primary node 192.168.72.31 was stuck on a storage-level wait event, which caused it to refuse TCP connections from the two secondaries, 192.168.72.32 and 192.168.72.33.
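Error 10061 is the Windows WSAECONNREFUSED code: the TCP SYN reached the target host, but nothing was accepting connections on that port. The check the secondaries keep failing can be reproduced with a plain socket probe; a minimal sketch using only the Python standard library (the 192.168.72.31:27011 pairing is this cluster's; the local listener exists only to demonstrate both outcomes):

```python
import socket

def probe(host, port, timeout=3.0):
    """Return 0 if a TCP connect succeeds, else the OS error number
    (10061 / WSAECONNREFUSED on Windows, ECONNREFUSED on Unix)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port))

if __name__ == "__main__":
    # In this incident the secondaries were effectively doing:
    #   probe("192.168.72.31", 27011)   # non-zero while mongod was wedged
    # Demonstrate both outcomes locally with a throwaway listener:
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]
    print("listening:", probe("127.0.0.1", port))  # 0 = connection accepted
    srv.close()
    print("closed:", probe("127.0.0.1", port))     # non-zero = actively refused
```

A non-zero result against the primary's port means the mongod process is up at the OS level but not accepting connections, exactly the symptom seen here.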

 

2. Following the hint from step 1, we checked the operating-system logs for the mongod service: the OS event log had been warning since 2018-01-05 04:59:25 that drive D: was full.


3. We then checked storage on 192.168.72.31. As the OS log indicated, drive D: had only 58 MB of free space left.
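The manual check in this step is easy to script. A minimal sketch using only the Python standard library (the drive letter and the 58 MB figure come from this incident; the 1 GB threshold is an illustrative choice, not a MongoDB requirement):

```python
import shutil

def free_mb(path):
    """Free space on the filesystem containing `path`, in MB."""
    return shutil.disk_usage(path).free / (1024 * 1024)

def check(path, min_free_mb=1024):
    """Print and return whether `path` has at least `min_free_mb` MB free."""
    free = free_mb(path)
    ok = free >= min_free_mb
    status = "OK" if ok else "LOW - mongod may stall or refuse writes"
    print(f"{path}: {free:.0f} MB free -> {status}")
    return ok

# On the affected server this would have been check("D:\\"); with only
# 58 MB free it would have flagged the problem well before the outage.
check(".")
```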


4. From the above we could conclude: the primary node 192.168.72.31 ran out of disk space, so its mongod process could no longer complete writes and refused connections from the two secondaries, taking the whole MongoDB cluster down. It turned out the local operations team had made a backup of the primary's data on that same node without checking drive D:'s remaining capacity.
The team immediately deleted the redundant backup on 192.168.72.31 to free space on drive D:. Because the scheduler process was hung, they decided to restart all three cluster servers, 192.168.72.31/32/33.

 

5. After the restart the MongoDB cluster returned to normal; the console on the primary node 192.168.72.31 showed the scheduler process bmi being accepted and connecting to the cluster's admin database.

 

With that, the MongoDB cluster recovery for the Hubei Health and Family Planning Commission project was completed successfully.


Recommendations:

1. Before any large data operation, estimate the data volume involved and the free disk space available on the servers.

2. During the operation, closely monitor database and server resources: tablespaces, CPU, memory, I/O, and so on.

3. After the operation, record the usage of database and server resources such as tablespaces and disk space.

4. Ideally, operations staff should check essential resources (disk space, CPU, memory, disk I/O, database tablespace usage) at the start of the workday, at midday, and at the end of the day, and alert the relevant people immediately when anything looks abnormal.
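Recommendation 1 boils down to simple arithmetic: estimate the bytes the operation (or backup) will write, then compare that against free space with a safety margin for the oplog, journal, and temporary files. A hypothetical sketch (the document counts, sizes, and safety factor are illustrative assumptions):

```python
import shutil

def enough_space(path, expected_write_bytes, safety_factor=2.0):
    """True if the filesystem at `path` can absorb the planned write.
    `safety_factor` leaves headroom for oplog, journal and temp files."""
    free = shutil.disk_usage(path).free
    return free >= expected_write_bytes * safety_factor

# e.g. a bulk load of 10 million documents averaging 2 KB each:
planned = 10_000_000 * 2 * 1024   # ~20 GB of raw document data
print(enough_space(".", planned)) # run this on the drive holding dbPath
```

Had a check like this been run against drive D: before the backup was taken, the 58 MB of remaining space would have failed it immediately.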


 

Source: ITPUB blog, http://blog.itpub.net/29357786/viewspace-2149836/. Please cite the source when reposting.
