MongoDB Daily Operations 05: Replica Set Failover

Posted by chenoracle on 2020-03-22

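1: MongoDB Common Commands Summary

2: MongoDB Installation

3: MongoDB Master-Slave Replication Setup

4: MongoDB Replica Set Setup

5: MongoDB Replica Set Failover

6: MongoDB Replica Set Setup Error Summary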


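5: MongoDB Replica Set Failover

By default, the primary and the secondary each have priority 1, while the arbiter has priority 0: it holds no data, so it votes in elections but can never be elected primary.

View the replica set configuration: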

cjcmonset:PRIMARY> rs.conf()

{

"_id" : "cjcmonset",

"version" : 1,

"protocolVersion" : NumberLong(1),

"writeConcernMajorityJournalDefault" : true,

"members" : [

{

"_id" : 0,

"host" : "192.168.2.222:27017",

"arbiterOnly" : false,

"buildIndexes" : true,

"hidden" : false,

"priority" : 1,

"tags" : {

},

"slaveDelay" : NumberLong(0),

"votes" : 1

},

{

"_id" : 1,

"host" : "192.168.2.187:27017",

"arbiterOnly" : false,

"buildIndexes" : true,

"hidden" : false,

"priority" : 1,

"tags" : {

},

"slaveDelay" : NumberLong(0),

"votes" : 1

},

{

"_id" : 2,

"host" : "192.168.2.188:27017",

"arbiterOnly" : true,

"buildIndexes" : true,

"hidden" : false,

"priority" : 0,

"tags" : {

},

"slaveDelay" : NumberLong(0),

"votes" : 1

}

],

"settings" : {

"chainingAllowed" : true,

"heartbeatIntervalMillis" : 2000,

"heartbeatTimeoutSecs" : 10,

"electionTimeoutMillis" : 10000,

"catchUpTimeoutMillis" : -1,

"catchUpTakeoverDelayMillis" : 30000,

"getLastErrorModes" : {

},

"getLastErrorDefaults" : {

"w" : 1,

"wtimeout" : 0

},

"replicaSetId" : ObjectId("5e77148837ae69b4ab9b4870")

}

}
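
Given a configuration document like the one above, the members that can be elected primary are those with a priority greater than 0 that are not arbiters. The following is a minimal sketch in plain JavaScript; the helper name `electableHosts` and the trimmed `sampleCfg` object are illustrative assumptions, not shell built-ins.

```javascript
// Hypothetical helper (not part of the mongo shell): list the hosts that are
// eligible to become primary. A member qualifies when its priority is
// greater than 0 and it is not an arbiter.
function electableHosts(cfg) {
  return cfg.members
    .filter(function (m) { return m.priority > 0 && !m.arbiterOnly; })
    .map(function (m) { return m.host; });
}

// Trimmed copy of the configuration shown above (only the fields used here).
var sampleCfg = {
  _id: "cjcmonset",
  members: [
    { _id: 0, host: "192.168.2.222:27017", arbiterOnly: false, priority: 1, votes: 1 },
    { _id: 1, host: "192.168.2.187:27017", arbiterOnly: false, priority: 1, votes: 1 },
    { _id: 2, host: "192.168.2.188:27017", arbiterOnly: true, priority: 0, votes: 1 }
  ]
};

var electable = electableHosts(sampleCfg);
```

In the shell the same helper could presumably be called as `electableHosts(rs.conf())`; for this replica set it should return 2.222 and 2.187 but not the arbiter 2.188.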

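Raise the priority of the current primary, 2.222, from 1 to 5, so that after it recovers from a failure it automatically takes the primary role back.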

cjcmonset:PRIMARY> var rscfg=rs.conf()

cjcmonset:PRIMARY> rscfg.members[0].priority = 5

5

cjcmonset:PRIMARY> rs.reconfig(rscfg)

{

"ok" : 1,

"$clusterTime" : {

"clusterTime" : Timestamp(1584881617, 1),

"signature" : {

"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),

"keyId" : NumberLong(0)

}

},

"operationTime" : Timestamp(1584881617, 1)

}
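
The session above addresses the member by array position (`members[0]`), which only works if you know the order of the array. As a hedged alternative, the sketch below looks the member up by its host string. `setMemberPriority` and `sampleCfg` are hypothetical names introduced for illustration; only `rs.conf()` and `rs.reconfig()` in the comment are real shell helpers.

```javascript
// Hypothetical helper: change one member's priority in a config document,
// locating the member by host instead of by array index.
// The document is modified in place (as in the session above) and returned.
function setMemberPriority(cfg, host, priority) {
  var matches = cfg.members.filter(function (m) { return m.host === host; });
  if (matches.length !== 1) {
    throw new Error("expected exactly one member with host " + host +
                    ", found " + matches.length);
  }
  // An arbiter cannot become primary, so its priority has to stay 0.
  if (matches[0].arbiterOnly && priority !== 0) {
    throw new Error("an arbiter must keep priority 0");
  }
  matches[0].priority = priority;
  return cfg;
}

// Trimmed sample config (assumption: only the fields the helper reads).
var sampleCfg = {
  _id: "cjcmonset",
  members: [
    { _id: 0, host: "192.168.2.222:27017", arbiterOnly: false, priority: 1 },
    { _id: 1, host: "192.168.2.187:27017", arbiterOnly: false, priority: 1 },
    { _id: 2, host: "192.168.2.188:27017", arbiterOnly: true, priority: 0 }
  ]
};

setMemberPriority(sampleCfg, "192.168.2.222:27017", 5);

// In the mongo shell, run on the primary, this would look like:
//   var rscfg = rs.conf()
//   rs.reconfig(setMemberPriority(rscfg, "192.168.2.222:27017", 5))
```

Failing loudly on an unknown host is a deliberate choice here: a typo in the address should stop the reconfiguration rather than silently change nothing.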

Check the configuration again: the priority of the primary is now 5.

cjcmonset:PRIMARY> rs.conf()

{

"_id" : "cjcmonset",

"version" : 2,

"protocolVersion" : NumberLong(1),

"writeConcernMajorityJournalDefault" : true,

"members" : [

{

"_id" : 0,

"host" : "192.168.2.222:27017",

"arbiterOnly" : false,

"buildIndexes" : true,

"hidden" : false,

"priority" : 5,

"tags" : {

},

"slaveDelay" : NumberLong(0),

"votes" : 1

},

{

"_id" : 1,

"host" : "192.168.2.187:27017",

"arbiterOnly" : false,

"buildIndexes" : true,

"hidden" : false,

"priority" : 1,

"tags" : {

},

"slaveDelay" : NumberLong(0),

"votes" : 1

},

{

"_id" : 2,

"host" : "192.168.2.188:27017",

"arbiterOnly" : true,

"buildIndexes" : true,

"hidden" : false,

"priority" : 0,

"tags" : {

},

"slaveDelay" : NumberLong(0),

"votes" : 1

}

],

"settings" : {

"chainingAllowed" : true,

"heartbeatIntervalMillis" : 2000,

"heartbeatTimeoutSecs" : 10,

"electionTimeoutMillis" : 10000,

"catchUpTimeoutMillis" : -1,

"catchUpTakeoverDelayMillis" : 30000,

"getLastErrorModes" : {

},

"getLastErrorDefaults" : {

"w" : 1,

"wtimeout" : 0

},

"replicaSetId" : ObjectId("5e77148837ae69b4ab9b4870")

}

}

Manually shut down mongod on the primary (2.222) to test failover:

cjcmonset:PRIMARY> use admin

switched to db admin

cjcmonset:PRIMARY> db.shutdownServer()

2020-03-22T20:59:39.419+0800 I  NETWORK  [js] DBClientConnection failed to receive message from 127.0.0.1:27017 - HostUnreachable: Connection closed by peer

server should be down...

2020-03-22T20:59:39.422+0800 I  NETWORK  [js] trying reconnect to 127.0.0.1:27017 failed

2020-03-22T20:59:39.423+0800 I  NETWORK  [js] reconnect 127.0.0.1:27017 failed failed 

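Check the replica set status on node 2.187: the former primary 2.222 reports Connection refused, and the former secondary 2.187 has been automatically promoted to primary.

View the replica set status: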

cjcmonset:PRIMARY> rs.status()

{

"set" : "cjcmonset",

"date" : ISODate("2020-03-22T13:00:33.838Z"),

"myState" : 1,

"term" : NumberLong(2),

"syncingTo" : "",

"syncSourceHost" : "",

"syncSourceId" : -1,

"heartbeatIntervalMillis" : NumberLong(2000),

"majorityVoteCount" : 2,

"writeMajorityCount" : 2,

"optimes" : {

"lastCommittedOpTime" : {

"ts" : Timestamp(1584881969, 1),

"t" : NumberLong(1)

},

"lastCommittedWallTime" : ISODate("2020-03-22T12:59:29.481Z"),

"readConcernMajorityOpTime" : {

"ts" : Timestamp(1584881969, 1),

"t" : NumberLong(1)

},

"readConcernMajorityWallTime" : ISODate("2020-03-22T12:59:29.481Z"),

"appliedOpTime" : {

"ts" : Timestamp(1584882028, 1),

"t" : NumberLong(2)

},

"durableOpTime" : {

"ts" : Timestamp(1584882028, 1),

"t" : NumberLong(2)

},

"lastAppliedWallTime" : ISODate("2020-03-22T13:00:28.344Z"),

"lastDurableWallTime" : ISODate("2020-03-22T13:00:28.344Z")

},

"lastStableRecoveryTimestamp" : Timestamp(1584881969, 1),

"lastStableCheckpointTimestamp" : Timestamp(1584881969, 1),

"electionCandidateMetrics" : {

"lastElectionReason" : "stepUpRequestSkipDryRun",

"lastElectionDate" : ISODate("2020-03-22T12:59:37.752Z"),

"electionTerm" : NumberLong(2),

"lastCommittedOpTimeAtElection" : {

"ts" : Timestamp(1584881969, 1),

"t" : NumberLong(1)

},

"lastSeenOpTimeAtElection" : {

"ts" : Timestamp(1584881969, 1),

"t" : NumberLong(1)

},

"numVotesNeeded" : 2,

"priorityAtElection" : 1,

"electionTimeoutMillis" : NumberLong(10000),

"priorPrimaryMemberId" : 0,

"numCatchUpOps" : NumberLong(0),

"newTermStartDate" : ISODate("2020-03-22T12:59:38.313Z")

},

"electionParticipantMetrics" : {

"votedForCandidate" : true,

"electionTerm" : NumberLong(1),

"lastVoteDate" : ISODate("2020-03-22T07:32:34.460Z"),

"electionCandidateMemberId" : 0,

"voteReason" : "",

"lastAppliedOpTimeAtElection" : {

"ts" : Timestamp(1584862345, 1),

"t" : NumberLong(-1)

},

"maxAppliedOpTimeInSet" : {

"ts" : Timestamp(1584862345, 1),

"t" : NumberLong(-1)

},

"priorityAtElection" : 1

},

"members" : [

{

"_id" : 0,

"name" : "192.168.2.222:27017",

"health" : 0,

"state" : 8,

"stateStr" : "(not reachable/healthy)",

"uptime" : 0,

"optime" : {

"ts" : Timestamp(0, 0),

"t" : NumberLong(-1)

},

"optimeDurable" : {

"ts" : Timestamp(0, 0),

"t" : NumberLong(-1)

},

"optimeDate" : ISODate("1970-01-01T00:00:00Z"),

"optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),

"lastHeartbeat" : ISODate("2020-03-22T13:00:31.874Z"),

"lastHeartbeatRecv" : ISODate("2020-03-22T12:59:36.547Z"),

"pingMs" : NumberLong(0),

"lastHeartbeatMessage" : "Error connecting to 192.168.2.222:27017 :: caused by :: Connection refused",

"syncingTo" : "",

"syncSourceHost" : "",

"syncSourceId" : -1,

"infoMessage" : "",

"configVersion" : -1

},

{

"_id" : 1,

"name" : "192.168.2.187:27017",

"health" : 1,

"state" : 1,

"stateStr" : "PRIMARY",

"uptime" : 19745,

"optime" : {

"ts" : Timestamp(1584882028, 1),

"t" : NumberLong(2)

},

"optimeDate" : ISODate("2020-03-22T13:00:28Z"),

"syncingTo" : "",

"syncSourceHost" : "",

"syncSourceId" : -1,

"infoMessage" : "",

"electionTime" : Timestamp(1584881977, 1),

"electionDate" : ISODate("2020-03-22T12:59:37Z"),

"configVersion" : 2,

"self" : true,

"lastHeartbeatMessage" : ""

},

{

"_id" : 2,

"name" : "192.168.2.188:27017",

"health" : 1,

"state" : 7,

"stateStr" : "ARBITER",

"uptime" : 19689,

"lastHeartbeat" : ISODate("2020-03-22T13:00:31.872Z"),

"lastHeartbeatRecv" : ISODate("2020-03-22T13:00:32.657Z"),

"pingMs" : NumberLong(0),

"lastHeartbeatMessage" : "",

"syncingTo" : "",

"syncSourceHost" : "",

"syncSourceId" : -1,

"infoMessage" : "",

"configVersion" : 2

}

],

"ok" : 1,

"$clusterTime" : {

"clusterTime" : Timestamp(1584882028, 1),

"signature" : {

"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),

"keyId" : NumberLong(0)

}

},

"operationTime" : Timestamp(1584882028, 1)

}
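
The full `rs.status()` document is long, and during a failover test usually only the state of each member matters. The sketch below reduces a status document to that information. `memberStates`, `primaryHost` and `sampleStatus` are hypothetical names for illustration; the sample object is a trimmed copy of the output above.

```javascript
// Hypothetical helper: map each member name to its stateStr.
function memberStates(status) {
  var states = {};
  status.members.forEach(function (m) { states[m.name] = m.stateStr; });
  return states;
}

// Hypothetical helper: return the name of the member in state 1 (PRIMARY),
// or null when this node does not currently see exactly one primary.
function primaryHost(status) {
  var primaries = status.members.filter(function (m) { return m.state === 1; });
  return primaries.length === 1 ? primaries[0].name : null;
}

// Trimmed copy of the rs.status() output above (only the fields used here).
var sampleStatus = {
  set: "cjcmonset",
  members: [
    { _id: 0, name: "192.168.2.222:27017", health: 0, state: 8, stateStr: "(not reachable/healthy)" },
    { _id: 1, name: "192.168.2.187:27017", health: 1, state: 1, stateStr: "PRIMARY" },
    { _id: 2, name: "192.168.2.188:27017", health: 1, state: 7, stateStr: "ARBITER" }
  ]
};

var states = memberStates(sampleStatus);
var currentPrimary = primaryHost(sampleStatus);
```

In the shell, `memberStates(rs.status())` should give the same kind of summary for the live replica set.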

Manually start MongoDB on node 2.222:

[root@cjcos conf]# mongod --config /usr/local/mongodb/conf/mongodb.conf

cjcmonset:SECONDARY> rs.status()

{

"set" : "cjcmonset",

"date" : ISODate("2020-03-22T13:02:32.499Z"),

"myState" : 2,

"term" : NumberLong(2),

"syncingTo" : "192.168.2.187:27017",

"syncSourceHost" : "192.168.2.187:27017",

"syncSourceId" : 1,

"heartbeatIntervalMillis" : NumberLong(2000),

"majorityVoteCount" : 2,

"writeMajorityCount" : 2,

"optimes" : {

"lastCommittedOpTime" : {

"ts" : Timestamp(1584882148, 1),

"t" : NumberLong(2)

},

"lastCommittedWallTime" : ISODate("2020-03-22T13:02:28.367Z"),

"readConcernMajorityOpTime" : {

"ts" : Timestamp(1584882148, 1),

"t" : NumberLong(2)

},

"readConcernMajorityWallTime" : ISODate("2020-03-22T13:02:28.367Z"),

"appliedOpTime" : {

"ts" : Timestamp(1584882148, 1),

"t" : NumberLong(2)

},

"durableOpTime" : {

"ts" : Timestamp(1584882148, 1),

"t" : NumberLong(2)

},

"lastAppliedWallTime" : ISODate("2020-03-22T13:02:28.367Z"),

"lastDurableWallTime" : ISODate("2020-03-22T13:02:28.367Z")

},

"lastStableRecoveryTimestamp" : Timestamp(1584881969, 1),

"lastStableCheckpointTimestamp" : Timestamp(1584881969, 1),

"members" : [

{

"_id" : 0,

"name" : "192.168.2.222:27017",

"health" : 1,

"state" : 2,

"stateStr" : "SECONDARY",

"uptime" : 13,

"optime" : {

"ts" : Timestamp(1584882148, 1),

"t" : NumberLong(2)

},

"optimeDate" : ISODate("2020-03-22T13:02:28Z"),

"syncingTo" : "192.168.2.187:27017",

"syncSourceHost" : "192.168.2.187:27017",

"syncSourceId" : 1,

"infoMessage" : "",

"configVersion" : 2,

"self" : true,

"lastHeartbeatMessage" : ""

},

{

"_id" : 1,

"name" : "192.168.2.187:27017",

"health" : 1,

"state" : 1,

"stateStr" : "PRIMARY",

"uptime" : 10,

"optime" : {

"ts" : Timestamp(1584882148, 1),

"t" : NumberLong(2)

},

"optimeDurable" : {

"ts" : Timestamp(1584882148, 1),

"t" : NumberLong(2)

},

"optimeDate" : ISODate("2020-03-22T13:02:28Z"),

"optimeDurableDate" : ISODate("2020-03-22T13:02:28Z"),

"lastHeartbeat" : ISODate("2020-03-22T13:02:31.498Z"),

"lastHeartbeatRecv" : ISODate("2020-03-22T13:02:31.261Z"),

"pingMs" : NumberLong(0),

"lastHeartbeatMessage" : "",

"syncingTo" : "",

"syncSourceHost" : "",

"syncSourceId" : -1,

"infoMessage" : "",

"electionTime" : Timestamp(1584881977, 1),

"electionDate" : ISODate("2020-03-22T12:59:37Z"),

"configVersion" : 2

},

{

"_id" : 2,

"name" : "192.168.2.188:27017",

"health" : 1,

"state" : 7,

"stateStr" : "ARBITER",

"uptime" : 10,

"lastHeartbeat" : ISODate("2020-03-22T13:02:31.496Z"),

"lastHeartbeatRecv" : ISODate("2020-03-22T13:02:32.014Z"),

"pingMs" : NumberLong(0),

"lastHeartbeatMessage" : "",

"syncingTo" : "",

"syncSourceHost" : "",

"syncSourceId" : -1,

"infoMessage" : "",

"configVersion" : 2

}

],

"ok" : 1,

"$clusterTime" : {

"clusterTime" : Timestamp(1584882148, 1),

"signature" : {

"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),

"keyId" : NumberLong(0)

}

},

"operationTime" : Timestamp(1584882148, 1)

}

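Because 2.222 has the higher priority, shortly after mongod is started on it, 2.222 is re-elected primary and 2.187 goes back to being a secondary. The shell prompt on 2.222 changes accordingly: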

cjcmonset:SECONDARY> 

cjcmonset:PRIMARY> 

cjcmonset:PRIMARY> 

cjcmonset:PRIMARY> 

--- The log on 2.222 shows the following:

2020-03-22T21:02:33.946+0800 I  CONNPOOL [Replication] Connecting to 192.168.2.187:27017

2020-03-22T21:02:33.949+0800 I  REPL     [replexec-0] Member 192.168.2.187:27017 is now in state SECONDARY

2020-03-22T21:02:33.949+0800 I  REPL     [replexec-0] Caught up to the latest optime known via heartbeats after becoming primary. Target optime: { ts: Timestamp(1584882148, 1), t: 2 }. My Last Applied: { ts: Timestamp(1584882148, 1), t: 2 }

2020-03-22T21:02:33.949+0800 I  REPL     [replexec-0] Exited primary catch-up mode.

2020-03-22T21:02:33.949+0800 I  REPL     [replexec-0] Stopping replication producer

2020-03-22T21:02:33.949+0800 I  REPL     [rsBackgroundSync] Replication producer stopped after oplog fetcher finished returning a batch from our sync source.  Abandoning this batch of oplog entries and re-evaluating our sync source.

2020-03-22T21:02:34.592+0800 I  REPL     [ReplBatcher] Oplog buffer has been drained in term 3

2020-03-22T21:02:34.592+0800 I  REPL     [RstlKillOpThread] Starting to kill user operations

2020-03-22T21:02:34.592+0800 I  REPL     [RstlKillOpThread] Stopped killing user operations

2020-03-22T21:02:34.592+0800 I  REPL     [RstlKillOpThread] State transition ops metrics: { lastStateTransition: "stepUp", userOpsKilled: 0, userOpsRunning: 0 }

2020-03-22T21:02:34.593+0800 I  REPL     [rsSync-0] transition to primary complete; database writes are now permitted

2020-03-22T21:02:34.712+0800 I  REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to 192.168.2.187:27017: InvalidSyncSource: Sync source was cleared. Was 192.168.2.187:27017

2020-03-22T21:02:35.459+0800 I  NETWORK  [listener] connection accepted from 192.168.2.187:41810 #13 (6 connections now open)

2020-03-22T21:02:35.460+0800 I  NETWORK  [conn13] received client metadata from 192.168.2.187:41810 conn13: { driver: { name: "NetworkInterfaceTL", version: "4.2.3" }, os: { type: "Linux", name: "CentOS Linux release 7.5.1804 (Core) ", architecture: "x86_64", version: "Kernel 3.10.0-862.el7.x86_64" } }

2020-03-22T21:02:39.711+0800 I  CONNPOOL [RS] Ending connection to host 192.168.2.187:27017 due to bad connection status: CallbackCanceled: Callback was canceled; 1 connections to that host remain open

2020-03-22T21:02:43.944+0800 I  CONNPOOL [Replication] Ending connection to host 192.168.2.187:27017 due to bad connection status: CallbackCanceled: Callback was canceled; 1 connections to that host remain open

View the replica set status: 2.222 has become the primary again.

cjcmonset:PRIMARY> rs.status()

{

"set" : "cjcmonset",

"date" : ISODate("2020-03-22T13:04:24.678Z"),

"myState" : 1,

"term" : NumberLong(3),

"syncingTo" : "",

"syncSourceHost" : "",

"syncSourceId" : -1,

"heartbeatIntervalMillis" : NumberLong(2000),

"majorityVoteCount" : 2,

"writeMajorityCount" : 2,

"optimes" : {

"lastCommittedOpTime" : {

"ts" : Timestamp(1584882264, 1),

"t" : NumberLong(3)

},

"lastCommittedWallTime" : ISODate("2020-03-22T13:04:24.632Z"),

"readConcernMajorityOpTime" : {

"ts" : Timestamp(1584882264, 1),

"t" : NumberLong(3)

},

"readConcernMajorityWallTime" : ISODate("2020-03-22T13:04:24.632Z"),

"appliedOpTime" : {

"ts" : Timestamp(1584882264, 1),

"t" : NumberLong(3)

},

"durableOpTime" : {

"ts" : Timestamp(1584882264, 1),

"t" : NumberLong(3)

},

"lastAppliedWallTime" : ISODate("2020-03-22T13:04:24.632Z"),

"lastDurableWallTime" : ISODate("2020-03-22T13:04:24.632Z")

},

"lastStableRecoveryTimestamp" : Timestamp(1584882254, 1),

"lastStableCheckpointTimestamp" : Timestamp(1584882254, 1),

"electionCandidateMetrics" : {

"lastElectionReason" : "priorityTakeover",

"lastElectionDate" : ISODate("2020-03-22T13:02:33.880Z"),

"electionTerm" : NumberLong(3),

"lastCommittedOpTimeAtElection" : {

"ts" : Timestamp(1584882148, 1),

"t" : NumberLong(2)

},

"lastSeenOpTimeAtElection" : {

"ts" : Timestamp(1584882148, 1),

"t" : NumberLong(2)

},

"numVotesNeeded" : 2,

"priorityAtElection" : 5,

"electionTimeoutMillis" : NumberLong(10000),

"priorPrimaryMemberId" : 1,

"numCatchUpOps" : NumberLong(0),

"newTermStartDate" : ISODate("2020-03-22T13:02:34.593Z"),

"wMajorityWriteAvailabilityDate" : ISODate("2020-03-22T13:02:35.462Z")

},

"members" : [

{

"_id" : 0,

"name" : "192.168.2.222:27017",

"health" : 1,

"state" : 1,

"stateStr" : "PRIMARY",

"uptime" : 125,

"optime" : {

"ts" : Timestamp(1584882264, 1),

"t" : NumberLong(3)

},

"optimeDate" : ISODate("2020-03-22T13:04:24Z"),

"syncingTo" : "",

"syncSourceHost" : "",

"syncSourceId" : -1,

"infoMessage" : "",

"electionTime" : Timestamp(1584882153, 1),

"electionDate" : ISODate("2020-03-22T13:02:33Z"),

"configVersion" : 2,

"self" : true,

"lastHeartbeatMessage" : ""

},

{

"_id" : 1,

"name" : "192.168.2.187:27017",

"health" : 1,

"state" : 2,

"stateStr" : "SECONDARY",

"uptime" : 122,

"optime" : {

"ts" : Timestamp(1584882254, 1),

"t" : NumberLong(3)

},

"optimeDurable" : {

"ts" : Timestamp(1584882254, 1),

"t" : NumberLong(3)

},

"optimeDate" : ISODate("2020-03-22T13:04:14Z"),

"optimeDurableDate" : ISODate("2020-03-22T13:04:14Z"),

"lastHeartbeat" : ISODate("2020-03-22T13:04:24.023Z"),

"lastHeartbeatRecv" : ISODate("2020-03-22T13:04:23.967Z"),

"pingMs" : NumberLong(0),

"lastHeartbeatMessage" : "",

"syncingTo" : "192.168.2.222:27017",

"syncSourceHost" : "192.168.2.222:27017",

"syncSourceId" : 0,

"infoMessage" : "",

"configVersion" : 2

},

{

"_id" : 2,

"name" : "192.168.2.188:27017",

"health" : 1,

"state" : 7,

"stateStr" : "ARBITER",

"uptime" : 122,

"lastHeartbeat" : ISODate("2020-03-22T13:04:24.019Z"),

"lastHeartbeatRecv" : ISODate("2020-03-22T13:04:24.112Z"),

"pingMs" : NumberLong(0),

"lastHeartbeatMessage" : "",

"syncingTo" : "",

"syncSourceHost" : "",

"syncSourceId" : -1,

"infoMessage" : "",

"configVersion" : 2

}

],

"ok" : 1,

"$clusterTime" : {

"clusterTime" : Timestamp(1584882264, 1),

"signature" : {

"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),

"keyId" : NumberLong(0)

}

},

"operationTime" : Timestamp(1584882264, 1)

}
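
The `electionCandidateMetrics` section records why the current primary was elected: the first failover above reported `stepUpRequestSkipDryRun`, and this one reports `priorityTakeover` with `priorityAtElection` 5. The sketch below pulls out those fields. `lastElectionSummary` and `sampleStatus` are hypothetical names, and the term is written as a plain number where the shell prints `NumberLong(3)`.

```javascript
// Hypothetical helper: extract the key election facts from a status document.
// Returns null when electionCandidateMetrics is absent, as it is in the
// SECONDARY output earlier in this post.
function lastElectionSummary(status) {
  var m = status.electionCandidateMetrics;
  if (!m) {
    return null;
  }
  return {
    reason: m.lastElectionReason,
    term: m.electionTerm,
    priority: m.priorityAtElection,
    priorPrimaryMemberId: m.priorPrimaryMemberId
  };
}

// Trimmed copy of the output above (assumption: plain numbers instead of NumberLong).
var sampleStatus = {
  set: "cjcmonset",
  electionCandidateMetrics: {
    lastElectionReason: "priorityTakeover",
    electionTerm: 3,
    priorityAtElection: 5,
    priorPrimaryMemberId: 1
  }
};

var summary = lastElectionSummary(sampleStatus);
```

Here `priorPrimaryMemberId` 1 is the `_id` of 2.187, which matches the takeover from 2.187 back to 2.222.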

Node 2.187 automatically becomes a secondary again, as its shell prompt shows:

cjcmonset:PRIMARY> 

cjcmonset:PRIMARY> 

cjcmonset:SECONDARY> 

cjcmonset:SECONDARY> 

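Follow my WeChat official account "IT小Chen" to learn and grow together!

Source: "ITPUB Blog", link: http://blog.itpub.net/29785807/viewspace-2681969/. Please credit the source when reposting; otherwise legal action may be pursued.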