Maruko Learns MongoDB Series: Replica Set Auto-Failover

Posted by wxjzqym on 2015-12-07
    A MongoDB replica set provides high availability through automatic failover: when the primary member becomes unavailable, a secondary member is automatically promoted to primary and continues serving requests. Below we demonstrate automatic failover under two replica set architectures.
I. Auto-Failover (one primary, two secondaries)

1. Check the replica set status
[mgousr01@vm1 ~]$ mongo 192.168.157.128:47017
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47017/test
rstl:PRIMARY> rs.status()
{
        "set" : "rstl",
        "date" : ISODate("2015-12-07T07:47:14.958Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.157.128:47017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 258664,
                        "optime" : Timestamp(1449459199, 2),
                        "optimeDate" : ISODate("2015-12-07T03:33:19Z"),
                        "electionTime" : Timestamp(1449458787, 1),
                        "electionDate" : ISODate("2015-12-07T03:26:27Z"),
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "192.168.157.128:47027",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 15651,
                        "optime" : Timestamp(1449459199, 2),
                        "optimeDate" : ISODate("2015-12-07T03:33:19Z"),
                        "lastHeartbeat" : ISODate("2015-12-07T07:47:14.804Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-07T07:47:14.212Z"),
                        "pingMs" : 0,
                        "syncingTo" : "192.168.157.128:47017",
                        "configVersion" : 1
                },
                {
                        "_id" : 2,
                        "name" : "192.168.157.128:47037",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 15651,
                        "optime" : Timestamp(1449459199, 2),
                        "optimeDate" : ISODate("2015-12-07T03:33:19Z"),
                        "lastHeartbeat" : ISODate("2015-12-07T07:47:14.696Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-07T07:47:14.509Z"),
                        "pingMs" : 0,
                        "syncingTo" : "192.168.157.128:47017",
                        "configVersion" : 1
                }
        ],
        "ok" : 1
}
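In the rs.status() output above, each member's numeric `state` field corresponds to the human-readable `stateStr`. The state codes that appear in this article can be collected into a small helper; this is just a convenience sketch, with the mapping taken directly from the outputs shown here:

```python
# Replica set member state codes as they appear in the rs.status()
# outputs in this article (numeric `state` -> `stateStr`).
STATE_STR = {
    1: "PRIMARY",
    2: "SECONDARY",
    7: "ARBITER",
    8: "(not reachable/healthy)",
}

def summarize(members):
    """Map each member's name to its readable state string."""
    return {m["name"]: STATE_STR.get(m["state"], "UNKNOWN") for m in members}

# Example: the healthy three-member set from step 1.
members = [
    {"name": "192.168.157.128:47017", "state": 1},
    {"name": "192.168.157.128:47027", "state": 2},
    {"name": "192.168.157.128:47037", "state": 2},
]
print(summarize(members))
```

A helper like this is handy when polling rs.status() documents from a monitoring script instead of reading them by eye.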


2. Simulate a failure of the primary member
[mgousr01@vm1 ~]$ netstat -ntpl|grep 47017
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.157.128:47017       0.0.0.0:*                   LISTEN      26892/mongod       
     
[mgousr01@vm1 ~]$ kill -9 26892
[mgousr01@vm1 ~]$ mongo 192.168.157.128:47017
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47017/test
2015-12-04T14:44:59.102+0800 W NETWORK  Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-04T14:44:59.103+0800 E QUERY    Error: couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
    at connect (src/mongo/shell/mongo.js:181:14)
    at (connect):1:6 at src/mongo/shell/mongo.js:181
exception: connect failed

[mgousr01@vm1 ~]$ mongo 192.168.157.128:47027
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47027/test
rstl:PRIMARY> rs.status();
{
        "set" : "rstl",
        "date" : ISODate("2015-12-04T06:45:46.896Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.157.128:47017",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : Timestamp(0, 0),
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2015-12-04T06:45:46.244Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-04T06:44:17.958Z"),
                        "pingMs" : 0,
                        "lastHeartbeatMessage" : "Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed",
                        "configVersion" : -1
                },
                {
                        "_id" : 1,
                        "name" : "192.168.157.128:47027",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 653,
                        "optime" : Timestamp(1449210941, 1),
                        "optimeDate" : ISODate("2015-12-04T06:35:41Z"),
                        "electionTime" : Timestamp(1449211460, 1),
                        "electionDate" : ISODate("2015-12-04T06:44:20Z"),
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "192.168.157.128:47037",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 603,
                        "optime" : Timestamp(1449210941, 1),
                        "optimeDate" : ISODate("2015-12-04T06:35:41Z"),
                        "lastHeartbeat" : ISODate("2015-12-04T06:45:46.031Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-04T06:45:46.031Z"),
                        "pingMs" : 0,
                        "configVersion" : 1
                }
        ],
        "ok" : 1
}


3. Observe Auto-Failover with one primary and two secondaries
Log of member mg27:
2015-12-04T14:44:19.979+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location10276 DBClientBase::findN: transport error: 192.168.157.128:47017 ns: admin.$cmd query: { replSetHeartbeat: "rstl", pv: 1, v: 1, from: "192.168.157.128:47027", fromId: 1, checkEmpty: false }
2015-12-04T14:44:19.985+0800 I REPL     [ReplicationExecutor] Standing for election
2015-12-04T14:44:19.985+0800 I REPL     [ReplicationExecutor] replSet possible election tie; sleeping 665ms until 2015-12-04T14:44:20.650+0800
2015-12-04T14:44:19.986+0800 W NETWORK  [ReplExecNetThread-3] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-04T14:44:19.986+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-04T14:44:19.987+0800 W NETWORK  [ReplExecNetThread-0] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-04T14:44:19.987+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-04T14:44:20.651+0800 I REPL     [ReplicationExecutor] Standing for election
2015-12-04T14:44:20.651+0800 I REPL     [ReplicationExecutor] replSet info electSelf
2015-12-04T14:44:20.653+0800 I REPL     [ReplicationExecutor] replSet election succeeded, assuming primary role
2015-12-04T14:44:20.653+0800 I REPL     [ReplicationExecutor] transition to PRIMARY
2015-12-04T14:44:21.094+0800 I REPL     [rsSync] transition to primary complete; database writes are now permitted

Log of member mg37:
2015-12-04T14:44:19.980+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location10276 DBClientBase::findN: transport error: 192.168.157.128:47017 ns: admin.$cmd query: { replSetHeartbeat: "rstl", pv: 1, v: 1, from: "192.168.157.128:47037", fromId: 2, checkEmpty: false }
2015-12-04T14:44:19.985+0800 I REPL     [ReplicationExecutor] Standing for election
2015-12-04T14:44:19.985+0800 W NETWORK  [ReplExecNetThread-1] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-04T14:44:19.986+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-04T14:44:19.986+0800 I REPL     [ReplicationExecutor] replSet possible election tie; sleeping 666ms until 2015-12-04T14:44:20.652+0800
2015-12-04T14:44:19.986+0800 W NETWORK  [ReplExecNetThread-1] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-04T14:44:19.986+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-04T14:44:20.652+0800 I REPL     [ReplicationExecutor] replSetElect voting yea for 192.168.157.128:47027 (1)
2015-12-04T14:44:21.963+0800 I REPL     [ReplicationExecutor] Member 192.168.157.128:47027 is now in state PRIMARY
Note: the logs above show that neither secondary can reach the primary through the heartbeat mechanism, so an election begins. Because both secondaries are eligible candidates, the election can tie ("replSet possible election tie"); each candidate backs off for a short random interval before standing again, and in the end member mg27 becomes the new primary.
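The tie-breaking behaviour visible in these logs can be illustrated with a toy simulation. This is not MongoDB's actual election code, only a sketch of the back-off idea; the 665 ms and 666 ms values are the sleep intervals reported in the two logs above:

```python
def elect_after_tie(backoffs_ms):
    """After a possible election tie, each candidate backs off for a
    short random interval.  The candidate whose back-off expires first
    stands again and, with no competing candidate awake, wins the vote.
    backoffs_ms maps candidate name -> back-off interval in ms."""
    return min(backoffs_ms, key=backoffs_ms.get)

# From the logs: mg27 slept 665 ms, mg37 slept 666 ms.
winner = elect_after_tie({"mg27": 665, "mg37": 666})
print(winner)  # mg27 wakes first and becomes primary
```

This matches the sequence in the logs: mg27's back-off expires 1 ms earlier, it calls "electSelf", and mg37 then votes "yea" for it instead of standing again.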


4. Verify that replication works in the new replica set
[mgousr01@vm1 ~]$ mongo 192.168.157.128:47027
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47027/test
rstl:PRIMARY> use soho
switched to db soho
rstl:PRIMARY> db.food.find()
{ "_id" : ObjectId("5664fdfe1830846a3331ce02"), "name" : "egg", "price" : 38 }
rstl:PRIMARY> db.food.insert({name:"cake",price:100});
WriteResult({ "nInserted" : 1 })
rstl:PRIMARY> db.food.find()
{ "_id" : ObjectId("5664fdfe1830846a3331ce02"), "name" : "egg", "price" : 38 }
{ "_id" : ObjectId("56653f83e9fccdbf504e8548"), "name" : "cake", "price" : 100 }

[mgousr01@vm1 ~]$ mongo 192.168.157.128:47037
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47037/test
rstl:SECONDARY> rs.slaveOk()
rstl:SECONDARY> show dbs;
local  0.203GB
soho   0.078GB
rstl:SECONDARY> use soho
switched to db soho
rstl:SECONDARY> show tables;
food
system.indexes
rstl:SECONDARY> db.food.find();
{ "_id" : ObjectId("5664fdfe1830846a3331ce02"), "name" : "egg", "price" : 38 }
{ "_id" : ObjectId("56653f83e9fccdbf504e8548"), "name" : "cake", "price" : 100 }
Note: the output above shows that the document inserted on the new primary was replicated to the secondary, so the reconfigured replica set is working correctly.


II. Auto-Failover (one primary, one secondary, one arbiter)
1. Initialize the replica set and check its status
[mgousr01@vm1 ~]$ mongo 192.168.157.128:47017
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47017/test
> cfg=
{
    "_id" : "rstl",
    "version" : 1,
    "members" : [
         {
            "_id" : 0,
            "host" : "192.168.157.128:47017"
         },
         {
            "_id" : 1,
            "host" : "192.168.157.128:47027"
         },
         {
            "_id" : 2,
            "host" : "192.168.157.128:47037",
            "arbiterOnly":true 
         }
     ]
}
 
> rs.initiate(cfg)
{ "ok" : 1 }

rstl:OTHER> rs.status()
{
        "set" : "rstl",
        "date" : ISODate("2015-12-07T08:30:04.224Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.157.128:47017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 657,
                        "optime" : Timestamp(1449476903, 1),
                        "optimeDate" : ISODate("2015-12-07T08:28:23Z"),
                        "electionTime" : Timestamp(1449476907, 1),
                        "electionDate" : ISODate("2015-12-07T08:28:27Z"),
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "192.168.157.128:47027",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 100,
                        "optime" : Timestamp(1449476903, 1),
                        "optimeDate" : ISODate("2015-12-07T08:28:23Z"),
                        "lastHeartbeat" : ISODate("2015-12-07T08:30:03.484Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-07T08:30:03.518Z"),
                        "pingMs" : 0,
                        "configVersion" : 1
                },
                {
                        "_id" : 2,
                        "name" : "192.168.157.128:47037",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 100,
                        "lastHeartbeat" : ISODate("2015-12-07T08:30:03.523Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-07T08:30:03.523Z"),
                        "pingMs" : 0,
                        "configVersion" : 1
                }
        ],
        "ok" : 1
}
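An arbiter (`arbiterOnly: true`) holds no data but still casts a vote, so the set above has three voting members even though only two carry data. The majority arithmetic behind this is the standard floor(n/2)+1 quorum rule, sketched below (ordinary quorum math, not MongoDB internals):

```python
def majority(voting_members):
    """Votes required to elect a primary among `voting_members` voters."""
    return voting_members // 2 + 1

voters = 3  # primary + secondary + arbiter
print(majority(voters))  # 2 votes needed

# After the primary dies, the secondary plus the arbiter still hold
# 2 of the 3 votes, so the secondary can be elected primary.
print(voters - 1 >= majority(voters))  # True
```

This is why the failover in step 2 below succeeds: the surviving secondary and the arbiter together still form a majority.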


2. Simulate a failure of the primary member
[mgousr01@vm1 ~]$ netstat -ntpl|grep 47017
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.157.128:47017       0.0.0.0:*                   LISTEN      36268/mongod     
     
[mgousr01@vm1 ~]$ kill -9 36268
[mgousr01@vm1 ~]$ mongo 192.168.157.128:47017
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47017/test
2015-12-07T16:42:46.486+0800 W NETWORK  Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:46.488+0800 E QUERY    Error: couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
    at connect (src/mongo/shell/mongo.js:181:14)
    at (connect):1:6 at src/mongo/shell/mongo.js:181
exception: connect failed

[mgousr01@vm1 ~]$ mongo 192.168.157.128:47027
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47027/test
rstl:PRIMARY> rs.status()
{
        "set" : "rstl",
        "date" : ISODate("2015-12-07T08:43:02.047Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.157.128:47017",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : Timestamp(0, 0),
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2015-12-07T08:43:00.318Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-07T08:42:06.201Z"),
                        "pingMs" : 0,
                        "lastHeartbeatMessage" : "Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed",
                        "configVersion" : -1
                },
                {
                        "_id" : 1,
                        "name" : "192.168.157.128:47027",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 1336,
                        "optime" : Timestamp(1449476903, 1),
                        "optimeDate" : ISODate("2015-12-07T08:28:23Z"),
                        "electionTime" : Timestamp(1449477728, 1),
                        "electionDate" : ISODate("2015-12-07T08:42:08Z"),
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "192.168.157.128:47037",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 878,
                        "lastHeartbeat" : ISODate("2015-12-07T08:43:00.267Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-07T08:43:00.264Z"),
                        "pingMs" : 0,
                        "configVersion" : 1
                }
        ],
        "ok" : 1
}


3. Observe Auto-Failover with one primary, one secondary, and one arbiter
Log of member mg27:
2015-12-07T16:42:08.218+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location10276 DBClientBase::findN: transport error: 192.168.157.128:47017 ns: admin.$cmd query: { replSetHeartbeat: "rstl", pv: 1, v: 1, from: "192.168.157.128:47027", fromId: 1, checkEmpty: false }
2015-12-07T16:42:08.218+0800 I REPL     [ReplicationExecutor] Standing for election
2015-12-07T16:42:08.219+0800 W NETWORK  [ReplExecNetThread-0] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:08.220+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-07T16:42:08.220+0800 I REPL     [ReplicationExecutor] replSet info electSelf
2015-12-07T16:42:08.220+0800 W NETWORK  [ReplExecNetThread-0] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:08.220+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-07T16:42:08.220+0800 I REPL     [ReplicationExecutor] replSet election succeeded, assuming primary role
2015-12-07T16:42:08.220+0800 I REPL     [ReplicationExecutor] transition to PRIMARY
2015-12-07T16:42:08.525+0800 I REPL     [rsSync] transition to primary complete; database writes are now permitted

Log of member mg37:
2015-12-07T16:42:08.217+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location10276 DBClientBase::findN: transport error: 192.168.157.128:47017 ns: admin.$cmd query: { replSetHeartbeat: "rstl", pv: 1, v: 1, from: "192.168.157.128:47037", fromId: 2, checkEmpty: false }
2015-12-07T16:42:08.218+0800 W NETWORK  [ReplExecNetThread-4] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:08.218+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-07T16:42:08.219+0800 W NETWORK  [ReplExecNetThread-4] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:08.219+0800 I NETWORK  [initandlisten] connection accepted from 192.168.157.128:39452 #59 (2 connections now open)
2015-12-07T16:42:08.219+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-07T16:42:08.220+0800 I REPL     [ReplicationExecutor] replSetElect voting yea for 192.168.157.128:47027 (1)
2015-12-07T16:42:10.220+0800 W NETWORK  [ReplExecNetThread-0] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:10.220+0800 I REPL     [ReplicationExecutor] Member 192.168.157.128:47027 is now in state PRIMARY
Note: the logs above show that, as before, the surviving members cannot reach the primary through heartbeats and an election begins. This time mg27's log contains none of the election-tie contention seen in the one-primary-two-secondaries architecture, because the arbiter cannot stand for election itself; member mg27 becomes the new primary, and database writes are then permitted on it.


Summary: for a three-member replica set, the official recommendation is one primary plus two secondaries. An arbiter is generally appropriate when the set would otherwise have an even number of voting members, since it keeps the voter count odd and avoids tied elections.
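The recommendation above can be checked with the same quorum arithmetic: a data-bearing pair (2 voters) cannot survive any failure on its own, while adding an arbiter makes 3 voters and tolerates one. This sketch uses the standard floor(n/2)+1 majority rule, nothing MongoDB-specific:

```python
def fault_tolerance(voters):
    """How many voting members can fail while a majority still remains."""
    majority = voters // 2 + 1
    return voters - majority

# Two data-bearing members alone: losing either leaves no majority.
print(fault_tolerance(2))  # 0
# Add an arbiter: three voters, one failure tolerated.
print(fault_tolerance(3))  # 1
# An even voter count never helps: 4 voters still tolerate only 1 failure.
print(fault_tolerance(4))  # 1
```

Note that going from 3 to 4 voters gains nothing, which is why voter counts should stay odd, whether the extra vote comes from a data-bearing member or an arbiter.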

Source: "ITPUB Blog", link: http://blog.itpub.net/20801486/viewspace-1867663/. Please credit the source when republishing; otherwise legal liability may be pursued.
