小丸子學MongoDB系列之——副本集在容災環境中的故障演練

wxjzqym發表於2015-12-23

假設MongoDB副本集部署在多機房環境中,下面模擬下機房故障發生時的現象已經處理過程。
0.架構規劃
主機名          IP               角色                 機房
hadoop1    10.1.245.72    primary          北京
hadoop2    10.1.245.73    secondary1    北京
hadoop3    10.1.245.74    secondary2    上海


1.檢視副本集狀態

[mgousr01@hadoop1 ~]$ mongo 10.1.245.72:37027
rstl:PRIMARY> rs.status()
{
        "set" : "rstl",
        "date" : ISODate("2015-12-23T03:26:38.917Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "10.1.245.72:37027",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 815,
                        "optime" : Timestamp(1450841104, 1),
                        "optimeDate" : ISODate("2015-12-23T03:25:04Z"),
                        "electionTime" : Timestamp(1450841106, 1),
                        "electionDate" : ISODate("2015-12-23T03:25:06Z"),
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "10.1.245.73:37037",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 94,
                        "optime" : Timestamp(1450841104, 1),
                        "optimeDate" : ISODate("2015-12-23T03:25:04Z"),
                        "lastHeartbeat" : ISODate("2015-12-23T03:26:38.312Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-23T03:26:38.188Z"),
                        "pingMs" : 0,
                        "configVersion" : 1
                },
                {
                        "_id" : 2,
                        "name" : "10.1.245.74:37047",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 94,
                        "optime" : Timestamp(1450841104, 1),
                        "optimeDate" : ISODate("2015-12-23T03:25:04Z"),
                        "lastHeartbeat" : ISODate("2015-12-23T03:26:38.312Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-23T03:26:38.238Z"),
                        "pingMs" : 0,
                        "lastHeartbeatMessage" : "could not find member to sync from",
                        "configVersion" : 1
                }
        ],
        "ok" : 1
}



2.測試資料同步功能
2.1 primary執行寫入操作
rstl:PRIMARY> use test
switched to db test
rstl:PRIMARY> db.goods.insert({name:"eggs",price:38})
WriteResult({ "nInserted" : 1 })
rstl:PRIMARY> show collections;
goods
system.indexes
rstl:PRIMARY> db.goods.find();
{ "_id" : ObjectId("567a152c8d26acb133400410"), "name" : "eggs", "price" : 38 }


2.2 secondary1檢視同步資料
[mgousr01@hadoop2 ~]$ mongo 10.1.245.73:37037
rstl:SECONDARY> use test
switched to db test
rstl:SECONDARY>  db.goods.find();
{ "_id" : ObjectId("567a152c8d26acb133400410"), "name" : "eggs", "price" : 38 }


2.3 secondary2檢視同步資料
rstl:SECONDARY> use test
switched to db test
rstl:SECONDARY> db.goods.find();
{ "_id" : ObjectId("567a152c8d26acb133400410"), "name" : "eggs", "price" : 38 }
注:資料同步正常



3.模擬北京機房故障

3.1 同時將primary和secondary1上的mongod程式kill掉
[mgousr01@hadoop1 ~]$ ps -ef|grep mongod|grep -v grep
mgousr01 32555     1  0 11:13 ?        00:00:04 mongod -f mongodb/conf/mg72.conf
[mgousr01@hadoop1 ~]$ kill -9 32555

[mgousr01@hadoop2 ~]$ ps -ef|grep mongod |grep -v grep
mgousr01 15981     1  0 11:13 ?        00:00:05 mongod -f mongodb/conf/mg73.conf
[mgousr01@hadoop2 ~]$ kill -9 15981


3.2 secondary2上檢視當前副本集狀態
rstl:SECONDARY> rs.status()
{
        "set" : "rstl",
        "date" : ISODate("2015-12-23T03:36:08.448Z"),
        "myState" : 2,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "10.1.245.72:37027",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : Timestamp(0, 0),
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2015-12-23T03:36:07.019Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-23T03:34:51Z"),
                        "pingMs" : 0,
                        "lastHeartbeatMessage" : "Failed attempt to connect to 10.1.245.72:37027; couldn't connect to server 10.1.245.72:37027 (10.1.245.72), connection attempt failed",                         "configVersion" : -1
                },
                {
                        "_id" : 1,
                        "name" : "10.1.245.73:37037",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : Timestamp(0, 0),
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2015-12-23T03:36:07.021Z"),
                        "lastHeartbeatRecv" : ISODate("2015-12-23T03:34:52.948Z"),
                        "pingMs" : 0,
                        "lastHeartbeatMessage" : "Failed attempt to connect to 10.1.245.73:37037; couldn't connect to server 10.1.245.73:37037 (10.1.245.73), connection attempt failed",
                        "configVersion" : -1
                },
                {
                        "_id" : 2,
                        "name" : "10.1.245.74:37047",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 1345,
                        "optime" : Timestamp(1450841388, 2),
                        "optimeDate" : ISODate("2015-12-23T03:29:48Z"),
                        "configVersion" : 1,
                        "self" : true
                }
        ],
        "ok" : 1
}
rstl:SECONDARY> db.goods.find();
{ "_id" : ObjectId("567a152c8d26acb133400410"), "name" : "eggs", "price" : 38 }
注:目前已經無法連線到primary和secondary1,當前還可以查詢資料

rstl:SECONDARY> db.goods.insert({name:"cake",price:100});
WriteResult({ "writeError" : { "code" : undefined, "errmsg" : "not master" } })
注:secondary成員是無法接受寫入請求的


3.3 secondary2上重新初始化副本集配置
rstl:SECONDARY> cfg = rs.conf()
rstl:SECONDARY> printjson(cfg)                                --備份當前配置

rstl:SECONDARY> cfg.members = [cfg.members[2]]   --在配置中刪除不可用成員

rstl:SECONDARY> rs.reconfig(cfg, {force : true})        --讓新配置重新生效
{ "ok" : 1 }
rstl:PRIMARY>
注:這個時候提示符已經從"SECONDARY"變成了"PRIMARY"

rstl:PRIMARY> rs.status();                                            --檢視當前副本集狀態
{
        "set" : "rstl",
        "date" : ISODate("2015-12-23T05:49:39.772Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 2,
                        "name" : "10.1.245.74:37047",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 9356,
                        "optime" : Timestamp(1450841388, 2),
                        "optimeDate" : ISODate("2015-12-23T03:29:48Z"),
                        "electionTime" : Timestamp(1450849750, 1),
                        "electionDate" : ISODate("2015-12-23T05:49:10Z"),
                        "configVersion" : 21037,
                        "self" : true
                }
        ],
        "ok" : 1
}

rstl:PRIMARY> db.goods.insert({name:"cake",price:100}); 
WriteResult({ "nInserted" : 1 })
rstl:PRIMARY> db.goods.find();
{ "_id" : ObjectId("567a152c8d26acb133400410"), "name" : "eggs", "price" : 38 }
{ "_id" : ObjectId("567a361bb72708516df2cb36"), "name" : "cake", "price" : 100 }

至此,新的primary配置成功並能夠正常接收寫入請求。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/20801486/viewspace-1878235/,如需轉載,請註明出處,否則將追究法律責任。

相關文章