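The previous article, "Building a Highly Available MongoDB Cluster (1): Configuring MongoDB", left several questions unanswered.
- If the primary node goes down, can connections switch over automatically? So far the switch has to be done by hand.
- How do we relieve a primary node whose read/write load is too high?
- Every secondary node holds a full copy of the database. Will the load on the secondaries become too high?
- When the data outgrows what the machines can handle, can the cluster scale out automatically?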
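This article starts to answer them. NoSQL databases emerged to handle large data volumes, high scalability, high performance, flexible data models, and high availability. A master-slave architecture alone falls well short of those goals, so MongoDB provides replica sets and sharding. This article covers replica sets.
The official MongoDB documentation no longer recommends master-slave mode; the suggested replacement is a replica set, as shown in the figure: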
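So what is a replica set? In Chinese, the word for a "replica" is the same word World of Warcraft players use for a dungeon "instance", and the two ideas are close. In the game, players crowd into one scene at peak hours to fight monsters, and there end up being far too many players for too few monsters. To protect the player experience, the game opens a separate copy of the same space, with the same number of monsters, for each batch of players. Each copied scene is an instance, and no matter how many players there are, each group plays in its own instance without affecting the others. A MongoDB replica is the same idea. Master-slave mode is really just a single-replica setup, with poor scalability and fault tolerance. A replica set holds several replicas, which provides fault tolerance: if one replica dies, others remain. It also solves the first question above, because when the primary goes down the cluster switches over automatically. No wonder MongoDB officially recommends this mode. Here is the architecture diagram of a MongoDB replica set: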
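As the figure shows, clients connect to the replica set as a whole and do not care whether any particular machine is down. The primary handles all reads and writes for the set, and the secondaries regularly sync data from it as backups. As soon as the primary goes down, the secondaries elect a new primary, and the application servers do not need to do anything. Here is the architecture after the primary has failed:
Once the secondaries detect through the heartbeat mechanism that the primary is down, they start an election inside the set and automatically choose a new primary. That sounds impressive, so let's deploy one right away!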
The official recommendation is at least three machines in a replica set, so we will test with three.
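Why at least three? A rough way to see it, assuming (as the MongoDB replication documentation describes) that a primary can only be elected while a strict majority of the voting members is reachable. The sketch below is plain JavaScript; `majority` and `faultTolerance` are hypothetical helper names for illustration, not MongoDB functions.

```javascript
// Smallest number of votes that is a strict majority of `members` voters.
function majority(members) {
    return Math.floor(members / 2) + 1;
}

// How many members can be lost while a majority is still reachable.
function faultTolerance(members) {
    return members - majority(members);
}

// Compare a few set sizes (assumes every member has exactly one vote).
var sizes = [2, 3, 4, 5].map(function (n) {
    return { members: n, majority: majority(n), tolerates: faultTolerance(n) };
});
// 2 members tolerate 0 failures, 3 tolerate 1, 4 tolerate 1, 5 tolerate 2.
```

Under that assumption a two-member set cannot survive the loss of either member, while three members can lose one and still elect a primary.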
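1. Prepare three machines: 192.168.1.136, 192.168.1.137, and 192.168.1.138. 192.168.1.136 is planned as the primary, with 192.168.1.137 and 192.168.1.138 as secondaries. (The actual primary is decided by election; in the run shown later it turned out to be 138.)
2. On each machine, create the test directories for the replica set:

    # Directory for all the MongoDB files
    mkdir -p /data/mongodbtest/replset
    # Directory for the MongoDB data files
    mkdir -p /data/mongodbtest/replset/data
    # Change to the MongoDB directory
    cd /data/mongodbtest
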
3. Download the MongoDB package:

    wget http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-2.4.8.tgz

Note: do not install 32-bit MongoDB on a production Linux machine, because a 32-bit build can only handle about 2 GB of data.

    # Extract the downloaded archive
    tar xvzf mongodb-linux-x86_64-2.4.8.tgz

4. Start mongod on each machine:

    /data/mongodbtest/mongodb-linux-x86_64-2.4.8/bin/mongod --dbpath /data/mongodbtest/replset/data --replSet repset

The console shows that the replica set has not been given its initial configuration yet:

    Sun Dec 29 20:12:02.953 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
    Sun Dec 29 20:12:02.953 [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done

5. Initialize the replica set
Log in to MongoDB on any one of the three machines:

    /data/mongodbtest/mongodb-linux-x86_64-2.4.8/bin/mongo

Then, inside the shell, switch to the admin database:

    use admin

Define the replica set configuration variable. The _id "repset" here must match the "--replSet repset" argument passed to mongod above.

    config = { _id:"repset", members:[
        {_id:0,host:"192.168.1.136:27017"},
        {_id:1,host:"192.168.1.137:27017"},
        {_id:2,host:"192.168.1.138:27017"}]
    }

Output:

    {
        "_id" : "repset",
        "members" : [
            {
                "_id" : 0,
                "host" : "192.168.1.136:27017"
            },
            {
                "_id" : 1,
                "host" : "192.168.1.137:27017"
            },
            {
                "_id" : 2,
                "host" : "192.168.1.138:27017"
            }
        ]
    }

Initialize the replica set with this configuration:

    rs.initiate(config);

Output on success:

    {
        "info" : "Config now saved locally. Should come online in about a minute.",
        "ok" : 1
    }

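The configuration document is just an `_id` plus a numbered list of hosts, so it can also be generated from a host list instead of typed by hand. A minimal sketch in plain JavaScript; `buildReplSetConfig` is a hypothetical helper name, not part of MongoDB.

```javascript
// Build a replica set config document from a list of "host:port" strings.
// Each member gets its position in the list as its _id.
function buildReplSetConfig(setName, hosts) {
    var members = hosts.map(function (host, index) {
        return { _id: index, host: host };
    });
    return { _id: setName, members: members };
}

var generatedConfig = buildReplSetConfig("repset", [
    "192.168.1.136:27017",
    "192.168.1.137:27017",
    "192.168.1.138:27017"
]);
// generatedConfig has the same shape as the config variable defined by hand.
```

In the mongo shell, an object like `generatedConfig` could be passed to rs.initiate() in the same way as the hand-written one.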
Check the log. Once the replica set has started, 138 is the PRIMARY and 136 and 137 are SECONDARY.

    Sun Dec 29 20:26:13.842 [conn3] replSet replSetInitiate admin command received from client
    Sun Dec 29 20:26:13.842 [conn3] replSet replSetInitiate config object parses ok, 3 members specified
    Sun Dec 29 20:26:13.847 [conn3] replSet replSetInitiate all members seem up
    Sun Dec 29 20:26:13.848 [conn3] ******
    Sun Dec 29 20:26:13.848 [conn3] creating replication oplog of size: 990MB...
    Sun Dec 29 20:26:13.849 [FileAllocator] allocating new datafile /data/mongodbtest/replset/data/local.1, filling with zeroes...
    Sun Dec 29 20:26:13.862 [FileAllocator] done allocating datafile /data/mongodbtest/replset/data/local.1, size: 1024MB, took 0.012 secs
    Sun Dec 29 20:26:13.863 [conn3] ******
    Sun Dec 29 20:26:13.863 [conn3] replSet info saving a newer config version to local.system.replset
    Sun Dec 29 20:26:13.864 [conn3] replSet saveConfigLocally done
    Sun Dec 29 20:26:13.864 [conn3] replSet replSetInitiate config now saved locally. Should come online in about a minute.
    Sun Dec 29 20:26:23.047 [rsStart] replSet I am 192.168.1.138:27017
    Sun Dec 29 20:26:23.048 [rsStart] replSet STARTUP2
    Sun Dec 29 20:26:23.049 [rsHealthPoll] replSet member 192.168.1.137:27017 is up
    Sun Dec 29 20:26:23.049 [rsHealthPoll] replSet member 192.168.1.136:27017 is up
    Sun Dec 29 20:26:24.051 [rsSync] replSet SECONDARY
    Sun Dec 29 20:26:25.053 [rsHealthPoll] replset info 192.168.1.136:27017 thinks that we are down
    Sun Dec 29 20:26:25.053 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state STARTUP2
    Sun Dec 29 20:26:25.056 [rsMgr] not electing self, 192.168.1.136:27017 would veto with 'I don't think 192.168.1.138:27017 is electable'
    Sun Dec 29 20:26:31.059 [rsHealthPoll] replset info 192.168.1.137:27017 thinks that we are down
    Sun Dec 29 20:26:31.059 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state STARTUP2
    Sun Dec 29 20:26:31.062 [rsMgr] not electing self, 192.168.1.137:27017 would veto with 'I don't think 192.168.1.138:27017 is electable'
    Sun Dec 29 20:26:37.074 [rsMgr] replSet info electSelf 2
    Sun Dec 29 20:26:38.062 [rsMgr] replSet PRIMARY
    Sun Dec 29 20:26:39.071 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state RECOVERING
    Sun Dec 29 20:26:39.075 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state RECOVERING
    Sun Dec 29 20:26:42.201 [slaveTracking] build index local.slaves { _id: 1 }
    Sun Dec 29 20:26:42.207 [slaveTracking] build index done. scanned 0 total records. 0.005 secs
    Sun Dec 29 20:26:43.079 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state SECONDARY
    Sun Dec 29 20:26:49.080 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state SECONDARY

Check the status of the members:

    rs.status();

Output:

    {
        "set" : "repset",
        "date" : ISODate("2013-12-29T12:54:25Z"),
        "myState" : 1,
        "members" : [
            {
                "_id" : 0,
                "name" : "192.168.1.136:27017",
                "health" : 1,
                "state" : 2,
                "stateStr" : "SECONDARY",
                "uptime" : 1682,
                "optime" : Timestamp(1388319973, 1),
                "optimeDate" : ISODate("2013-12-29T12:26:13Z"),
                "lastHeartbeat" : ISODate("2013-12-29T12:54:25Z"),
                "lastHeartbeatRecv" : ISODate("2013-12-29T12:54:24Z"),
                "pingMs" : 1,
                "syncingTo" : "192.168.1.138:27017"
            },
            {
                "_id" : 1,
                "name" : "192.168.1.137:27017",
                "health" : 1,
                "state" : 2,
                "stateStr" : "SECONDARY",
                "uptime" : 1682,
                "optime" : Timestamp(1388319973, 1),
                "optimeDate" : ISODate("2013-12-29T12:26:13Z"),
                "lastHeartbeat" : ISODate("2013-12-29T12:54:25Z"),
                "lastHeartbeatRecv" : ISODate("2013-12-29T12:54:24Z"),
                "pingMs" : 1,
                "syncingTo" : "192.168.1.138:27017"
            },
            {
                "_id" : 2,
                "name" : "192.168.1.138:27017",
                "health" : 1,
                "state" : 1,
                "stateStr" : "PRIMARY",
                "uptime" : 2543,
                "optime" : Timestamp(1388319973, 1),
                "optimeDate" : ISODate("2013-12-29T12:26:13Z"),
                "self" : true
            }
        ],
        "ok" : 1
    }

The whole replica set is now up and running.
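The status document is long, so a small helper can pull out the two things that usually matter: which member is PRIMARY, and whether the other members have applied the same operations. This is only a sketch in plain JavaScript; `summarizeStatus` is a hypothetical helper name, and the sample object is trimmed down from the rs.status() output in this article, with ISODate values written as ordinary Date objects.

```javascript
// Summarize an rs.status()-style document: the primary's name, plus how many
// seconds each member's last applied operation (optimeDate) trails the primary's.
function summarizeStatus(status) {
    var primary = null;
    status.members.forEach(function (m) {
        if (m.stateStr === "PRIMARY") { primary = m; }
    });
    var members = status.members.map(function (m) {
        var lag = null;
        if (primary && m.optimeDate) {
            lag = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
        }
        return { name: m.name, state: m.stateStr, lagSeconds: lag };
    });
    return { primary: primary ? primary.name : null, members: members };
}

// Trimmed-down sample with the same field names as the rs.status() output.
var sampleStatus = {
    set: "repset",
    members: [
        { name: "192.168.1.136:27017", stateStr: "SECONDARY",
          optimeDate: new Date("2013-12-29T12:26:13Z") },
        { name: "192.168.1.137:27017", stateStr: "SECONDARY",
          optimeDate: new Date("2013-12-29T12:26:13Z") },
        { name: "192.168.1.138:27017", stateStr: "PRIMARY",
          optimeDate: new Date("2013-12-29T12:26:13Z") }
    ]
};
var summary = summarizeStatus(sampleStatus);
// summary.primary is "192.168.1.138:27017" and every lagSeconds is 0.
```

In the mongo shell the same function could presumably be called as summarizeStatus(rs.status()), assuming optimeDate comes back as a Date object there.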
6. Test data replication
On the primary, 192.168.1.138, connect with the shell:

    mongo 127.0.0.1

Switch to the test database and insert a document into the testdb collection:

    use test;
    db.testdb.insert({"test1":"testval1"})

On the secondaries, 192.168.1.136 and 192.168.1.137, connect to MongoDB and check whether the data has been replicated:

    /data/mongodbtest/mongodb-linux-x86_64-2.4.8/bin/mongo 192.168.1.136:27017
    repset:SECONDARY> use test;
    repset:SECONDARY> show tables;

Output:

    Sun Dec 29 21:50:48.590 error: { "$err" : "not master and slaveOk=false", "code" : 13435 } at src/mongo/shell/query.js:128

By default MongoDB reads and writes through the primary and does not allow reads on a secondary, so reads have to be enabled on the secondary first:

    repset:SECONDARY> db.getMongo().setSlaveOk();

Now the query shows that the data has been replicated to the secondary:

    repset:SECONDARY> db.testdb.find();

Output:

    { "_id" : ObjectId("52c028460c7505626a93944f"), "test1" : "testval1" }

7. Test failover
First stop mongod on the primary, 138. The logs on 136 and 137 show that after a round of voting, 137 is elected primary and 136 starts syncing from 137.

    Sun Dec 29 22:03:05.351 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: 192.168.1.138:27017
    Sun Dec 29 22:03:05.354 [rsBackgroundSync] replSet syncing to: 192.168.1.138:27017
    Sun Dec 29 22:03:05.356 [rsBackgroundSync] repl: couldn't connect to server 192.168.1.138:27017
    Sun Dec 29 22:03:05.356 [rsBackgroundSync] replSet not trying to sync from 192.168.1.138:27017, it is vetoed for 10 more seconds
    Sun Dec 29 22:03:05.499 [rsHealthPoll] DBClientCursor::init call() failed
    Sun Dec 29 22:03:05.499 [rsHealthPoll] replset info 192.168.1.138:27017 heartbeat failed, retrying
    Sun Dec 29 22:03:05.501 [rsHealthPoll] replSet info 192.168.1.138:27017 is down (or slow to respond):
    Sun Dec 29 22:03:05.501 [rsHealthPoll] replSet member 192.168.1.138:27017 is now in state DOWN
    Sun Dec 29 22:03:05.511 [rsMgr] not electing self, 192.168.1.137:27017 would veto with '192.168.1.136:27017 is trying to elect itself but 192.168.1.138:27017 is already primary and more up-to-date'
    Sun Dec 29 22:03:07.330 [conn393] replSet info voting yea for 192.168.1.137:27017 (1)
    Sun Dec 29 22:03:07.503 [rsHealthPoll] replset info 192.168.1.138:27017 heartbeat failed, retrying
    Sun Dec 29 22:03:08.462 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state PRIMARY
    Sun Dec 29 22:03:09.359 [rsBackgroundSync] replSet syncing to: 192.168.1.137:27017
    Sun Dec 29 22:03:09.507 [rsHealthPoll] replset info 192.168.1.138:27017 heartbeat failed, retrying

Check the status of the whole set; 138 now shows as unreachable.

    /data/mongodbtest/mongodb-linux-x86_64-2.4.8/bin/mongo 192.168.1.136:27017
    repset:SECONDARY> rs.status();

Output:

    {
        "set" : "repset",
        "date" : ISODate("2013-12-29T14:28:35Z"),
        "myState" : 2,
        "syncingTo" : "192.168.1.137:27017",
        "members" : [
            {
                "_id" : 0,
                "name" : "192.168.1.136:27017",
                "health" : 1,
                "state" : 2,
                "stateStr" : "SECONDARY",
                "uptime" : 9072,
                "optime" : Timestamp(1388324934, 1),
                "optimeDate" : ISODate("2013-12-29T13:48:54Z"),
                "self" : true
            },
            {
                "_id" : 1,
                "name" : "192.168.1.137:27017",
                "health" : 1,
                "state" : 1,
                "stateStr" : "PRIMARY",
                "uptime" : 7329,
                "optime" : Timestamp(1388324934, 1),
                "optimeDate" : ISODate("2013-12-29T13:48:54Z"),
                "lastHeartbeat" : ISODate("2013-12-29T14:28:34Z"),
                "lastHeartbeatRecv" : ISODate("2013-12-29T14:28:34Z"),
                "pingMs" : 1,
                "syncingTo" : "192.168.1.138:27017"
            },
            {
                "_id" : 2,
                "name" : "192.168.1.138:27017",
                "health" : 0,
                "state" : 8,
                "stateStr" : "(not reachable/healthy)",
                "uptime" : 0,
                "optime" : Timestamp(1388324934, 1),
                "optimeDate" : ISODate("2013-12-29T13:48:54Z"),
                "lastHeartbeat" : ISODate("2013-12-29T14:28:35Z"),
                "lastHeartbeatRecv" : ISODate("2013-12-29T14:28:23Z"),
                "pingMs" : 0,
                "syncingTo" : "192.168.1.137:27017"
            }
        ],
        "ok" : 1
    }

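The `health` field in this output is what separates the live members from the dead one. As a rough sketch in plain JavaScript, the helper below splits the members into reachable and unreachable and checks whether the reachable ones still form a majority. `reachability` is a hypothetical helper name, and the majority check assumes every member has exactly one vote.

```javascript
// Split the members of an rs.status()-style document by their health flag
// (1 = reachable, 0 = unreachable) and check whether the reachable members
// still outnumber half of the set.
function reachability(status) {
    var up = [];
    var down = [];
    status.members.forEach(function (m) {
        (m.health === 1 ? up : down).push(m.name);
    });
    return {
        up: up,
        down: down,
        hasMajority: up.length > status.members.length / 2
    };
}

// Health values copied from the status output after 138 was stopped.
var failedStatus = {
    members: [
        { name: "192.168.1.136:27017", health: 1 },
        { name: "192.168.1.137:27017", health: 1 },
        { name: "192.168.1.138:27017", health: 0 }
    ]
};
var reach = reachability(failedStatus);
// reach.down is ["192.168.1.138:27017"] and reach.hasMajority is true:
// two of three members are still up, consistent with 137 having been elected.
```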
Now restart the original primary, 138. It comes back as SECONDARY, and 137 remains the PRIMARY.

    Sun Dec 29 22:21:06.619 [rsStart] replSet I am 192.168.1.138:27017
    Sun Dec 29 22:21:06.619 [rsStart] replSet STARTUP2
    Sun Dec 29 22:21:06.627 [rsHealthPoll] replset info 192.168.1.136:27017 thinks that we are down
    Sun Dec 29 22:21:06.627 [rsHealthPoll] replSet member 192.168.1.136:27017 is up
    Sun Dec 29 22:21:06.627 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state SECONDARY
    Sun Dec 29 22:21:07.628 [rsSync] replSet SECONDARY
    Sun Dec 29 22:21:08.623 [rsHealthPoll] replSet member 192.168.1.137:27017 is up
    Sun Dec 29 22:21:08.624 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state PRIMARY

8. Test connecting to the replica set from a Java program. With three members, losing one of them does not stop the application from reading and writing through the replica set.

    import java.util.ArrayList;
    import java.util.List;

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;
    import com.mongodb.MongoClient;
    import com.mongodb.ServerAddress;

    public class TestMongoDBReplSet {

        public static void main(String[] args) {
            try {
                // Seed list: all three replica set members
                List<ServerAddress> addresses = new ArrayList<ServerAddress>();
                ServerAddress address1 = new ServerAddress("192.168.1.136", 27017);
                ServerAddress address2 = new ServerAddress("192.168.1.137", 27017);
                ServerAddress address3 = new ServerAddress("192.168.1.138", 27017);
                addresses.add(address1);
                addresses.add(address2);
                addresses.add(address3);

                MongoClient client = new MongoClient(addresses);
                DB db = client.getDB("test");
                DBCollection coll = db.getCollection("testdb");

                // Insert a document
                BasicDBObject object = new BasicDBObject();
                object.append("test2", "testval2");
                coll.insert(object);

                // Read back every document in the collection
                DBCursor dbCursor = coll.find();
                while (dbCursor.hasNext()) {
                    DBObject dbObject = dbCursor.next();
                    System.out.println(dbObject.toString());
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

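The same seed list can also be written as a single MongoDB connection string, which is the form many drivers and tools accept. A minimal sketch in plain JavaScript, assuming the standard `mongodb://host1,host2,.../database?replicaSet=name` URI format; `buildReplSetUri` is a hypothetical helper name, not a driver API.

```javascript
// Build a MongoDB connection string that lists every member as a seed
// and names the replica set, so the client can discover the current primary.
function buildReplSetUri(hosts, database, setName) {
    return "mongodb://" + hosts.join(",") + "/" + database +
           "?replicaSet=" + encodeURIComponent(setName);
}

var uri = buildReplSetUri(
    ["192.168.1.136:27017", "192.168.1.137:27017", "192.168.1.138:27017"],
    "test",
    "repset"
);
// uri is "mongodb://192.168.1.136:27017,192.168.1.137:27017,192.168.1.138:27017/test?replicaSet=repset"
```

Listing all members matters for the same reason as in the Java code: if the client only knew one address and that machine was the one that died, it would have nowhere to start.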
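Failover now appears to work well, so is this architecture good enough? There is still plenty of room for improvement. Take the second question from the beginning: how do we relieve a primary whose read/write load is too high? The usual answer is read/write splitting. How is that done with a MongoDB replica set?
Typically there are fewer writes than reads, so one primary handles the writes and the two secondaries handle the reads.
1. To set up read/write splitting, first enable reads on the SECONDARY nodes with setSlaveOk.
2. In the program, direct read operations to the secondaries, as in the following code:

    import java.util.ArrayList;
    import java.util.List;

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import com.mongodb.MongoClient;
    import com.mongodb.ReadPreference;
    import com.mongodb.ServerAddress;

    public class TestMongoDBReplSetReadSplit {

        public static void main(String[] args) {
            try {
                // Seed list: all three replica set members
                List<ServerAddress> addresses = new ArrayList<ServerAddress>();
                ServerAddress address1 = new ServerAddress("192.168.1.136", 27017);
                ServerAddress address2 = new ServerAddress("192.168.1.137", 27017);
                ServerAddress address3 = new ServerAddress("192.168.1.138", 27017);
                addresses.add(address1);
                addresses.add(address2);
                addresses.add(address3);

                MongoClient client = new MongoClient(addresses);
                DB db = client.getDB("test");
                DBCollection coll = db.getCollection("testdb");

                // Query document
                BasicDBObject object = new BasicDBObject();
                object.append("test2", "testval2");

                // Send this read to a secondary
                ReadPreference preference = ReadPreference.secondary();
                DBObject dbObject = coll.findOne(object, null, preference);

                System.out.println(dbObject);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
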
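Besides secondary, there are four other read preference modes, five in total: primary, primaryPreferred, secondary, secondaryPreferred, and nearest.
primary: the default. Reads only from the primary.
primaryPreferred: reads from the primary, and falls back to a secondary only when the primary is unavailable.
secondary: reads only from secondaries. The catch is that data on a secondary can be "staler" than on the primary.
secondaryPreferred: prefers reading from a secondary, and falls back to the primary when no secondary is available.
nearest: reads from the member with the lowest network latency, whether it is the primary or a secondary.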
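The five modes are easier to compare side by side as a small selection function. The sketch below is a simplified model in plain JavaScript, not how any driver is actually implemented: real drivers also consider latency windows, tag sets, and random choice among eligible members, while this model simply takes the lowest `pingMs`. `selectMember` and `lowestPing` are hypothetical names, and the ping values are made up for illustration.

```javascript
// Return the member with the lowest pingMs, or null for an empty list.
function lowestPing(members) {
    var best = null;
    members.forEach(function (m) {
        if (best === null || m.pingMs < best.pingMs) { best = m; }
    });
    return best;
}

// Simplified model of which member each read preference mode would read from.
function selectMember(mode, members) {
    var primaries = members.filter(function (m) { return m.stateStr === "PRIMARY"; });
    var secondaries = members.filter(function (m) { return m.stateStr === "SECONDARY"; });
    var primary = primaries.length > 0 ? primaries[0] : null;
    switch (mode) {
    case "primary":            return primary;
    case "primaryPreferred":   return primary || lowestPing(secondaries);
    case "secondary":          return lowestPing(secondaries);
    case "secondaryPreferred": return lowestPing(secondaries) || primary;
    case "nearest":            return lowestPing(primaries.concat(secondaries));
    default: throw new Error("unknown read preference mode: " + mode);
    }
}

// Hypothetical latencies; 137 is the primary as in the failover test.
var sampleMembers = [
    { name: "192.168.1.136:27017", stateStr: "SECONDARY", pingMs: 3 },
    { name: "192.168.1.137:27017", stateStr: "PRIMARY",   pingMs: 1 },
    { name: "192.168.1.138:27017", stateStr: "SECONDARY", pingMs: 2 }
];
var chosen = selectMember("secondaryPreferred", sampleMembers);
// chosen.name is "192.168.1.138:27017", the secondary with the lower ping.
```

With these sample values, "primary", "primaryPreferred", and "nearest" all pick 137, while "secondary" and "secondaryPreferred" pick 138. Remove the primary from the list and "primary" returns null while "primaryPreferred" falls back to 138.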
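Good. With read/write splitting in place, traffic is spread out and the load eases, which answers the question "How do we relieve a primary whose read/write load is too high?". But as the number of secondaries grows, the replication load on the primary grows too. Is there a way around that? MongoDB already has a solution.
See the figure:
The arbiter node in the figure stores no data and only takes part in the group vote during failover, so it adds no replication load. That is a thoughtful design, and it shows that the MongoDB developers know large-scale data architecture well. Besides primary, secondary, and arbiter nodes, there are also Secondary-Only, Hidden, Delayed, and Non-Voting members.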
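Secondary-Only: can never become primary and only serves as a secondary. This keeps underpowered machines from being elected primary.
Hidden: clients cannot address this kind of member by specifying its IP, and it cannot become primary, but it can vote. It is typically used for backups.
Delayed: syncs from the primary with a configurable time delay. It is mainly used for backups: with real-time sync, an accidental delete is replicated to the secondaries immediately and can no longer be recovered.
Non-Voting: a secondary with no vote in elections, purely a data backup node.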
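Each of these roles corresponds to a few options on a member in the replica set configuration document. The sketch below shows one possible mapping, using the option names from the MongoDB 2.4 configuration (`priority`, `hidden`, `slaveDelay`, `votes`, `arbiterOnly`); newer releases have renamed `slaveDelay`. The hosts 192.168.1.139 to 192.168.1.141 and the one-hour delay are hypothetical, and `countVotes` is an illustrative helper, not a MongoDB function.

```javascript
// One member per role. Members without a `votes` field are assumed to
// carry the default single vote.
var roleConfig = {
    _id: "repset",
    members: [
        // Ordinary member: can become primary
        { _id: 0, host: "192.168.1.136:27017" },
        // Secondary-Only: priority 0 means it is never elected primary
        { _id: 1, host: "192.168.1.137:27017", priority: 0 },
        // Hidden: invisible to clients, still votes
        { _id: 2, host: "192.168.1.138:27017", priority: 0, hidden: true },
        // Delayed: applies operations one hour (3600 s) behind the primary
        { _id: 3, host: "192.168.1.139:27017", priority: 0, hidden: true, slaveDelay: 3600 },
        // Non-Voting: holds data but has no vote (priority 0 kept for
        // compatibility with newer releases)
        { _id: 4, host: "192.168.1.140:27017", priority: 0, votes: 0 },
        // Arbiter: votes but stores no data
        { _id: 5, host: "192.168.1.141:27017", arbiterOnly: true }
    ]
};

// Count the members that take part in elections.
function countVotes(config) {
    return config.members.filter(function (m) { return m.votes !== 0; }).length;
}

var voters = countVotes(roleConfig);
// voters is 5: every member except the Non-Voting one.
```

In the mongo shell, a document like this would be applied with rs.initiate() on a new set or rs.reconfig() on an existing one, and an arbiter can also be added on its own with rs.addArb().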
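At this point the MongoDB replica set has taken care of two of the questions:
- If the primary node goes down, can connections switch over automatically? So far the switch has to be done by hand.
- How do we relieve a primary node whose read/write load is too high?
Two questions are left for later articles:
- Every secondary node holds a full copy of the database. Will the load on the secondaries become too high?
- When the data outgrows what the machines can handle, can the cluster scale out automatically?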
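Setting up the replica set also raised some new questions:
- During failover, how is the new primary elected? Can we step in manually and take a particular primary out of service?
- The official advice is that a replica set should have an odd number of members. Why?
- How does a MongoDB replica set synchronize data? What happens when synchronization falls behind? Can the data become inconsistent?
- Can a MongoDB failover happen for no apparent reason? What conditions trigger it? Frequent failovers could add to the system load.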