DRBD + Pacemaker: Automatic Switchover of the DRBD Master/Slave Roles
Prerequisites:
The DRBD device name and the DRBD device's mount point must be identical on both nodes.
Because the resource definitions below reference the device name and the mount point, both must be the same on the two peers.
How is a master/slave resource defined?
A master/slave resource is a special kind of clone resource. A clone resource is always built on top of a primitive resource, and the same holds for a master/slave resource: first define the primitive, then promote it into a master/slave set. To make sure the DRBD device is also mounted on whichever node becomes Master, a Filesystem resource has to be defined as well. The meta attributes that control clone and master/slave behaviour are listed below (a short example follows the list):
clone-max: how many copies of the resource may run in the cluster; the default is the number of nodes in the cluster;
clone-node-max: how many copies may run on a single node; the default is 1;
notify: whether the other copies are told before a copy is started or stopped and whether the action succeeded; allowed values are false and true; the default here is true;
globally-unique: whether each copy of the clone gets a globally unique name, so that the copies can perform different functions; the default here is true;
ordered: whether the copies are started in series (ordered) rather than in parallel; allowed values are false and true; the default here is true;
interleave: changes how ordering constraints involving the clone or master resource are evaluated, so that a copy only has to wait for the corresponding copy on the same node instead of waiting for all copies;
master-max: how many copies may be promoted to the Master role; the default is 1;
master-node-max: how many copies may be promoted to Master on a single node; the default is 1;
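These meta attributes are set on the clone or master/slave wrapper in crm configure, not on the primitive it wraps. A minimal sketch of where they go (p_demo and ms_demo are placeholder names used only here; the real resources for this setup are defined further below):
crm(live)configure# primitive p_demo ocf:linbit:drbd params drbd_resource=mystore1
crm(live)configure# master ms_demo p_demo meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true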
Check whether corosync, pacemaker, crmsh and pssh are installed on node1 and node2:
[root@node1 ~]# rpm -q corosync pacemaker crmsh pssh
corosync-1.4.1-17.el6.x86_64
pacemaker-1.1.10-14.el6.x86_64
crmsh-1.2.6-4.el6.x86_64
pssh-2.3.1-2.el6.x86_64
[root@node1 ~]# ssh node2.ja.com 'rpm -q corosync pacemaker crmsh pssh'
corosync-1.4.1-17.el6.x86_64
pacemaker-1.1.10-14.el6.x86_64
crmsh-1.2.6-4.el6.x86_64
pssh-2.3.1-2.el6.x86_64
If any of them is missing, install it with yum -y install corosync pacemaker crmsh pssh.
Once DRBD is going to be managed as a cluster resource, it must no longer start on its own, so unmount the device on the current Primary, demote it, then stop and disable the drbd service on both nodes:
[root@node1 ~]# umount /drbd/
[root@node1 ~]# drbdadm secondary mystore1
[root@node1 ~]# service drbd stop
[root@node1 ~]# chkconfig drbd off
[root@node1 ~]# ssh node2.ja.com 'service drbd stop; chkconfig drbd off'
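To double-check that DRBD is stopped and will no longer start at boot on either node, something like the following can be used:
[root@node1 ~]# chkconfig --list drbd
[root@node1 ~]# ssh node2.ja.com 'service drbd status; chkconfig --list drbd'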
[root@node1 ~]# cd /etc/corosync/
[root@node1 corosync]# cp corosync.conf.example corosync.conf
After editing, corosync.conf contains the following:
[root@node1 corosync]# egrep -v '^$|^[[:space:]]*#' /etc/corosync/corosync.conf
compatibility: whitetank
totem {
        version: 2
        secauth: on
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 172.16.16.0
                mcastaddr: 226.94.16.15
                mcastport: 5405
                ttl: 1
        }
}
logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: no
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
amf {
        mode: disabled
}
service {
        name: pacemaker
        ver: 0
}
aisexec {
        user: root
        group: root
}
When generating the authentication key with corosync-keygen, you may have to wait a long time because the entropy pool does not hold enough random data. Below we take a quick shortcut instead; avoid it in production, because it is insecure.
[root@node1 corosync]# mv /dev/random /dev/h
[root@node1 corosync]# ln /dev/urandom /dev/random
[root@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
[root@node1 corosync]# rm -rf /dev/random
[root@node1 corosync]# mv /dev/h /dev/random
[root@node1 corosync]# ll authkey corosync.conf
-r-------- 1 root root 128 Apr 28 17:23 authkey
-rw-r--r-- 1 root root 708 Apr 28 13:51 corosync.conf
[root@node1 corosync]# scp -p authkey corosync.conf node2.ja.com:/etc/corosync/
Verify that the permissions of the authentication key and of the main configuration file on the peer are unchanged:
[root@node1 corosync]# ssh node2.ja.com 'ls -l /etc/corosync/{authkey,corosync.conf}'
-r-------- 1 root root 128 Apr 28 17:23 /etc/corosync/authkey
-rw-r--r-- 1 root root 708 Apr 28 13:51 /etc/corosync/corosync.conf
Start the corosync service:
[root@node1 corosync]# service corosync start
[root@node1 corosync]# ssh node2.ja.com 'service corosync start'
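Before relying on the cluster it is worth scanning the corosync log for problems. Assuming the logfile path configured above, checks along these lines can be used (the exact messages depend on the corosync and pacemaker versions):
[root@node1 corosync]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
[root@node1 corosync]# grep TOTEM /var/log/cluster/corosync.log
[root@node1 corosync]# grep -i error /var/log/cluster/corosync.log
[root@node1 corosync]# ssh node2.ja.com 'grep -i error /var/log/cluster/corosync.log'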
Now everything looks normal:
[root@node1 corosync]# crm status
Last updated: Mon Apr 28 18:20:41 2014
Last change: Mon Apr 28 18:16:01 2014 via crmd on node1.ja.com
Stack: classic openais (with plugin)
Current DC: node2.ja.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ node1.ja.com node2.ja.com ]
[root@node2 drbd.d]# crm status
Last updated: Mon Apr 28 06:19:36 2014
Last change: Mon Apr 28 18:16:01 2014 via crmd on node1.ja.com
Stack: classic openais (with plugin)
Current DC: node2.ja.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ node1.ja.com node2.ja.com ]
[root@node1 ~]# crm
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
crm(live)configure# cd
crm(live)# exit
bye
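The same cluster-wide settings can also be made non-interactively from the shell, which is handy for scripting; roughly:
[root@node1 ~]# crm configure property stonith-enabled=false
[root@node1 ~]# crm configure property no-quorum-policy=ignore
[root@node1 ~]# crm configure rsc_defaults resource-stickiness=100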
[root@node1 ~]# crm
crm(live)# ra
crm(live)ra# classes
lsb
ocf / heartbeat linbit pacemaker
service
stonith
crm(live)ra# list ocf heartbeat
CTDB           Dummy          Filesystem     IPaddr         IPaddr2        IPsrcaddr
LVM            MailTo         Route          SendArp        Squid          VirtualDomain
Xinetd         apache         conntrackd     dhcpd          ethmonitor     exportfs
mysql          mysql-proxy    named          nfsserver      nginx          pgsql
postfix        rsyncd         rsyslog        slapd          symlink        tomcat
crm(live)ra# list ocf pacemaker
ClusterMon     Dummy          HealthCPU      HealthSMART    Stateful       SysInfo
SystemHealth   controld       ping           pingd          remote        
crm(live)ra# list ocf linbit
drbd      
crm(live)ra# meta ocf:linbit:drbd
crm(live)ra# cd
crm(live)# configure
crm(live)configure# primitive mysqlstore ocf:linbit:drbd params drbd_resource=mystore1 op monitor role=Master interval=30s timeout=20s op monitor role=Slave interval=60s timeout=20s op start timeout=240s op stop timeout=100s
crm(live)configure# verify
crm(live)configure# master ms_mysqlstore1 mysqlstore meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify="True"
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
crm(live)configure# cd
crm(live)# node standby node1.ja.com
At this point node2 is automatically promoted to Master:
crm(live)# status
Bring node1 back online; node1 comes up as Slave while node2 stays Master:
crm(live)# node  online node1.ja.com
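Note that ms_mysqlstore1 on its own only switches the DRBD roles; to get the filesystem mounted on the Master, a Filesystem primitive plus colocation and order constraints are needed, exactly as demonstrated with the WebFS example below. Applied to ms_mysqlstore1 it would look roughly like this (a sketch only; mystore_fs is a placeholder name, and the /drbd mount point and ext4 fstype are assumptions):
crm(live)configure# primitive mystore_fs ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/drbd" fstype="ext4"
crm(live)configure# colocation mystore_fs_on_ms_mysqlstore1 inf: mystore_fs ms_mysqlstore1:Master
crm(live)configure# order mystore_fs_after_ms_mysqlstore1 inf: ms_mysqlstore1:promote mystore_fs:start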
Define a Filesystem resource for the Master node:
# crm
crm(live)# configure
crm(live)configure# primitive WebFS ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/www" fstype="ext3"
crm(live)configure# colocation WebFS_on_MS_webdrbd inf: WebFS MS_Webdrbd:Master
crm(live)configure# order WebFS_after_MS_Webdrbd inf: MS_Webdrbd:promote WebFS:start
crm(live)configure# verify
crm(live)configure# commit
Check the state of the resources in the cluster:
crm status
============
Last updated: Fri Jun 17 06:26:03 2011
Stack: openais
Current DC: node2.a.org - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node2.a.org node1.a.org ]
Master/Slave Set: MS_Webdrbd
Masters: [ node2.a.org ]
Slaves: [ node1.a.org ]
WebFS (ocf::heartbeat:Filesystem): Started node2.a.org
From the output above, the node running WebFS and the Primary node of the DRBD service are both node2.a.org. We now copy a few files into /www (the mount point) on node2, and after the failover we will check whether they are present under /www on node1.
# cp /etc/rc.d/rc.sysinit /www
Next we simulate a failure of node2 to see whether these resources fail over to node1 correctly.
Run the following commands on node2:
# crm node standby
# crm status
============
Last updated: Fri Jun 17 06:27:03 2011
Stack: openais
Current DC: node2.a.org - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Node node2.a.org: standby
Online: [ node1.a.org ]
Master/Slave Set: MS_Webdrbd
Masters: [ node1.a.org ]
Stopped: [ webdrbd:0 ]
WebFS (ocf::heartbeat:Filesystem): Started node1.a.org
From the information above we can conclude that node2 has switched to standby mode and its drbd service has stopped; the failover completed and all resources were transferred to node1 as expected.
The data stored in /www while node2 was the Primary node is all present on node1 as well.
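A quick way to confirm this on node1 (now the Primary, with the filesystem mounted there) is, for example:
# ls -l /www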
Bring node2 back online:
# crm node online
[root@node2 ~]# crm status
============
Last updated: Fri Jun 17 06:30:05 2011
Stack: openais
Current DC: node2.a.org - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node2.a.org node1.a.org ]
Master/Slave Set: MS_Webdrbd
Masters: [ node1.a.org ]
Slaves: [ node2.a.org ]
WebFS (ocf::heartbeat:Filesystem): Started node1.a.org