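High Availability Handbook (1): Environment

Posted by hxw2ljj on 2015-11-18

Three KVM virtual machines

First we need a Pacemaker environment, which takes three machines. If you do not have that many physical machines, KVM virtual machines work just as well.

Create a bridge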

ovs-vsctl add-br ubuntu_br

ifconfig ubuntu_br 192.168.100.1/24

Set up NAT on the host and enable IP forwarding (set net.ipv4.ip_forward = 1 in /etc/sysctl.conf, then reload it with sysctl -p)

# sysctl -p
net.ipv4.ip_forward = 1

sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

Create the virtual machine image

We have a clean Ubuntu image; create a qcow2 image that uses it as its backing file

qemu-img create -f qcow2 -o backing_file=./ubuntu-14.04.img ./pacemaker01.img

Create an XML file that attaches the virtual machine's NIC to ubuntu_br

# cat pacemaker01.xml 
<domain type='kvm'>
  <name>Pacemaker01</name>
  <uuid>0f0806ab-531d-6134-5def-c5b495529211</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/home/cliu8/images/pacemaker01.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'/>
    <interface type='bridge'>
      <mac address='52:54:00:9b:d5:11'/>
      <source bridge='ubuntu_br'/>
      <virtualport type='openvswitch' />
      <model type='virtio'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
    </video>
    <memballoon model='virtio'>
    </memballoon>
  </devices>
</domain>

Define and start the virtual machine

virsh define pacemaker01.xml

virsh start Pacemaker01

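Log in to the virtual machine over VNC and configure its network. The VNC display number can be read from the -vnc option on the qemu command line (0.0.0.0:7 in the output below).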

# ps aux | grep Pacemaker01
root     18583  0.0  0.0  10464   928 pts/28   S+   19:16   0:00 grep --color=auto Pacemaker01
libvirt+ 32703  0.6  1.2 6852076 844768 ?      Sl   Jul28   9:05 qemu-system-x86_64 -enable-kvm -name Pacemaker01 -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 0f0806ab-531d-6134-5def-c5b495529211 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/Pacemaker01.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/cliu8/images/pacemaker01.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=36,id=hostnet0,vhost=on,vhostfd=37 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:9b:d5:11,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:7 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
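
To turn the display number in that command line into something a VNC client can connect to, the usual convention is TCP port 5900 plus the display number. A minimal sketch of that arithmetic follows; the function name vnc_port is made up for this example.

```shell
#!/bin/sh
# Sketch: map a VNC display number to its TCP port (port = 5900 + display).
# vnc_port is a hypothetical helper, not part of libvirt or qemu.
vnc_port() {
    echo $(( 5900 + $1 ))
}

# Display 7 comes from "-vnc 0.0.0.0:7" in the ps output above.
echo "Connect a VNC client to port $(vnc_port 7) on the KVM host"
```

On the host, virsh vncdisplay Pacemaker01 should report the same display number without having to read the qemu command line.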

Change the hostname

# cat /etc/hostname 
pacemaker01

Edit the network interface configuration

# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
address 192.168.100.10
netmask 255.255.255.0
broadcast 192.168.100.255
gateway 192.168.100.1
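
The static stanza lists the address, netmask and broadcast separately, so a typo in one of them is easy to miss. As a rough sanity check, the broadcast address can be recomputed from the other two. This is only a sketch; the helper name broadcast_of is an assumption made for this example.

```shell
#!/bin/sh
# Sketch: derive the broadcast address from an IPv4 address and a dotted netmask.
# broadcast_of is a hypothetical helper name.
broadcast_of() {
    # Split the address and the netmask into their four octets.
    IFS=. read -r a1 a2 a3 a4 <<EOF
$1
EOF
    IFS=. read -r m1 m2 m3 m4 <<EOF
$2
EOF
    # Broadcast = address OR (inverted netmask), octet by octet.
    echo "$(( a1 | (255 - m1) )).$(( a2 | (255 - m2) )).$(( a3 | (255 - m3) )).$(( a4 | (255 - m4) ))"
}

# Values taken from the interfaces file above.
broadcast_of 192.168.100.10 255.255.255.0
```

For the values in this file the result should agree with the broadcast line, 192.168.100.255.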

Configure DNS

# cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 16.110.135.51
nameserver 16.110.135.52

Configure hosts

# cat /etc/hosts
127.0.0.1       localhost
192.168.100.10  pacemaker01
192.168.100.11  pacemaker02
192.168.100.12  pacemaker03

Reboot the virtual machine; apt-get update should now work.

Configure the other two machines in the same way.
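
Repeating the setup by hand for each node is error-prone, so it can help to write down the per-node values first. The sketch below only prints a plan and changes nothing. The host names and addresses come from the hosts file above; the image names for the second and third node are an assumption that follows the pacemaker01.img pattern. Each VM's XML also needs its own name, uuid and MAC address, which are not generated here.

```shell
#!/bin/sh
# Sketch: print hostname, IP address and overlay image for each of the three nodes.
# node_plan is a hypothetical helper; image names for nodes 02/03 are assumed.
node_plan() {
    idx=0
    for suffix in 01 02 03; do
        name="pacemaker$suffix"
        addr="192.168.100.$(( 10 + idx ))"
        printf '%s %s %s\n' "$name" "$addr" "./$name.img"
        idx=$(( idx + 1 ))
    done
}

node_plan
```

Each printed line can then drive the qemu-img create, hostname and interfaces edits for that node.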

Install and configure Pacemaker

First, install the packages

apt-get install pacemaker corosync

Configuration file

root@pacemaker01:/home/openstack# cat /etc/corosync/corosync.conf
# Please read the openais.conf.5 manual page

totem {
        version: 2

        # How long before declaring a token lost (ms)
        token: 3000

        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 10

        # How long to wait for join messages in the membership protocol (ms)
        join: 60

        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
        consensus: 3600

        # Turn off the virtual synchrony filter
        vsftype: none

        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20

        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes

        # Disable encryption
        secauth: off

        # How many threads to use for encryption/decryption
        threads: 0

        # Optionally assign a fixed node id (integer)
        # nodeid: 1234

        # This specifies the mode of redundant ring, which may be none, active, or passive.
        rrp_mode: active

        interface {
                # The following values need to be set based on your environment 
                ringnumber: 0
                bindnetaddr: 192.168.100.0 
                mcastaddr: 239.255.100.0
                mcastport: 5405
        }
}

amf {
        mode: disabled
}

service {
        ver: 1
        name: pacemaker
}

quorum {
        # Quorum for the Pacemaker Cluster Resource Manager
        provider: corosync_votequorum
        expected_votes: 3
}

aisexec {
        user:   root
        group:  root
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: no
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

Start corosync

nohup corosync -f > corosync.log 2>&1 &

Use the following command to check whether corosync is configured correctly

root@pacemaker01:/etc/corosync# corosync-cfgtool -s -a
Printing ring status.
Local node ID 1084777482
RING ID 0
        id      = 192.168.100.10
        status  = ring 0 active with no faults
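
The node ID printed here is not arbitrary. With no nodeid set in corosync.conf, corosync derives it from the ring 0 IPv4 address, and clear_node_high_bit: yes then clears the top bit. The sketch below reproduces that arithmetic for the three addresses used in this setup; the function name nodeid_from_ip is made up for this example.

```shell
#!/bin/sh
# Sketch: reproduce the auto-generated corosync node ID for an IPv4 ring address.
# Assumes no explicit nodeid and clear_node_high_bit: yes, as in the config above.
# nodeid_from_ip is a hypothetical helper name.
nodeid_from_ip() {
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    # Pack the four octets into one 32-bit integer, most significant octet first.
    raw=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
    # Clear bit 31 so the ID fits in a positive signed 32-bit integer.
    echo $(( raw & 0x7FFFFFFF ))
}

for ip in 192.168.100.10 192.168.100.11 192.168.100.12; do
    printf '%s -> %s\n' "$ip" "$(nodeid_from_ip "$ip")"
done
```

For 192.168.100.10 this gives 1084777482, the local node ID reported by corosync-cfgtool.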

corosync.log contains the following entries

Jul 28 19:53:46 notice  [TOTEM ] A new membership (192.168.100.10:8) was formed. Members joined: 1084777482
Jul 28 19:53:46 notice  [QUORUM] Members[1]: 1084777482
Jul 28 19:53:46 notice  [MAIN  ] Completed service synchronization, ready to provide service.
Jul 28 19:55:12 notice  [TOTEM ] A new membership (192.168.100.10:12) was formed. Members joined: 1084777483
Jul 28 19:55:12 notice  [QUORUM] This node is within the primary component and will provide service.
Jul 28 19:55:12 notice  [QUORUM] Members[2]: 1084777482 1084777483
Jul 28 19:55:12 notice  [MAIN  ] Completed service synchronization, ready to provide service.
Jul 28 19:55:15 notice  [TOTEM ] A new membership (192.168.100.10:16) was formed. Members joined: 1084777484
Jul 28 19:55:15 notice  [QUORUM] Members[3]: 1084777482 1084777483 1084777484
Jul 28 19:55:15 notice  [MAIN  ] Completed service synchronization, ready to provide service.
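
The log shows the node joining the primary component only once a second member has appeared. That matches the votequorum rule that a partition is quorate when it holds more than half of expected_votes. A small sketch of that rule, with expected_votes=3 taken from the config above and a made-up function name:

```shell
#!/bin/sh
# Sketch: majority quorum as used by votequorum (threshold = expected_votes / 2 + 1).
# quorum_threshold is a hypothetical helper name.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

expected_votes=3
threshold=$(quorum_threshold "$expected_votes")

for members in 1 2 3; do
    if [ "$members" -ge "$threshold" ]; then
        state="quorate"
    else
        state="not quorate"
    fi
    printf '%s of %s votes: %s\n' "$members" "$expected_votes" "$state"
done
```

With three expected votes the threshold is two, so a single node on its own does not provide service.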

View the membership

root@pacemaker01:/etc/corosync# corosync-cmapctl 
aisexec.group (str) = root
aisexec.user (str) = root
amf.mode (str) = disabled
internal_configuration.service.0.name (str) = corosync_cmap
internal_configuration.service.0.ver (u32) = 0
internal_configuration.service.1.name (str) = corosync_cfg
internal_configuration.service.1.ver (u32) = 0
internal_configuration.service.2.name (str) = corosync_cpg
internal_configuration.service.2.ver (u32) = 0
internal_configuration.service.3.name (str) = corosync_quorum
internal_configuration.service.3.ver (u32) = 0
internal_configuration.service.4.name (str) = corosync_pload
internal_configuration.service.4.ver (u32) = 0
internal_configuration.service.5.name (str) = corosync_votequorum
internal_configuration.service.5.ver (u32) = 0
internal_configuration.service.7.name (str) = corosync_wd
internal_configuration.service.7.ver (u32) = 0
logging.debug (str) = off
logging.fileline (str) = off
logging.logger_subsys.AMF.debug (str) = off
logging.logger_subsys.AMF.subsys (str) = AMF
logging.logger_subsys.AMF.tags (str) = enter|leave|trace1|trace2|trace3|trace4|trace6
logging.syslog_facility (str) = daemon
logging.timestamp (str) = on
logging.to_logfile (str) = no
logging.to_stderr (str) = yes
logging.to_syslog (str) = yes
quorum.expected_votes (u32) = 3
quorum.provider (str) = corosync_votequorum
resources.watchdog_timeout (u32) = 6
runtime.blackbox.dump_flight_data (str) = no
runtime.blackbox.dump_state (str) = no
runtime.connections.active (u64) = 9
runtime.connections.attrd:3364:0x7fd4e5fa0d10.client_pid (u32) = 3364
runtime.connections.attrd:3364:0x7fd4e5fa0d10.dispatched (u64) = 8
runtime.connections.attrd:3364:0x7fd4e5fa0d10.flow_control (u32) = 0
runtime.connections.attrd:3364:0x7fd4e5fa0d10.flow_control_count (u64) = 0
runtime.connections.attrd:3364:0x7fd4e5fa0d10.invalid_request (u64) = 0
runtime.connections.attrd:3364:0x7fd4e5fa0d10.name (str) = attrd
runtime.connections.attrd:3364:0x7fd4e5fa0d10.overload (u64) = 0
runtime.connections.attrd:3364:0x7fd4e5fa0d10.queue_size (u32) = 0
runtime.connections.attrd:3364:0x7fd4e5fa0d10.recv_retries (u64) = 0
runtime.connections.attrd:3364:0x7fd4e5fa0d10.requests (u64) = 5
runtime.connections.attrd:3364:0x7fd4e5fa0d10.responses (u64) = 2
runtime.connections.attrd:3364:0x7fd4e5fa0d10.send_retries (u64) = 0
runtime.connections.attrd:3364:0x7fd4e5fa0d10.service_id (u32) = 2
runtime.connections.cib:3361:0x7fd4e5fa4320.client_pid (u32) = 3361
runtime.connections.cib:3361:0x7fd4e5fa4320.dispatched (u64) = 37
runtime.connections.cib:3361:0x7fd4e5fa4320.flow_control (u32) = 0
runtime.connections.cib:3361:0x7fd4e5fa4320.flow_control_count (u64) = 0
runtime.connections.cib:3361:0x7fd4e5fa4320.invalid_request (u64) = 0
runtime.connections.cib:3361:0x7fd4e5fa4320.name (str) = cib
runtime.connections.cib:3361:0x7fd4e5fa4320.overload (u64) = 0
runtime.connections.cib:3361:0x7fd4e5fa4320.queue_size (u32) = 0
runtime.connections.cib:3361:0x7fd4e5fa4320.recv_retries (u64) = 0
runtime.connections.cib:3361:0x7fd4e5fa4320.requests (u64) = 29
runtime.connections.cib:3361:0x7fd4e5fa4320.responses (u64) = 2
runtime.connections.cib:3361:0x7fd4e5fa4320.send_retries (u64) = 0
runtime.connections.cib:3361:0x7fd4e5fa4320.service_id (u32) = 2
runtime.connections.closed (u64) = 22
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.client_pid (u32) = 3376
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.dispatched (u64) = 0
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.flow_control (u32) = 0
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.flow_control_count (u64) = 0
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.invalid_request (u64) = 0
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.name (str) = corosync-cmapct
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.overload (u64) = 0
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.queue_size (u32) = 0
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.recv_retries (u64) = 0
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.requests (u64) = 0
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.responses (u64) = 0
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.send_retries (u64) = 0
runtime.connections.corosync-cmapct:3376:0x7fd4e60a9e40.service_id (u32) = 0
runtime.connections.crmd:3366:0x7fd4e5f9f200.client_pid (u32) = 3366
runtime.connections.crmd:3366:0x7fd4e5f9f200.dispatched (u64) = 1
runtime.connections.crmd:3366:0x7fd4e5f9f200.flow_control (u32) = 0
runtime.connections.crmd:3366:0x7fd4e5f9f200.flow_control_count (u64) = 0
runtime.connections.crmd:3366:0x7fd4e5f9f200.invalid_request (u64) = 0
runtime.connections.crmd:3366:0x7fd4e5f9f200.name (str) = crmd
runtime.connections.crmd:3366:0x7fd4e5f9f200.overload (u64) = 0
runtime.connections.crmd:3366:0x7fd4e5f9f200.queue_size (u32) = 0
runtime.connections.crmd:3366:0x7fd4e5f9f200.recv_retries (u64) = 0
runtime.connections.crmd:3366:0x7fd4e5f9f200.requests (u64) = 3
runtime.connections.crmd:3366:0x7fd4e5f9f200.responses (u64) = 3
runtime.connections.crmd:3366:0x7fd4e5f9f200.send_retries (u64) = 0
runtime.connections.crmd:3366:0x7fd4e5f9f200.service_id (u32) = 3
runtime.connections.crmd:3366:0x7fd4e60a5ef0.client_pid (u32) = 3366
runtime.connections.crmd:3366:0x7fd4e60a5ef0.dispatched (u64) = 32
runtime.connections.crmd:3366:0x7fd4e60a5ef0.flow_control (u32) = 0
runtime.connections.crmd:3366:0x7fd4e60a5ef0.flow_control_count (u64) = 0
runtime.connections.crmd:3366:0x7fd4e60a5ef0.invalid_request (u64) = 0
runtime.connections.crmd:3366:0x7fd4e60a5ef0.name (str) = crmd
runtime.connections.crmd:3366:0x7fd4e60a5ef0.overload (u64) = 0
runtime.connections.crmd:3366:0x7fd4e60a5ef0.queue_size (u32) = 0
runtime.connections.crmd:3366:0x7fd4e60a5ef0.recv_retries (u64) = 0
runtime.connections.crmd:3366:0x7fd4e60a5ef0.requests (u64) = 23
runtime.connections.crmd:3366:0x7fd4e60a5ef0.responses (u64) = 2
runtime.connections.crmd:3366:0x7fd4e60a5ef0.send_retries (u64) = 0
runtime.connections.crmd:3366:0x7fd4e60a5ef0.service_id (u32) = 2
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.client_pid (u32) = 3359
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.dispatched (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.flow_control (u32) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.flow_control_count (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.invalid_request (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.name (str) = pacemakerd
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.overload (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.queue_size (u32) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.recv_retries (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.requests (u64) = 1
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.responses (u64) = 1
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.send_retries (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c945f0.service_id (u32) = 1
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.client_pid (u32) = 3359
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.dispatched (u64) = 27
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.flow_control (u32) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.flow_control_count (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.invalid_request (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.name (str) = pacemakerd
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.overload (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.queue_size (u32) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.recv_retries (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.requests (u64) = 11
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.responses (u64) = 2
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.send_retries (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5c9ad90.service_id (u32) = 2
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.client_pid (u32) = 3359
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.dispatched (u64) = 1
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.flow_control (u32) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.flow_control_count (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.invalid_request (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.name (str) = pacemakerd
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.overload (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.queue_size (u32) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.recv_retries (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.requests (u64) = 3
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.responses (u64) = 3
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.send_retries (u64) = 0
runtime.connections.pacemakerd:3359:0x7fd4e5e9cb90.service_id (u32) = 3
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.client_pid (u32) = 3362
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.dispatched (u64) = 15
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.flow_control (u32) = 0
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.flow_control_count (u64) = 0
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.invalid_request (u64) = 0
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.name (str) = stonithd
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.overload (u64) = 0
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.queue_size (u32) = 0
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.recv_retries (u64) = 0
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.requests (u64) = 6
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.responses (u64) = 2
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.send_retries (u64) = 0
runtime.connections.stonithd:3362:0x7fd4e5fa28f0.service_id (u32) = 2
runtime.services.cfg.0.rx (u64) = 0
runtime.services.cfg.0.tx (u64) = 0
runtime.services.cfg.1.rx (u64) = 0
runtime.services.cfg.1.tx (u64) = 0
runtime.services.cfg.2.rx (u64) = 0
runtime.services.cfg.2.tx (u64) = 0
runtime.services.cfg.3.rx (u64) = 0
runtime.services.cfg.3.tx (u64) = 0
runtime.services.cfg.service_id (u16) = 1
runtime.services.cmap.0.rx (u64) = 6
runtime.services.cmap.0.tx (u64) = 3
runtime.services.cmap.service_id (u16) = 0
runtime.services.cpg.0.rx (u64) = 15
runtime.services.cpg.0.tx (u64) = 5
runtime.services.cpg.1.rx (u64) = 0
runtime.services.cpg.1.tx (u64) = 0
runtime.services.cpg.2.rx (u64) = 0
runtime.services.cpg.2.tx (u64) = 0
runtime.services.cpg.3.rx (u64) = 104
runtime.services.cpg.3.tx (u64) = 64
runtime.services.cpg.4.rx (u64) = 0
runtime.services.cpg.4.tx (u64) = 0
runtime.services.cpg.5.rx (u64) = 6
runtime.services.cpg.5.tx (u64) = 3
runtime.services.cpg.service_id (u16) = 2
runtime.services.pload.0.rx (u64) = 0
runtime.services.pload.0.tx (u64) = 0
runtime.services.pload.1.rx (u64) = 0
runtime.services.pload.1.tx (u64) = 0
runtime.services.pload.service_id (u16) = 4
runtime.services.quorum.service_id (u16) = 3
runtime.services.votequorum.0.rx (u64) = 13
runtime.services.votequorum.0.tx (u64) = 6
runtime.services.votequorum.1.rx (u64) = 0
runtime.services.votequorum.1.tx (u64) = 0
runtime.services.votequorum.2.rx (u64) = 0
runtime.services.votequorum.2.tx (u64) = 0
runtime.services.votequorum.3.rx (u64) = 0
runtime.services.votequorum.3.tx (u64) = 0
runtime.services.votequorum.service_id (u16) = 5
runtime.services.wd.service_id (u16) = 7
runtime.totem.pg.mrp.rrp.0.faulty (u8) = 0
runtime.totem.pg.mrp.srp.avg_backlog_calc (u32) = 0
runtime.totem.pg.mrp.srp.avg_token_workload (u32) = 0
runtime.totem.pg.mrp.srp.commit_entered (u64) = 3
runtime.totem.pg.mrp.srp.commit_token_lost (u64) = 0
runtime.totem.pg.mrp.srp.consensus_timeouts (u64) = 0
runtime.totem.pg.mrp.srp.continuous_gather (u32) = 0
runtime.totem.pg.mrp.srp.continuous_sendmsg_failures (u32) = 0
runtime.totem.pg.mrp.srp.firewall_enabled_or_nic_failure (u8) = 0
runtime.totem.pg.mrp.srp.gather_entered (u64) = 3
runtime.totem.pg.mrp.srp.gather_token_lost (u64) = 0
runtime.totem.pg.mrp.srp.mcast_retx (u64) = 0
runtime.totem.pg.mrp.srp.mcast_rx (u64) = 146
runtime.totem.pg.mrp.srp.mcast_tx (u64) = 97
runtime.totem.pg.mrp.srp.memb_commit_token_rx (u64) = 6
runtime.totem.pg.mrp.srp.memb_commit_token_tx (u64) = 6
runtime.totem.pg.mrp.srp.memb_join_rx (u64) = 11
runtime.totem.pg.mrp.srp.memb_join_tx (u64) = 5
runtime.totem.pg.mrp.srp.memb_merge_detect_rx (u64) = 1555
runtime.totem.pg.mrp.srp.memb_merge_detect_tx (u64) = 1555
runtime.totem.pg.mrp.srp.members.1084777482.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1084777482.ip (str) = r(0) ip(192.168.100.10) 
runtime.totem.pg.mrp.srp.members.1084777482.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1084777482.status (str) = joined
runtime.totem.pg.mrp.srp.members.1084777483.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1084777483.ip (str) = r(0) ip(192.168.100.11) 
runtime.totem.pg.mrp.srp.members.1084777483.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1084777483.status (str) = joined
runtime.totem.pg.mrp.srp.members.1084777484.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1084777484.ip (str) = r(0) ip(192.168.100.12) 
runtime.totem.pg.mrp.srp.members.1084777484.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1084777484.status (str) = joined
runtime.totem.pg.mrp.srp.mtt_rx_token (u32) = 241
runtime.totem.pg.mrp.srp.operational_entered (u64) = 3
runtime.totem.pg.mrp.srp.operational_token_lost (u64) = 0
runtime.totem.pg.mrp.srp.orf_token_rx (u64) = 2235
runtime.totem.pg.mrp.srp.orf_token_tx (u64) = 3
runtime.totem.pg.mrp.srp.recovery_entered (u64) = 3
runtime.totem.pg.mrp.srp.recovery_token_lost (u64) = 0
runtime.totem.pg.mrp.srp.rx_msg_dropped (u64) = 0
runtime.totem.pg.mrp.srp.token_hold_cancel_rx (u64) = 17
runtime.totem.pg.mrp.srp.token_hold_cancel_tx (u64) = 7
runtime.totem.pg.msg_queue_avail (u32) = 0
runtime.totem.pg.msg_reserved (u32) = 1
runtime.votequorum.ev_barrier (u32) = 3
runtime.votequorum.lowest_node_id (u32) = 1084777482
runtime.votequorum.this_node_id (u32) = 1084777482
runtime.votequorum.two_node (u8) = 0
service.name (str) = pacemaker
service.ver (str) = 1
totem.clear_node_high_bit (str) = yes
totem.consensus (u32) = 3600
totem.interface.0.bindnetaddr (str) = 192.168.100.0
totem.interface.0.mcastaddr (str) = 239.255.100.0
totem.interface.0.mcastport (u16) = 5405
totem.join (u32) = 60
totem.max_messages (u32) = 20
totem.rrp_mode (str) = active
totem.secauth (str) = off
totem.threads (u32) = 0
totem.token (u32) = 3000
totem.token_retransmits_before_loss_const (u32) = 10
totem.version (u32) = 2
totem.vsftype (str) = none
uidgid.gid.113 (u8) = 1

root@pacemaker01:/etc/corosync# corosync-quorumtool -l

Membership information
----------------------
    Nodeid      Votes Name
1084777482          1 pacemaker01 (local)
1084777483          1 pacemaker02
1084777484          1 pacemaker03
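
One quick check on this listing is that the votes add up to the expected_votes value in corosync.conf. The sketch below sums the Votes column of a copy of the output above. The helper name total_votes is an assumption, and the parsing is tied to this particular output layout.

```shell
#!/bin/sh
# Sketch: sum the Votes column of "corosync-quorumtool -l" output.
# total_votes is a hypothetical helper; it only counts rows that start with a numeric node ID.
total_votes() {
    awk '$1 ~ /^[0-9]+$/ { sum += $2 } END { print sum + 0 }'
}

# Sample input copied from the listing above.
total_votes <<'EOF'
    Nodeid      Votes Name
1084777482          1 pacemaker01 (local)
1084777483          1 pacemaker02
1084777484          1 pacemaker03
EOF
```

On a live node the same function could be fed with corosync-quorumtool -l | total_votes.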

Start pacemaker

service pacemaker start

List all pacemaker processes

root@pacemaker01:/etc/corosync# ps aux | grep pace
root      3359  0.0  0.2 107504  4528 pts/2    S    19:58   0:00 pacemakerd
haclust+  3361  0.0  0.5 110168 10796 ?        Ss   19:58   0:00 /usr/lib/pacemaker/cib
root      3362  0.0  0.2 107252  4368 ?        Ss   19:58   0:00 /usr/lib/pacemaker/stonithd
root      3363  0.0  0.1  81812  3476 ?        Ss   19:58   0:00 /usr/lib/pacemaker/lrmd
haclust+  3364  0.0  0.1  97668  3732 ?        Ss   19:58   0:00 /usr/lib/pacemaker/attrd
haclust+  3365  0.0  0.9 110036 18944 ?        Ss   19:58   0:00 /usr/lib/pacemaker/pengine
haclust+  3366  0.0  0.4 166536  9988 ?        Ss   19:58   0:00 /usr/lib/pacemaker/crmd

Check the cluster status

# crm_mon -1
Last updated: Mon Jul 28 20:06:15 2014
Last change: Mon Jul 28 19:58:58 2014 via crmd on pacemaker01
Stack: corosync
Current DC: pacemaker01 (1084777482) - partition with quorum
Version: 1.1.10-42f2063
3 Nodes configured
0 Resources configured

Online: [ pacemaker01 pacemaker02 pacemaker03 ]

crm_mon - Provides a summary of cluster's current state.

-1, --one-shot
              Display the cluster status once on the console and exit
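
Because -1 makes crm_mon print once and exit, its output is easy to check from a script. The sketch below counts the nodes on the Online: line of a copy of the output above. The helper name count_online is an assumption, and the parsing relies on the line layout of this crm_mon version.

```shell
#!/bin/sh
# Sketch: count the nodes listed on the "Online: [ ... ]" line of crm_mon -1 output.
# count_online is a hypothetical helper; fields are "Online:", "[", the node names, "]".
count_online() {
    awk '/^Online:/ { print NF - 3 }'
}

sample='Online: [ pacemaker01 pacemaker02 pacemaker03 ]'
expected=3
online=$(printf '%s\n' "$sample" | count_online)

if [ "$online" -eq "$expected" ]; then
    echo "all $expected nodes online"
else
    echo "only $online of $expected nodes online"
fi
```

On a cluster node the sample line would be replaced by crm_mon -1 | count_online.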

Verify the configuration

root@pacemaker01:~# crm_verify -L -V
   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

We have no IPMI or similar device that could control the power of the other cluster nodes, so we disable STONITH

crm configure property stonith-enabled=false

Running the verification again now reports no errors.

Show the complete configuration

root@pacemaker01:~# crm configure show xml
<?xml version="1.0" ?>
<cib num_updates="5" dc-uuid="1084777482" update-origin="pacemaker01" crm_feature_set="3.0.7" validate-with="pacemaker-1.2" update-client="crmd" epoch="5" admin_epoch="0" cib-last-written="Mon Jul 28 19:58:58 2014" have-quorum="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-42f2063"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1084777482" uname="pacemaker01"/>
      <node id="1084777483" uname="pacemaker02"/>
      <node id="1084777484" uname="pacemaker03"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
</cib>

Add an IP address

Configure a primitive IP address resource

There are in fact many resource agents; each one is a script used to start and stop a resource

root@pacemaker01:/usr/lib/ocf/resource.d# tree --charset ASCII
.
|-- heartbeat
|   |-- anything
|   |-- AoEtarget
|   |-- apache
|   |-- asterisk
|   |-- AudibleAlarm
|   |-- ClusterMon
|   |-- conntrackd
|   |-- CTDB
|   |-- db2
|   |-- Delay
|   |-- dhcpd
|   |-- drbd
|   |-- Dummy
|   |-- eDir88
|   |-- ethmonitor
|   |-- Evmsd
|   |-- EvmsSCC
|   |-- exportfs
|   |-- Filesystem
|   |-- fio
|   |-- ICP
|   |-- ids
|   |-- IPaddr
|   |-- IPaddr2
|   |-- IPsrcaddr
|   |-- IPv6addr
|   |-- iscsi
|   |-- iSCSILogicalUnit
|   |-- iSCSITarget
|   |-- jboss
|   |-- ldirectord
|   |-- LinuxSCSI
|   |-- LVM
|   |-- lxc
|   |-- MailTo
|   |-- ManageRAID
|   |-- ManageVE
|   |-- mysql
|   |-- mysql-proxy
|   |-- named
|   |-- nfsserver
|   |-- nginx
|   |-- oracle
|   |-- oralsnr
|   |-- pgsql
|   |-- pingd
|   |-- portblock
|   |-- postfix
|   |-- pound
|   |-- proftpd
|   |-- Pure-FTPd
|   |-- Raid1
|   |-- Route
|   |-- rsyncd
|   |-- rsyslog
|   |-- SAPDatabase
|   |-- SAPInstance
|   |-- scsi2reservation
|   |-- SendArp
|   |-- ServeRAID
|   |-- sfex
|   |-- slapd
|   |-- SphinxSearchDaemon
|   |-- Squid
|   |-- Stateful
|   |-- symlink
|   |-- SysInfo
|   |-- syslog-ng
|   |-- tomcat
|   |-- varnish
|   |-- VIPArip
|   |-- VirtualDomain
|   |-- vmware
|   |-- WAS
|   |-- WAS6
|   |-- WinPopup
|   |-- Xen
|   `-- Xinetd
|-- pacemaker
|   |-- ClusterMon
|   |-- controld
|   |-- Dummy
|   |-- HealthCPU
|   |-- HealthSMART
|   |-- o2cb
|   |-- ping
|   |-- pingd
|   |-- Stateful
|   |-- SysInfo
|   `-- SystemHealth
`-- redhat -> ../../../share/cluster

root@pacemaker01:/usr/share/cluster# tree --charset ASCII
.
|-- apache.metadata
|-- apache.sh
|-- ASEHAagent.sh
|-- clusterfs.sh
|-- fs.sh
|-- ip.sh
|-- lvm_by_lv.sh
|-- lvm_by_vg.sh
|-- lvm.metadata
|-- lvm.sh
|-- mysql.metadata
|-- mysql.sh
|-- named.metadata
|-- named.sh
|-- netfs.sh
|-- nfsclient.sh
|-- nfsexport.sh
|-- nfsserver.sh
|-- ocf-shellfuncs
|-- openldap.metadata
|-- openldap.sh
|-- oracledb.sh
|-- orainstance.metadata
|-- orainstance.sh
|-- oralistener.metadata
|-- oralistener.sh
|-- postgres-8.metadata
|-- postgres-8.sh
|-- relaxng
|   |-- ra2man.xsl
|   |-- ra2ref.xsl
|   |-- ra2rng.xsl
|   |-- ra-api-1-modified.dtd
|   |-- resources.rng.head
|   |-- resources.rng.mid
|   `-- resources.rng.tail
|-- samba.metadata
|-- samba.sh
|-- SAPDatabase
|-- SAPInstance
|-- script.sh
|-- service.sh
|-- smb.sh
|-- svclib_nfslock
|-- tomcat-5.metadata
|-- tomcat-5.sh
|-- tomcat-6.metadata
|-- tomcat-6.sh
|-- utils
|   |-- config-utils.sh
|   |-- fs-lib.sh
|   |-- httpd-parse-config.pl
|   |-- member_util.sh
|   |-- messages.sh
|   |-- named-parse-config.pl
|   |-- ra-skelet.sh
|   `-- tomcat-parse-config.pl
`-- vm.sh

From the /usr/lib/ocf/resource.d/heartbeat/IPaddr2 script we can see that:

The start operation calls ip_start

ip_start calls add_interface $OCF_RESKEY_ip $NETMASK $BRDCAST $NIC $IFLABEL

add_interface runs the following commands

$IP2UTIL -f inet addr add $ipaddr/$netmask brd $broadcast dev $iface

$IP2UTIL link set $iface up
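
To make those two lines concrete, the sketch below prints what they would expand to for one set of example values. It is a dry run that only echoes the commands. It assumes that $IP2UTIL resolves to the ip utility, and the address, prefix, broadcast and interface are illustrative values chosen for this example. The function is a simplified stand-in and ignores the interface label argument.

```shell
#!/bin/sh
# Sketch: print the commands add_interface would run, without executing them.
# Assumption: $IP2UTIL is the "ip" utility. add_interface_cmds is a hypothetical name.
add_interface_cmds() {
    ipaddr=$1
    netmask=$2
    broadcast=$3
    iface=$4
    echo "ip -f inet addr add $ipaddr/$netmask brd $broadcast dev $iface"
    echo "ip link set $iface up"
}

# Illustrative values only.
add_interface_cmds 192.168.100.100 24 192.168.100.255 eth0
```

Seeing the expanded commands makes it clear that the agent simply adds a secondary address to an existing interface.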

We can also list all resource agents from the command line

root@pacemaker01:~# crm ra classes
lsb
ocf / heartbeat pacemaker redhat
service
stonith
upstart

root@pacemaker01:~# crm ra list ocf heartbeat  
AoEtarget           AudibleAlarm        CTDB                ClusterMon          Delay               Dummy
EvmsSCC             Evmsd               Filesystem          ICP                 IPaddr              IPaddr2
IPsrcaddr           IPv6addr            LVM                 LinuxSCSI           MailTo              ManageRAID
ManageVE            Pure-FTPd           Raid1               Route               SAPDatabase         SAPInstance
SendArp             ServeRAID           SphinxSearchDaemon  Squid               Stateful            SysInfo
VIPArip             VirtualDomain       WAS                 WAS6                WinPopup            Xen
Xinetd              anything            apache              asterisk            conntrackd          db2
dhcpd               drbd                eDir88              ethmonitor          exportfs            fio
iSCSILogicalUnit    iSCSITarget         ids                 iscsi               jboss               ldirectord
lxc                 mysql               mysql-proxy         named               nfsserver           nginx
oracle              oralsnr             pgsql               pingd               portblock           postfix
pound               proftpd             rsyncd              rsyslog             scsi2reservation    sfex
slapd               symlink             syslog-ng           tomcat              varnish             vmware

We can also see how a particular resource agent is configured

root@pacemaker01:~# crm ra info ocf:heartbeat:IPaddr2 
Manages virtual IPv4 addresses (Linux specific version) (ocf:heartbeat:IPaddr2)

This Linux-specific resource manages IP alias IP addresses.
It can add an IP alias, or remove one.
In addition, it can implement Cluster Alias IP functionality
if invoked as a clone resource.

If used as a clone, you should explicitly set clone-node-max >= 2,
and/or clone-max < number of nodes. In case of node failure,
clone instances need to be re-allocated on surviving nodes.
Which would not be possible, if there is already an instance on those nodes,
and clone-node-max=1 (which is the default).

Parameters (* denotes required, [] the default):

ip* (string): IPv4 address
    The IPv4 address to be configured in dotted quad notation, for example
    "192.168.1.1".

nic (string): Network interface
    The base network interface on which the IP address will be brought
    online. 
    If left empty, the script will try and determine this from the
    routing table.
    Do NOT specify an alias interface in the form eth0:1 or anything here;
    rather, specify the base interface only.
    If you want a label, see the iflabel parameter.
    Prerequisite:
    There must be at least one static IP address, which is not managed by
    the cluster, assigned to the network interface.
    If you can not assign any static IP address on the interface,
    modify this kernel parameter:
    sysctl -w net.ipv4.conf.all.promote_secondaries=1 # (or per device)

cidr_netmask (string): CIDR netmask
    The netmask for the interface in CIDR format
    (e.g., 24 and not 255.255.255.0)
    If unspecified, the script will also try to determine this from the
    routing table.

broadcast (string): Broadcast address
    Broadcast address associated with the IP. If left empty, the script will
    determine this from the netmask.

iflabel (string): Interface label
    You can specify an additional label for your IP address here.
    This label is appended to your interface name.
    If a label is specified in nic name, this parameter has no effect.

lvs_support (boolean, [false]): Enable support for LVS DR
    Enable support for LVS Direct Routing configurations. In case a IP
    address is stopped, only move it to the loopback device to allow the
    local node to continue to service requests, but no longer advertise it
    on the network.

mac (string): Cluster IP MAC address
    Set the interface MAC address explicitly. Currently only used in case of
    the Cluster IP Alias. Leave empty to chose automatically.

clusterip_hash (string, [sourceip-sourceport]): Cluster IP hashing function
    Specify the hashing algorithm used for the Cluster IP functionality.

unique_clone_address (boolean, [false]): Create a unique address for cloned instances
    If true, add the clone ID to the supplied value of ip to create
    a unique address to manage

arp_interval (integer, [200]): ARP packet interval in ms
    Specify the interval between unsolicited ARP packets in milliseconds.

arp_count (integer, [5]): ARP packet count
    Number of unsolicited ARP packets to send.

arp_bg (string, [true]): ARP from background
    Whether or not to send the arp packets in the background.

arp_mac (string, [ffffffffffff]): ARP MAC
    MAC address to send the ARP packets to.
    You really shouldn't be touching this.

flush_routes (boolean, [false]): Flush kernel routing table on stop
    Flush the routing table on stop. This is for
    applications which use the cluster IP address
    and which run on the same physical host that the
    IP address lives on. The Linux kernel may force that
    application to take a shortcut to the local loopback
    interface, instead of the interface the address
    is really bound to. Under those circumstances, an
    application may, somewhat unexpectedly, continue
    to use connections for some time even after the
    IP address is deconfigured. Set this parameter in
    order to immediately disable said shortcut when the
    IP address goes away.

Operations' defaults (advisory minimum):

    start         timeout=20s
    stop          timeout=20s
    status        timeout=20s interval=10s
    monitor       timeout=20s interval=10s

Following this manual, we add the IP address as a cluster resource

crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 params ip=192.168.100.100 cidr_netmask=24 op monitor interval=30s
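The manual above asks for cidr_netmask in CIDR form (24, not 255.255.255.0). As a minimal sketch, a small guard can reject the dotted form before the value reaches crm configure. The function name is_cidr_prefix is a hypothetical helper for illustration, not part of crmsh or the resource agent.

```shell
# is_cidr_prefix VALUE (hypothetical helper, not part of crmsh):
# succeeds only when VALUE is an integer prefix length from 0 to 32,
# so "24" passes and "255.255.255.0" is rejected.
is_cidr_prefix() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -le 32 ]
}

# Example guard. The crm command itself stays commented out because it
# needs a running cluster.
netmask=24
if is_cidr_prefix "$netmask"; then
    echo "cidr_netmask=$netmask looks valid"
    # crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 params ip=192.168.100.100 cidr_netmask="$netmask" op monitor interval=30s
else
    echo "cidr_netmask must be a prefix length such as 24" >&2
fi
```

The check only covers the format of the value; whether /24 is the right prefix for the network is still up to the administrator.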

Check the current status

root@pacemaker01:~# crm_mon -1
Last updated: Mon Jul 28 20:22:46 2014
Last change: Mon Jul 28 20:22:21 2014 via cibadmin on pacemaker01
Stack: corosync
Current DC: pacemaker01 (1084777482) - partition with quorum
Version: 1.1.10-42f2063
3 Nodes configured
1 Resources configured

Online: [ pacemaker01 pacemaker02 pacemaker03 ]

ClusterIP      (ocf::heartbeat:IPaddr2):       Started pacemaker01

 

The IP has been placed on pacemaker01

root@pacemaker01:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:9b:d5:11 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.10/24 brd 192.168.100.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.100.100/24 brd 192.168.100.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe9b:d511/64 scope link 
       valid_lft forever preferred_lft forever
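To make the same check from a script rather than by reading the listing, one option is to parse the one-line-per-address output of ip -o. This is only a sketch: has_addr is a hypothetical helper name, and eth0 is taken from the listing above.

```shell
# has_addr ADDRESS (hypothetical helper): reads `ip -o -4 addr show` output
# on stdin and succeeds if ADDRESS appears as a configured IPv4 address.
has_addr() {
    awk -v want="$1" '
        $3 == "inet" { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }
    '
}

# On a cluster node (commented out: needs the eth0 interface from the listing):
# ip -o -4 addr show dev eth0 | has_addr 192.168.100.100 && echo "cluster IP is on this node"
```

Running the commented line on each of the three nodes should report the cluster IP on exactly one of them.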

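If pacemaker01 goes down, the IP moves to another machine. When pacemaker01 comes back up, will the IP move back?

We do not actually want it to move back, so we set the following value, which attaches a cost to every move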

crm configure rsc_defaults resource-stickiness=100

Configuring Apache

Let's look at /usr/lib/ocf/resource.d/heartbeat/apache

We find that apache_start calls ocf_run $HTTPD $HTTPDOPTS $OPTIONS -f $CONFIGFILE

crm ra info ocf:heartbeat:apache

First, apache2 must be installed on every node

apt-get install apache2 wget

Each machine serves different site content, so we can tell which node is answering

root@pacemaker01:~# vim /var/www/html/index.html

<html>
<body>My Test Site - pacemaker01</body>
</html>
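Since the page differs only in the node name, it can be generated rather than edited by hand on each node. A minimal sketch, assuming the same page layout as above; make_index is a hypothetical helper name, and the target path is the one opened with vim above.

```shell
# make_index NODE (hypothetical helper): prints the test page for NODE,
# in the same three-line layout as the hand-written index.html above.
make_index() {
    printf '<html>\n<body>My Test Site - %s</body>\n</html>\n' "$1"
}

# Preview for one node:
make_index pacemaker01

# On each node (commented out: overwrites the real page):
# make_index "$(hostname)" > /var/www/html/index.html
```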

crm configure primitive WebSite ocf:heartbeat:apache params configfile=/etc/apache2/apache2.conf statusurl=" op monitor interval=1min

crm configure op_defaults timeout=240s

For this to be useful, the IP address and Apache must run on the same machine

crm configure colocation website-with-ip INFINITY: WebSite ClusterIP

When the cluster changes and resources are rearranged, the IP is handled first, then Apache

crm configure order apache-after-ip mandatory: ClusterIP WebSite

The final configuration is as follows

root@pacemaker01:~# crm configure show xml
<?xml version="1.0" ?>
<cib num_updates="1" dc-uuid="1084777482" update-origin="pacemaker01" crm_feature_set="3.0.7" validate-with="pacemaker-1.2" update-client="cibadmin" epoch="19" admin_epoch="0" cib-last-written="Mon Jul 28 21:03:44 2014" have-quorum="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-42f2063"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1084777482" uname="pacemaker01"/>
      <node id="1084777483" uname="pacemaker02"/>
      <node id="1084777484" uname="pacemaker03"/>
    </nodes>
    <resources>
      <primitive id="WebSite" class="ocf" provider="heartbeat" type="apache">
        <instance_attributes id="WebSite-instance_attributes">
          <nvpair name="configfile" value="/etc/apache2/apache2.conf" id="WebSite-instance_attributes-configfile"/>
          <nvpair name="statusurl" value=" id="WebSite-instance_attributes-statusurl"/>
        </instance_attributes>
        <operations>
          <op name="monitor" interval="1min" id="WebSite-monitor-1min"/>
        </operations>
      </primitive>
      <primitive id="ClusterIP" class="ocf" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="ClusterIP-instance_attributes">
          <nvpair name="ip" value="192.168.100.100" id="ClusterIP-instance_attributes-ip"/>
          <nvpair name="cidr_netmask" value="24" id="ClusterIP-instance_attributes-cidr_netmask"/>
        </instance_attributes>
        <operations>
          <op name="monitor" interval="30s" id="ClusterIP-monitor-30s"/>
        </operations>
      </primitive>
    </resources>
    <constraints>
      <rsc_order id="apache-after-ip" score="INFINITY" first="ClusterIP" then="WebSite"/>
      <rsc_colocation id="website-with-ip" score="INFINITY" rsc="WebSite" with-rsc="ClusterIP"/>
    </constraints>
    <op_defaults>
      <meta_attributes id="op-options">
        <nvpair name="timeout" value="240s" id="op-options-timeout"/>
      </meta_attributes>
    </op_defaults>
    <rsc_defaults>
      <meta_attributes id="rsc-options">
        <nvpair name="resource-stickiness" value="100" id="rsc-options-resource-stickiness"/>
      </meta_attributes>
    </rsc_defaults>
  </configuration>
</cib>

Let's verify

# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker01</body>
</html>
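Because each node's page carries its own name, the answering node can be pulled out of the response and polled while resources move. This is a rough sketch rather than a monitoring tool; serving_node is a hypothetical helper name and assumes the page layout shown above.

```shell
# serving_node (hypothetical helper): reads the test page on stdin and prints
# the node name that follows "My Test Site - ".
serving_node() {
    sed -n 's/.*My Test Site - \([^<]*\)<.*/\1/p'
}

# On a machine that can reach the cluster IP (commented out: needs the network):
# while true; do curl -s --max-time 2 http://192.168.100.100 | serving_node; sleep 1; done
```

During a failover the loop would be expected to print nothing for a moment, then the name of the new node.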

Try moving the resource from one machine to another

root@pacemaker01:~# crm resource move WebSite pacemaker02
root@pacemaker01:~# crm_mon -1
Last updated: Tue Jul 29 10:07:51 2014
Last change: Tue Jul 29 10:07:44 2014 via crm_resource on pacemaker01
Stack: corosync
Current DC: pacemaker01 (1084777482) - partition with quorum
Version: 1.1.10-42f2063
3 Nodes configured
2 Resources configured

Online: [ pacemaker01 pacemaker02 pacemaker03 ]

ClusterIP      (ocf::heartbeat:IPaddr2):       Started pacemaker02 
WebSite        (ocf::heartbeat:apache):        Started pacemaker02

root@pacemaker01:~# curl http://192.168.100.100
<html>
<body>My Test Site - pacemaker02</body>
</html>
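The placement can also be read from crm_mon -1 in a script, for example to confirm that both resources ended up on the same node after the move. A minimal sketch, assuming the output layout shown above; resource_node is a hypothetical helper name.

```shell
# resource_node NAME (hypothetical helper): reads `crm_mon -1` output on stdin
# and prints the node on which resource NAME is reported as Started.
resource_node() {
    awk -v rsc="$1" '$1 == rsc && $(NF-1) == "Started" { print $NF }'
}

# On a cluster node (commented out: needs a running cluster):
# [ "$(crm_mon -1 | resource_node WebSite)" = "$(crm_mon -1 | resource_node ClusterIP)" ] && echo "colocated"
```

Note that, as far as I know, crm resource move works by adding a location constraint that stays in the configuration until it is removed, for example with crm resource unmove WebSite.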

Source: "ITPUB Blog", link: http://blog.itpub.net/18796236/viewspace-1840128/. If you repost this article, please credit the source; otherwise legal action will be pursued.
