galera mysql cluster 故障節點再次接入叢集遇到的問題.

babyyellow發表於2017-06-09
galera cluster  是mysql 的多主叢集. 
我們目前搭建了3個節點的測試叢集. 


第一輪測試的時候, 發現一個問題,節點故障了, 下線,然後重新加入叢集,無法加入. 


然後直接整個節點內容 作為一個新節點加入, 也是失敗的. 搞了兩天, 頭大了. 失敗告終. 

報錯資訊如下: 


170609 16:55:59 [Note] WSREP: Read nil XID from storage engines, skipping position init
170609 16:55:59 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera-3/libgalera_smm.so'
170609 16:55:59 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <> loaded successfully.
170609 16:55:59 [Note] WSREP: CRC-32C: using hardware acceleration.
170609 16:55:59 [Note] WSREP: Found saved state: 51391c6d-4bff-11e7-a1c3-b797743e8629:824276, safe_to_bootsrap: 0
170609 16:55:59 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.11.152; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 300M; gcache.recover = no; gcache.size = 300M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc
170609 16:55:59 [Note] WSREP: GCache history reset: old(51391c6d-4bff-11e7-a1c3-b797743e8629:0) -> new(51391c6d-4bff-11e7-a1c3-b797743e8629:824276)
170609 16:55:59 [Note] WSREP: Assign initial position for certification: 824276, protocol version: -1
170609 16:55:59 [Note] WSREP: wsrep_sst_grab()
170609 16:55:59 [Note] WSREP: Start replication
170609 16:55:59 [Note] WSREP: Setting initial position to 51391c6d-4bff-11e7-a1c3-b797743e8629:824276
170609 16:55:59 [Note] WSREP: protonet asio version 0
170609 16:55:59 [Note] WSREP: Using CRC-32C for message checksums.
170609 16:55:59 [Note] WSREP: backend: asio
170609 16:55:59 [Note] WSREP: gcomm thread scheduling priority set to other:0 
170609 16:55:59 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
170609 16:55:59 [Note] WSREP: restore pc from disk failed
170609 16:55:59 [Note] WSREP: GMCast version 0
170609 16:55:59 [Warning] WSREP: Failed to resolve tcp:// 192.168.11.98:4567
170609 16:55:59 [Warning] WSREP: Failed to resolve tcp:// 192.168.12.75 :4567
170609 16:55:59 [Note] WSREP: (753e6ee4, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
170609 16:55:59 [Note] WSREP: (753e6ee4, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
170609 16:55:59 [Note] WSREP: EVS version 0
170609 16:55:59 [Note] WSREP: gcomm: connecting to group 'mycluster', peer '192.168.11.152:, 192.168.11.98:, 192.168.12.75 :'
170609 16:55:59 [Note] WSREP: (753e6ee4, 'tcp://0.0.0.0:4567') connection established to 753e6ee4 tcp://192.168.11.152:4567
170609 16:55:59 [Warning] WSREP: (753e6ee4, 'tcp://0.0.0.0:4567') address 'tcp://192.168.11.152:4567' points to own listening address, blacklisting
170609 16:56:02 [Warning] WSREP: no nodes coming from prim view, prim not possible
170609 16:56:02 [Note] WSREP: view(view_id(NON_PRIM,753e6ee4,1) memb {
        753e6ee4,0
} joined {
} left {
} partitioned {
})
170609 16:56:02 [Note] WSREP: (753e6ee4, 'tcp://0.0.0.0:4567') connection to peer 753e6ee4 with addr tcp://192.168.11.152:4567 timed out, no messages seen in PT3S
170609 16:56:03 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50193S), skipping check
170609 16:56:32 [Note] WSREP: view((empty))
170609 16:56:32 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():158
170609 16:56:32 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
170609 16:56:32 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1404: Failed to open channel 'mycluster' at 'gcomm://192.168.11.152, 192.168.11.98, 192.168.12.75 ? gmcast.segment=0 & evs.max_install_timeouts=1': -110 (Connection timed out)
170609 16:56:32 [ERROR] WSREP: gcs connect failed: Connection timed out
170609 16:56:32 [ERROR] WSREP: wsrep::connect(gcomm://192.168.11.152, 192.168.11.98, 192.168.12.75 ? gmcast.segment=0 & evs.max_install_timeouts=1) failed: 7
170609 16:56:32 [ERROR] Aborting

170609 16:56:32 [Note] WSREP: Service disconnected.
170609 16:56:33 [Note] WSREP: Some threads may fail to exit.
170609 16:56:33 [Note] /usr/sbin/mysqld: Shutdown complete



然後 就在也加入不了叢集了. 

人都蒙了, 一度懷疑國內的最大的叢集是怎麼維護這個問題的? 


刪除所有的測試vm , 從新安裝os .從新來過. 


這兩天重新開始測試這個問題.

繼續重複測試這個案例. 

節點刪除後,   重現了相同的問題. 

幾點不管是清空所有資料,重新加入, 還是保留原資料加入叢集. 都是失敗, 報錯資訊跟上面是一樣的. 


又無解了. 

又開始鬱悶了. 按理說不應該. 開始分析報錯資訊. 從資訊上了. 似乎總是讀了第一個節點, 也就是本身這個節點. 
報錯無法連線. 然後重複7次, 然後timeout 退出. 


我們叢集有3個節點, 不應該啊, 第一個無法連線, 應該會roundrobin  嘗試後面的節點連線啊. 
但是從日誌裡,沒有體現出來這個問題. 

我突然開始懷疑這部門軟體程式碼的設計上是不是有問題呢? 

原始碼就不用看了, 我們可以修改下配嘛. 

於是我修改了 wsrep_cluster_address 的配置 把第一個節點的ip 的位置拿到了最後面. 

然後重新啟動資料庫, 奇蹟發生了.


170609 16:57:09 [Note] WSREP: Read nil XID from storage engines, skipping position init
170609 16:57:09 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera-3/libgalera_smm.so'
170609 16:57:09 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <> loaded successfully.
170609 16:57:09 [Note] WSREP: CRC-32C: using hardware acceleration.
170609 16:57:09 [Note] WSREP: Found saved state: 51391c6d-4bff-11e7-a1c3-b797743e8629:-1, safe_to_bootsrap: 0
170609 16:57:09 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.11.152; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 300M; gcache.recover = no; gcache.size = 300M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc
170609 16:57:09 [Note] WSREP: GCache history reset: old(51391c6d-4bff-11e7-a1c3-b797743e8629:0) -> new(51391c6d-4bff-11e7-a1c3-b797743e8629:824276)
170609 16:57:09 [Note] WSREP: Assign initial position for certification: 824276, protocol version: -1
170609 16:57:09 [Note] WSREP: wsrep_sst_grab()
170609 16:57:09 [Note] WSREP: Start replication
170609 16:57:09 [Note] WSREP: Setting initial position to 51391c6d-4bff-11e7-a1c3-b797743e8629:824276
170609 16:57:09 [Note] WSREP: protonet asio version 0
170609 16:57:09 [Note] WSREP: Using CRC-32C for message checksums.
170609 16:57:09 [Note] WSREP: backend: asio
170609 16:57:09 [Note] WSREP: gcomm thread scheduling priority set to other:0 
170609 16:57:09 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
170609 16:57:09 [Note] WSREP: restore pc from disk failed
170609 16:57:09 [Note] WSREP: GMCast version 0
170609 16:57:09 [Warning] WSREP: Failed to resolve tcp:// 192.168.12.75:4567
170609 16:57:09 [Note] WSREP: (9f2dfc7e, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
170609 16:57:09 [Note] WSREP: (9f2dfc7e, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
170609 16:57:09 [Note] WSREP: EVS version 0
170609 16:57:09 [Note] WSREP: gcomm: connecting to group 'mycluster', peer '192.168.11.98:, 192.168.12.75:,192.168.11.152 :'
170609 16:57:09 [Note] WSREP: (9f2dfc7e, 'tcp://0.0.0.0:4567') connection established to 9f2dfc7e tcp://192.168.11.152:4567
170609 16:57:09 [Warning] WSREP: (9f2dfc7e, 'tcp://0.0.0.0:4567') address 'tcp://192.168.11.152:4567' points to own listening address, blacklisting
170609 16:57:09 [Note] WSREP: (9f2dfc7e, 'tcp://0.0.0.0:4567') connection established to 017c00ff tcp://192.168.11.98:4567
170609 16:57:09 [Note] WSREP: (9f2dfc7e, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.12.75:4567 
170609 16:57:10 [Note] WSREP: (9f2dfc7e, 'tcp://0.0.0.0:4567') connection established to 325d47d6 tcp://192.168.12.75:4567
170609 16:57:10 [Note] WSREP: declaring 017c00ff at tcp://192.168.11.98:4567 stable
170609 16:57:10 [Note] WSREP: declaring 325d47d6 at tcp://192.168.12.75:4567 stable
170609 16:57:10 [Note] WSREP: Node 017c00ff state prim
170609 16:57:10 [Note] WSREP: view(view_id(PRIM,017c00ff,13) memb {
        017c00ff,0
        325d47d6,0
        9f2dfc7e,0
} joined {
} left {
} partitioned {
})
170609 16:57:10 [Note] WSREP: save pc into disk
170609 16:57:10 [Note] WSREP: gcomm: connected
170609 16:57:10 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
170609 16:57:10 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
170609 16:57:10 [Note] WSREP: Opened channel 'mycluster'
170609 16:57:10 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
170609 16:57:10 [Note] WSREP: Waiting for SST to complete.
170609 16:57:10 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
170609 16:57:10 [Note] WSREP: STATE EXCHANGE: sent state msg: 9f7acc66-4cf1-11e7-878b-a2d0231f71b0
170609 16:57:10 [Note] WSREP: STATE EXCHANGE: got state msg: 9f7acc66-4cf1-11e7-878b-a2d0231f71b0 from 0 (11_98)
170609 16:57:10 [Note] WSREP: STATE EXCHANGE: got state msg: 9f7acc66-4cf1-11e7-878b-a2d0231f71b0 from 1 (12_75)
170609 16:57:10 [Note] WSREP: STATE EXCHANGE: got state msg: 9f7acc66-4cf1-11e7-878b-a2d0231f71b0 from 2 (11_152)
170609 16:57:10 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 12,
        members    = 3/3 (joined/total),
        act_id     = 824276,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 51391c6d-4bff-11e7-a1c3-b797743e8629
170609 16:57:10 [Note] WSREP: Flow-control interval: [28, 28]
170609 16:57:10 [Note] WSREP: Restored state OPEN -> JOINED (824276)
170609 16:57:10 [Note] WSREP: New cluster view: global state: 51391c6d-4bff-11e7-a1c3-b797743e8629:824276, view# 13: Primary, number of nodes: 3, my index: 2, protocol version 3
170609 16:57:10 [Note] WSREP: SST complete, seqno: 824276
170609 16:57:10 [Note] WSREP: Member 2.0 (11_152) synced with group.
170609 16:57:10 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 824276)
170609 16:57:10 [Note] Plugin 'FEDERATED' is disabled.
170609 16:57:10 InnoDB: The InnoDB memory heap is disabled
170609 16:57:10 InnoDB: Mutexes and rw_locks use InnoDB's own implementation
170609 16:57:10 InnoDB: Compressed tables use zlib 1.2.3
170609 16:57:10 InnoDB: Using Linux native AIO
170609 16:57:10 InnoDB: Initializing buffer pool, size = 122.0M
170609 16:57:10 InnoDB: Completed initialization of buffer pool
170609 16:57:10 InnoDB: highest supported file format is Barracuda.
170609 16:57:11  InnoDB: Waiting for the background threads to start
170609 16:57:12 InnoDB: 5.5.54 started; log sequence number 6024720364
170609 16:57:12 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
170609 16:57:12 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
170609 16:57:12 [Note] Server socket created on IP: '0.0.0.0'.
170609 16:57:12 [Note] Event Scheduler: Loaded 0 events
170609 16:57:12 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.54'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL), wsrep_25.19.20170106.aa7e07d
170609 16:57:12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170609 16:57:12 [Note] WSREP: REPL Protocols: 7 (3, 2)
170609 16:57:12 [Note] WSREP: Assign initial position for certification: 824276, protocol version: 3
170609 16:57:12 [Note] WSREP: Service thread queue flushed.
170609 16:57:12 [Note] WSREP: Synchronized with group, ready for connections
170609 16:57:12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170609 16:57:13 [Note] WSREP: (9f2dfc7e, 'tcp://0.0.0.0:4567') connection to peer 9f2dfc7e with addr tcp://192.168.11.152:4567 timed out, no messages seen in PT3S
170609 16:57:13 [Note] WSREP: (9f2dfc7e, 'tcp://0.0.0.0:4567') turning message relay requesting off


節點 順利的連線並加入了叢集.  

然後我又測試了, 把資料檔案都清空的情況, 也是順利的加入了叢集,並自動完成了資料同步,

從另個一個幾點的日誌可以看到 資料同步的情況: 

170608 12:05:48 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
170608 12:05:48 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
"/var/log/mysqld.log" 744L, 62162C                                                                                112,1         13%
170609 16:42:43 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 4,
        members    = 2/3 (joined/total),
        act_id     = 824275,
        last_appl. = 824274,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 51391c6d-4bff-11e7-a1c3-b797743e8629
170609 16:42:43 [Note] WSREP: Flow-control interval: [28, 28]
170609 16:42:43 [Note] WSREP: New cluster view: global state: 51391c6d-4bff-11e7-a1c3-b797743e8629:824275, view# 5: Primary, number of nodes: 3, my index: 0, protocol version 3
170609 16:42:43 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170609 16:42:43 [Note] WSREP: REPL Protocols: 7 (3, 2)
170609 16:42:43 [Note] WSREP: Assign initial position for certification: 824275, protocol version: 3
170609 16:42:43 [Note] WSREP: Service thread queue flushed.
170609 16:42:43 [Note] WSREP: Member 1.0 (11_152) requested state transfer from '*any*'. Selected 0.0 (11_98)(SYNCED) as donor.
170609 16:42:43 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 824275)
170609 16:42:43 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170609 16:42:43 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '192.168.11.152:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '51391c6d-4bff-11e7-a1c3-b797743e8629:824275''
170609 16:42:43 [Note] WSREP: sst_donor_thread signaled with 0
170609 16:42:43 [Note] WSREP: Flushing tables for SST...
170609 16:42:43 [Note] WSREP: Provider paused at 51391c6d-4bff-11e7-a1c3-b797743e8629:824275 (831018)
170609 16:42:43 [Note] WSREP: Tables flushed.



透過這一點, 也基本上驗證了我的猜測.  

節點在退出叢集后, 從新加入的時候, 如果這個故障節點的ip 在自己的配置檔案 wsrep_cluster_address 的選項中的第一個ip .

那麼這個節點是永遠都無法再加入這個叢集了. 

怎麼辦呢, 把他的ip 從這個配置項裡面,換一下位置. 這個問題就完美解決了. 

透過進一步的測試. 如果這個節點是master , 透過--wsrep-new-cluster 啟動的節點,如果ip 排在第一位會有這個問題. 
如果這個節點 經過上述的步驟能夠重新加入解群了.  那麼這個節點應該就拿不到這個master 的角色了. 

這個時候, 就不會發生上述的問題, 即便ip 排在第一個的位置,也是可以加入叢集的. 


這個應該是一個bug 了. 

再進一步驗證後, 可以提交bug 記錄了. 


結論 規避這個問題的方案 就是 節點的機器上的配置   wsrep-cluster-address  的配置選項裡, 

本機的ip 不要放在第一位. 






來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/133735/viewspace-2140548/,如需轉載,請註明出處,否則將追究法律責任。

相關文章