How to Simulate ZooKeeper Session Expiration (Session Expired)

Posted by 劉近光 on 2018-08-21

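Introduction

Sessions are central to how ZooKeeper works: when a session ends for any reason, the ephemeral nodes created during that session are deleted. In production, an application has to handle session timeouts caused by network problems and, once the network recovers, restore its session automatically so that the service stays available. This article explains how to simulate a session timeout so that this recovery behavior can be tested.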

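Use Case

Requests within a session are executed in FIFO order. When a client connects to a server, a session is established and the client is assigned a session ID. The client sends heartbeats at regular intervals to keep the session alive. If the ZooKeeper ensemble receives no heartbeat from the client for longer than the session timeout (a value the client requests when it connects and the server may adjust to fit its configured limits), it concludes that the client is dead and expires the session. When a session ends for any reason, the ephemeral nodes created during that session are deleted as well.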
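
The timeout that actually applies is the one negotiated at connect time. As a rough sketch, the server clamps the requested value between minSessionTimeout and maxSessionTimeout, which default to 2 and 20 times tickTime. The function name negotiated_timeout is hypothetical, and tickTime=2000 is an assumption, since the server configuration is not shown in this article.

```shell
#!/bin/bash
# negotiated_timeout REQUESTED_MS [TICK_MS]
# Sketch of how a server with default limits would clamp a requested
# session timeout. TICK_MS defaults to 2000 (an assumed tickTime).
negotiated_timeout() {
    local requested="$1" tick="${2:-2000}"
    local min=$((2 * tick))    # default minSessionTimeout
    local max=$((20 * tick))   # default maxSessionTimeout
    if [ "$requested" -lt "$min" ]; then
        echo "$min"
    elif [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}
```

Under these assumptions a request of 5000 ms is accepted unchanged, while a request of 1000 ms would be raised to 4000 ms.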

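To be robust against network failures, an application must be able to restore its session automatically and recreate its ephemeral nodes. For testing, this means we need a way to force a session expiration so that the related functionality can be exercised.

In the scenario below, 10.77.16.40:2181, 10.77.16.60:2181 and 10.77.16.67:2181 form a ZooKeeper cluster. The application is deployed on server 10.23.3.85 and registers its service with ZooKeeper.

After the service registers successfully, the corresponding node information can be inspected: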

[zk: 10.77.16.40:2181(CONNECTED) 16] ls2 /wg/index_server/vertical_70/shard_0 watch
[search0000000041]
cZxid = 0x206206d5d1c3
ctime = Thu Aug 16 17:00:20 CST 2018
mZxid = 0x206206d5d1c3
mtime = Thu Aug 16 17:00:20 CST 2018
pZxid = 0x2062070de073
cversion = 83
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 1

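Simulating Session Expiration

A fairly simple way to simulate session expiration is to use the local firewall to drop the relevant network packets, so that the session expires. The author uses CentOS 7: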

[jinguang1@localhost wgis]$ lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.3.1611 (Core)
Release:	7.3.1611
Codename:	Core

On server 10.23.3.85, iptables can be used to drop the packets exchanged with ZooKeeper. The script is as follows:

#!/bin/bash

iptables -A OUTPUT -d 10.77.16.40 -p tcp --dport 2181 -j DROP
iptables -A OUTPUT -d 10.77.16.60 -p tcp --dport 2181 -j DROP
iptables -A OUTPUT -d 10.77.16.67 -p tcp --dport 2181 -j DROP
iptables -A INPUT -s 10.77.16.67 -p tcp --sport 2181 -j DROP
iptables -A INPUT -s 10.77.16.60 -p tcp --sport 2181 -j DROP
iptables -A INPUT -s 10.77.16.40 -p tcp --sport 2181 -j DROP

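The script above drops packets sent to ZooKeeper and packets coming from ZooKeeper, which causes the session to expire. Once the rules are in place, the following appears in the ZooKeeper client log: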

2018-08-21 16:36:18,961:28937(0x7ffb0affd700):ZOO_ERROR@handle_socket_error_msg@1643: Socket [10.77.16.40:2181] zk retcode=-7, errno=110(Connection timed out): connection to 10.77.16.40:2181 timed out (exceeded timeout by 2ms)
2018-08-21 16:36:22,294:28937(0x7ffb0affd700):ZOO_ERROR@handle_socket_error_msg@1643: Socket [10.77.16.60:2181] zk retcode=-7, errno=110(Connection timed out): connection to 10.77.16.60:2181 timed out (exceeded timeout by 0ms)
2018-08-21 16:36:25,628:28937(0x7ffb0affd700):ZOO_ERROR@handle_socket_error_msg@1643: Socket [10.77.16.67:2181] zk retcode=-7, errno=110(Connection timed out): connection to 10.77.16.67:2181 timed out (exceeded timeout by 0ms)

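The corresponding ephemeral node is deleted on ZooKeeper: cversion (the child version of the parent node) goes from 83 to 84 and numChildren drops from 1 to 0, which confirms that the session has expired.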

[zk: 10.77.16.40:2181(CONNECTED) 23] ls2 /wg/index_server/vertical_70/shard_0
[]
cZxid = 0x206206d5d1c3
ctime = Thu Aug 16 17:00:20 CST 2018
mZxid = 0x206206d5d1c3
mtime = Thu Aug 16 17:00:20 CST 2018
pZxid = 0x2062070df904
cversion = 84
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0
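
To compare the two listings without reading them by eye, individual fields can be pulled out with a small helper. This is a minimal sketch: stat_field is a hypothetical name, and it assumes the ls2 output has already been saved as text in the "key = value" form shown above.

```shell
#!/bin/bash
# stat_field FIELD
# Read zkCli ls2/stat output on stdin and print the value of FIELD,
# i.e. everything after "FIELD = " on the first matching line.
stat_field() {
    awk -v key="$1" '$1 == key && $2 == "=" {
        sub(/^[^=]*= /, "")   # strip the "key = " prefix
        print
        exit
    }'
}
```

For instance, feeding the listing above to `stat_field numChildren` should print 0 once the ephemeral node is gone, and `stat_field cversion` shows whether the child version has advanced.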

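Once the session has expired, how do we restore the network? It is simple: run the command below to clear the firewall rules. Note that iptables -F flushes every rule in the filter table, including any that existed before the test; a more fine-grained approach is to delete the added rules one by one.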

iptables -F
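
The rule-by-rule cleanup mentioned above can be sketched as follows. It relies on iptables -D accepting the same rule specification that was passed to -A. The names zk_rules, zk_block, zk_unblock, ZK_HOSTS, ZK_PORT and IPTABLES are hypothetical; the IPTABLES variable exists only so the commands can be previewed without root.

```shell
#!/bin/bash
# Sketch: add and remove only the ZooKeeper DROP rules, leaving any other
# firewall rules untouched. Hosts and port mirror the values in this article.
ZK_HOSTS="${ZK_HOSTS:-10.77.16.40 10.77.16.60 10.77.16.67}"
ZK_PORT="${ZK_PORT:-2181}"
IPTABLES="${IPTABLES:-iptables}"   # set IPTABLES="echo iptables" for a dry run

# zk_rules ACTION: ACTION is -A (append the rules) or -D (delete them).
zk_rules() {
    local action="$1" host
    for host in $ZK_HOSTS; do
        # Outgoing packets to the server, then incoming packets from it.
        $IPTABLES "$action" OUTPUT -d "$host" -p tcp --dport "$ZK_PORT" -j DROP
        $IPTABLES "$action" INPUT -s "$host" -p tcp --sport "$ZK_PORT" -j DROP
    done
}

zk_block()   { zk_rules -A; }
zk_unblock() { zk_rules -D; }
```

As a usage example, `zk_block; sleep 10; zk_unblock` (run as root) would keep the rules in place for 10 seconds, which is longer than the 5000 ms session timeout that appears in the client log.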

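After the network recovers, the ZooKeeper client establishes a new session. In the log below, the resources of the old session are released and the new session is assigned a different session ID: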

2018-08-21 17:03:56,671:28937(0x7ffb09ffb700):ZOO_INFO@zookeeper_close@2528: Freeing zookeeper resources for sessionId=0x6068ab7b8d1972

I0821 17:03:57.669409 28946 naming_registry.cc:185] GET FROM ZK zk://10.77.16.40:2181,10.77.16.60:2181,10.77.16.67:2181/weigraph/mutation_proxy/*
I0821 17:03:57.669440 28946 naming_registry.cc:380] ZKClient start to reconnect zookeeper
2018-08-21 17:03:57,669:28937(0x7ffb1a595700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.6
2018-08-21 17:03:57,669:28937(0x7ffb1a595700):ZOO_INFO@log_env@716: Client environment:host.name=localhost.localdomain
2018-08-21 17:03:57,669:28937(0x7ffb1a595700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2018-08-21 17:03:57,669:28937(0x7ffb1a595700):ZOO_INFO@log_env@724: Client environment:os.arch=3.10.0-514.6.2.el7.toa.2.x86_64
2018-08-21 17:03:57,669:28937(0x7ffb1a595700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Tue Oct 31 14:54:31 CST 2017
2018-08-21 17:03:57,669:28937(0x7ffb1a595700):ZOO_INFO@log_env@733: Client environment:user.name=jinguang1
2018-08-21 17:03:57,669:28937(0x7ffb1a595700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2018-08-21 17:03:57,669:28937(0x7ffb1a595700):ZOO_INFO@log_env@753: Client environment:user.dir=/data0/attempt_404_4_2/wgis
2018-08-21 17:03:57,669:28937(0x7ffb1a595700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=10.77.16.40:2181,10.77.16.60:2181,10.77.16.67:2181 sessionTimeout=5000 watcher=0x7a0520 sessionId=0 sessionPasswd=<null> context=0x7ffb0c0008c0 flags=0
[New Thread 0x7ffb09ffb700 (LWP 32334)]
[New Thread 0x7ffb0affd700 (LWP 32335)]
2018-08-21 17:03:57,674:28937(0x7ffb09ffb700):ZOO_INFO@check_events@1705: initiated connection to server [10.77.16.67:2181]
2018-08-21 17:03:57,678:28937(0x7ffb09ffb700):ZOO_INFO@check_events@1752: session establishment complete on server [10.77.16.67:2181], sessionId=0x26068ab52881a66, negotiated timeout=5000
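
One way to confirm from the log that a new session was created, rather than the old one being resumed, is to list the distinct session IDs it contains. This is a minimal sketch; session_ids is a hypothetical helper, and it assumes the C client log format shown above, where a fresh connection attempt is logged with sessionId=0.

```shell
#!/bin/bash
# session_ids
# Read ZooKeeper C client log lines on stdin and print each distinct
# hexadecimal sessionId once, in order of first appearance.
# The placeholder "sessionId=0" has no 0x prefix and is skipped.
session_ids() {
    grep -o 'sessionId=0x[0-9a-fA-F]*' \
        | sed 's/^sessionId=//' \
        | awk '!seen[$0]++'
}
```

If the output contains more than one ID for a single test run, the client has gone through at least one session replacement.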

 
