Handling a ZooKeeper Cluster Crash

Posted by Federico on 2017-07-12

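Today I ran into the following problem on a private-deployment project:

1. The customer reported that user logins were returning 303.

2. After logging in to the server, I found that a large volume of logs had used up nearly all the disk space. All service processes were still running but could not listen on their ports, so the services were unavailable.

3. I cleaned up the log files.

4. With the log files cleaned up, I restarted the services. Restarting the ZooKeeper service produced the following error: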

2017-07-12 10:52:39,171 [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /data/apps/config/zookeeper/zoo.cfg
2017-07-12 10:52:39,176 [myid:] - INFO [main:QuorumPeerConfig@340] - Defaulting to majority quorums
2017-07-12 10:52:39,180 [myid:2] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 5
2017-07-12 10:52:39,180 [myid:2] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 24
2017-07-12 10:52:39,183 [myid:2] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2017-07-12 10:52:39,194 [myid:2] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2017-07-12 10:52:39,196 [myid:2] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2017-07-12 10:52:39,206 [myid:2] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2017-07-12 10:52:39,218 [myid:2] - INFO [main:QuorumPeer@959] - tickTime set to 2000
2017-07-12 10:52:39,218 [myid:2] - INFO [main:QuorumPeer@979] - minSessionTimeout set to -1
2017-07-12 10:52:39,218 [myid:2] - INFO [main:QuorumPeer@990] - maxSessionTimeout set to -1
2017-07-12 10:52:39,218 [myid:2] - INFO [main:QuorumPeer@1005] - initLimit set to 10
2017-07-12 10:52:39,230 [myid:2] - INFO [main:FileSnap@83] - Reading snapshot /data/apps/data/zookeeper/version-2/snapshot.60000888d
2017-07-12 10:52:39,341 [myid:2] - ERROR [main:Util@239] - Last transaction was partial.
2017-07-12 10:52:39,342 [myid:2] - ERROR [main:QuorumPeer@497] - Unable to load database on disk
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:576)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:595)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:561)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:643)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:547)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:522)
    at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:354)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:450)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:440)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2017-07-12 10:52:39,345 [myid:2] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server

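After some reading, I found the following description of ZooKeeper's behavior, which I took to explain the crash:

ZooKeeper presents a consistent view of state to all the client processes that use it. When a client gets a response from ZooKeeper, it can be confident that the response is consistent with every other response that it or any other client receives. Sometimes the ZooKeeper client library loses its connection to the ZooKeeper service and can no longer provide information backed by that consistency guarantee; when the client library finds itself in this state, it reports that state back to the caller.

That passage is about client-side connection loss. For this incident, the startup log points more directly at the on-disk data: the server reads snapshot.60000888d, reports "Last transaction was partial", and then hits java.io.EOFException in FileHeader.deserialize while opening the next transaction log. This is consistent with a transaction log file that was cut short when the disk filled up, which leaves the node unable to load its database.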
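The stack trace above fails in FileHeader.deserialize, which suggests a transaction log file too short to hold its header. The following is a minimal sketch for spotting such files before touching anything. It assumes the header is 16 bytes (a 4-byte magic number, a 4-byte version, and an 8-byte dbid); find_short_txn_logs is a hypothetical helper name, not something shipped with ZooKeeper.

```shell
# Sketch: list transaction log files that are too short to contain a header.
# Assumption: the header is 16 bytes (4-byte magic, 4-byte version, 8-byte dbid).
# find_short_txn_logs is a hypothetical helper, not part of ZooKeeper.
find_short_txn_logs() {
    # $1 is the version-2 directory, e.g. the one named in the startup log.
    # -size -16c matches files smaller than 16 bytes, including empty files.
    find "$1" -maxdepth 1 -type f -name 'log.*' -size -16c
}

# Example: find_short_txn_logs /data/apps/data/zookeeper/version-2
```

If this prints a file, that file is a likely candidate for the EOFException; an empty result means the damage, if any, lies elsewhere.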

 

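Solution:

1. Check the ZooKeeper configuration file to find the data directory (the dataDir setting). Use the path of your own zoo.cfg; the startup log above shows which file was read (/data/apps/config/zookeeper/zoo.cfg in this incident).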

cat /etc/zookeeper/conf/zoo.cfg
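Rather than reading the whole file by eye, the dataDir value can be pulled out directly. This is a small sketch, not an official tool: zk_data_dir is a hypothetical helper name, and it assumes the usual key=value layout of zoo.cfg. If the file also sets dataLogDir, the transaction logs live there instead of under dataDir.

```shell
# zk_data_dir is a hypothetical helper: print the dataDir value from a zoo.cfg.
# Assumes plain key=value lines; commented-out lines and dataLogDir are ignored.
zk_data_dir() {
    # Strip the "dataDir=" prefix from matching lines and keep the last match.
    sed -n 's/^[[:space:]]*dataDir[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Example: zk_data_dir /etc/zookeeper/conf/zoo.cfg
```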

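2. Delete or rename the data directory, that is, the version-2 directory under the dataDir found in step 1 (/var/lib/zookeeper here; the log above shows /data/apps/data/zookeeper for this deployment). Renaming is the safer choice because it keeps a backup. After the restart, the node should resynchronize its data from the leader, provided the other members of the ensemble are healthy.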

cd /var/lib/zookeeper

mv ./version-2 ./version-2.bak
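One catch with the plain mv above: if version-2.bak already exists from an earlier incident, mv places version-2 inside it instead of replacing it. A sketch of a slightly more careful variant follows; backup_zk_version_dir is a hypothetical helper name, and the timestamp suffix is just one possible naming scheme.

```shell
# backup_zk_version_dir is a hypothetical helper: move <dataDir>/version-2 aside
# under a timestamped name so that an earlier backup is never touched.
backup_zk_version_dir() {
    src="$1/version-2"
    dst="$1/version-2.bak.$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$src" ]; then
        echo "no version-2 directory under $1" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "$dst already exists" >&2
        return 1
    fi
    # Print the backup location on success so it can be recorded.
    mv "$src" "$dst" && echo "$dst"
}

# Example: backup_zk_version_dir /var/lib/zookeeper
```

The myid file sits beside version-2 in dataDir, so it stays in place and the node keeps its identity when it rejoins.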

3. Restart ZooKeeper, then check that the process is running and that its port is being listened on.
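The check in step 3 can be sketched as follows. It assumes the client port 2181 shown in the startup log and the presence of the ss tool; listening_on_port is a hypothetical helper that reads `ss -lnt` style output on standard input, so netstat users would need to adjust the column number.

```shell
# listening_on_port is a hypothetical helper: read `ss -lnt` style output on
# stdin and succeed if some LISTEN socket has a local address ending in :<port>.
listening_on_port() {
    awk -v p=":$1" '
        $1 == "LISTEN" && $4 ~ (p "$") { found = 1 }
        END { exit !found }
    '
}

# Example (port 2181 is taken from the startup log above):
#   pgrep -f QuorumPeerMain          # is the server process running?
#   ss -lnt | listening_on_port 2181 && echo "port 2181 is listening"
```

On versions where the four-letter commands are enabled, `echo ruok | nc localhost 2181` should also answer `imok` once the server is serving requests.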
