A Few Notes on Cassandra's Gossip Protocol
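Gossip is a key component that Cassandra uses to maintain the state of every node. Below we walk through the Gossip source code step by step, using its three-way handshake as the thread to follow.
Gossip decides whether a node's state information is newer or older by comparing the node's generation and version. When a node restarts, its generation increases and its version starts counting from zero again. So generation acts as the major version number and version as the minor one; keeping this in mind helps a lot with the handshake logic that follows.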
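The ordering described above can be sketched as a small comparable value type. This is only an illustration: GossipVersion and isNewerThan are hypothetical names that do not appear in the Cassandra source, which carries these two numbers inside its own heartbeat and state classes.

```java
/** Hypothetical (generation, version) pair; not a Cassandra class. */
final class GossipVersion implements Comparable<GossipVersion> {
    final int generation; // major number: grows when the node restarts
    final int version;    // minor number: restarts from zero within a generation

    GossipVersion(int generation, int version) {
        this.generation = generation;
        this.version = version;
    }

    /** Generation is compared first; version only breaks ties. */
    @Override
    public int compareTo(GossipVersion other) {
        int byGeneration = Integer.compare(generation, other.generation);
        return byGeneration != 0 ? byGeneration : Integer.compare(version, other.version);
    }

    boolean isNewerThan(GossipVersion other) {
        return compareTo(other) > 0;
    }
}
```

With this ordering, a freshly restarted node at version 0 still counts as newer than anything it advertised before the restart, because its generation is higher.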
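The most important field in Gossip is endpointStateMap. This map uses the address as key and an EndpointState as value to hold each node's state information. EndpointState contains a node's net_version, host_id, rpc_address, release_version, dc, rack, load, status and tokens. Broadly speaking, the endpointStateMap held by all nodes should be consistent; when entries diverge, or when nodes are added, replaced or removed, it is Gossip that maintains the state in between.
Another important field is subscribers: when a node's state changes, gossip notifies each subscriber.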
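The subscriber mechanism is essentially the observer pattern. Here is a minimal sketch under that assumption; StateChangeListener, SubscriberSketch and their methods are hypothetical names chosen for illustration and are not the interfaces Cassandra itself defines.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical callback interface for state changes. */
interface StateChangeListener {
    void onChange(String endpoint, String stateKey, String newValue);
}

/** Hypothetical holder of the subscriber list. */
final class SubscriberSketch {
    // Copy-on-write keeps iteration safe if listeners register during a notification.
    private final List<StateChangeListener> subscribers = new CopyOnWriteArrayList<>();

    void register(StateChangeListener subscriber) {
        subscribers.add(subscriber);
    }

    void unregister(StateChangeListener subscriber) {
        subscribers.remove(subscriber);
    }

    /** Called whenever an endpoint's state changes; fans the event out to every subscriber. */
    void notifyChange(String endpoint, String stateKey, String newValue) {
        for (StateChangeListener subscriber : subscribers)
            subscriber.onChange(endpoint, stateKey, newValue);
    }
}
```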
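Once Gossip has started, every second it picks a random node in the cluster and sends it a GossipDigestSyn message, which begins the exchange with the other nodes, as shown in the figure below: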
https://bbs-img.huaweicloud.com/blogs/img/1590807559705043555.png
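The once-per-second round can be sketched with a scheduled executor. This follows only the simplified description given here (one random peer per second); GossipRoundSketch, pickPeer, start and the sendSyn callback are hypothetical names, and endpoints are plain strings rather than real addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class GossipRoundSketch {
    /** Picks one random endpoint other than ourselves; empty when we know nobody else. */
    static Optional<String> pickPeer(List<String> endpoints, String self, Random random) {
        List<String> candidates = new ArrayList<>(endpoints);
        candidates.remove(self);
        if (candidates.isEmpty())
            return Optional.empty();
        return Optional.of(candidates.get(random.nextInt(candidates.size())));
    }

    /** Runs one round per second; sendSyn stands in for building and sending the syn message. */
    static ScheduledExecutorService start(List<String> endpoints, String self, Consumer<String> sendSyn) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        Random random = new Random();
        executor.scheduleWithFixedDelay(
            () -> pickPeer(endpoints, self, random).ifPresent(sendSyn),
            1, 1, TimeUnit.SECONDS);
        return executor; // the caller is expected to shut this down
    }
}
```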
Next we go through the gossip code step by step, following the flow chart above. The GossipDigestSyn message is constructed in GossipTask:
// the syn message contains the cluster name, the partitioner and the gDigests
GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(), DatabaseDescriptor.getPartitionerName(), gDigests);
MessageOut<GossipDigestSyn> message = new MessageOut<GossipDigestSyn>(MessagingService.Verb.GOSSIP_DIGEST_SYN, digestSynMessage, GossipDigestSyn.serializer);
The main part of the GossipDigestSyn message is gDigests, which is produced by the method Gossiper.instance.makeRandomGossipDigest(gDigests):
private void makeRandomGossipDigest(List<GossipDigest> gDigests)
{
    EndpointState epState;
    int generation = 0;
    int maxVersion = 0;

    // local epstate will be part of endpointStateMap
    // the list of nodes the current node keeps track of
    List<InetAddress> endpoints = new ArrayList<InetAddress>(endpointStateMap.keySet());
    // shuffle into random order
    Collections.shuffle(endpoints, random);
    for (InetAddress endpoint : endpoints)
    {
        epState = endpointStateMap.get(endpoint);
        if (epState != null)
        {
            // get the generation
            generation = epState.getHeartBeatState().getGeneration();
            // EndpointState holds tokens, host id, status, load and so on; take the largest version among them as maxVersion
            maxVersion = getMaxEndpointStateVersion(epState);
        }
        gDigests.add(new GossipDigest(endpoint, generation, maxVersion));
    }

    if (logger.isTraceEnabled())
    {
        StringBuilder sb = new StringBuilder();
        for (GossipDigest gDigest : gDigests)
        {
            sb.append(gDigest);
            sb.append(" ");
        }
        logger.trace("Gossip Digests are : {}", sb);
    }
}
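The body of getMaxEndpointStateVersion is not shown in this article. A plausible reading, given how it is used, is a single pass that keeps the largest version seen across the heartbeat and every application state. The sketch below illustrates that idea with simplified types: MaxVersionSketch and maxVersion are hypothetical names, and a plain map of state name to version stands in for the real EndpointState.

```java
import java.util.Map;

final class MaxVersionSketch {
    /**
     * Returns the largest version among the heartbeat version and all state versions.
     * One linear scan is enough; no sorting is involved.
     */
    static int maxVersion(int heartbeatVersion, Map<String, Integer> stateVersions) {
        int max = heartbeatVersion;
        for (int version : stateVersions.values())
            max = Math.max(max, version);
        return max;
    }
}
```

A digest therefore summarises an endpoint with just three values (address, generation, max version), which keeps the syn message small even when the full state is large.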
After node A sends the GossipDigestSyn, node B handles it in GossipDigestSynVerbHandler; the actual processing logic is in Gossiper.instance.examineGossiper:
void examineGossiper(List<GossipDigest> gDigestList, List<GossipDigest> deltaGossipDigestList, Map<InetAddress, EndpointState> deltaEpStateMap)
{
    for (GossipDigest gDigest : gDigestList)
    {
        int remoteGeneration = gDigest.getGeneration();
        int maxRemoteVersion = gDigest.getMaxVersion();
        /* Get state associated with the end point in digest */
        EndpointState epStatePtr = endpointStateMap.get(gDigest.getEndpoint());
        /*
            Here we need to fire a GossipDigestAckMessage. If we have some data associated with this endpoint locally
            then we follow the "if" path of the logic. If we have absolutely nothing for this endpoint we need to
            request all the data for this endpoint.
        */
        if (epStatePtr != null)
        {
            int localGeneration = epStatePtr.getHeartBeatState().getGeneration();
            /* get the max version of all keys in the state associated with this endpoint */
            int maxLocalVersion = getMaxEndpointStateVersion(epStatePtr);
            if (remoteGeneration == localGeneration && maxRemoteVersion == maxLocalVersion)
                // generation and version both match, so the local and remote views agree; skip the logic below
                continue;

            if (remoteGeneration > localGeneration)
            {
                /* we request everything from the gossiper */
                // the remote generation is greater than ours, so request everything about this endpoint from the remote node
                requestAll(gDigest, deltaGossipDigestList, remoteGeneration);
            }
            else if (remoteGeneration < localGeneration)
            {
                /* send all data with generation = localgeneration and version > 0 */
                // the remote generation is lower than ours, so send this endpoint's information to the remote node
                sendAll(gDigest, deltaEpStateMap, 0);
            }
            else if (remoteGeneration == localGeneration)
            {
                /*
                    If the max remote version is greater then we request the remote endpoint send us all the data
                    for this endpoint with version greater than the max version number we have locally for this
                    endpoint.
                    If the max remote version is lesser, then we send all the data we have locally for this endpoint
                    with version greater than the max remote version.
                */
                // if the remote version is greater than ours, request the data; if it is lower, send ours
                if (maxRemoteVersion > maxLocalVersion)
                {
                    deltaGossipDigestList.add(new GossipDigest(gDigest.getEndpoint(), remoteGeneration, maxLocalVersion));
                }
                else if (maxRemoteVersion < maxLocalVersion)
                {
                    /* send all data with generation = localgeneration and version > maxRemoteVersion */
                    sendAll(gDigest, deltaEpStateMap, maxRemoteVersion);
                }
            }
        }
        else
        {
            /* We are here since we have no data for this endpoint locally so request everything. */
            requestAll(gDigest, deltaGossipDigestList, remoteGeneration);
        }
    }
}
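The branching in examineGossiper boils down to a small decision table. The sketch below restates it as a pure function so each case can be checked in isolation. DigestDecision, Action and decide are hypothetical names introduced for illustration; the real method mutates deltaGossipDigestList and deltaEpStateMap instead of returning a value.

```java
final class DigestDecision {
    /** What the receiver of a digest does for one endpoint. */
    enum Action { IN_SYNC, REQUEST_ALL, SEND_ALL, REQUEST_NEWER, SEND_NEWER }

    static Action decide(boolean knownLocally,
                         int localGeneration, int maxLocalVersion,
                         int remoteGeneration, int maxRemoteVersion) {
        if (!knownLocally)
            return Action.REQUEST_ALL;       // nothing stored locally for this endpoint
        if (remoteGeneration > localGeneration)
            return Action.REQUEST_ALL;       // the endpoint restarted since we last heard of it
        if (remoteGeneration < localGeneration)
            return Action.SEND_ALL;          // the sender holds a stale generation
        if (maxRemoteVersion > maxLocalVersion)
            return Action.REQUEST_NEWER;     // same generation, the sender is ahead
        if (maxRemoteVersion < maxLocalVersion)
            return Action.SEND_NEWER;        // same generation, we are ahead
        return Action.IN_SYNC;               // generation and version both match
    }
}
```

For example, decide(true, 5, 10, 5, 14) yields REQUEST_NEWER: both sides agree on the generation, but the sender has seen version 14 while the receiver only has 10.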
After comparing the version numbers, the method above leaves the main work to the sendAll and requestAll methods, so let's follow them:
private void requestAll(GossipDigest gDigest, List<GossipDigest> deltaGossipDigestList, int remoteGeneration)
{
    /* We are here since we have no data for this endpoint locally so request everything. */
    // create a digest and wait for the peer to send us its data
    deltaGossipDigestList.add(new GossipDigest(gDigest.getEndpoint(), remoteGeneration, 0));
    if (logger.isTraceEnabled())
        logger.trace("requestAll for {}", gDigest.getEndpoint());
}

private void sendAll(GossipDigest gDigest, Map<InetAddress, EndpointState> deltaEpStateMap, int maxRemoteVersion)
{
    EndpointState localEpStatePtr = getStateForVersionBiggerThan(gDigest.getEndpoint(), maxRemoteVersion);
    if (localEpStatePtr != null)
        // send the EndpointState information to the peer in the ack message
        deltaEpStateMap.put(gDigest.getEndpoint(), localEpStatePtr);
}
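sendAll relies on getStateForVersionBiggerThan, whose body is not shown here. Judging by its name and the comments above, it keeps only the pieces of state whose version exceeds the version the peer already has. A minimal sketch of that filtering, with hypothetical names (DeltaStateSketch, Versioned, newerThan) and a plain map standing in for EndpointState:

```java
import java.util.Map;
import java.util.TreeMap;

final class DeltaStateSketch {
    /** Hypothetical stand-in for one versioned piece of application state. */
    static final class Versioned {
        final String value;
        final int version;

        Versioned(String value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Returns only the entries whose version is strictly greater than maxRemoteVersion. */
    static Map<String, Versioned> newerThan(Map<String, Versioned> local, int maxRemoteVersion) {
        Map<String, Versioned> delta = new TreeMap<>();
        for (Map.Entry<String, Versioned> entry : local.entrySet())
            if (entry.getValue().version > maxRemoteVersion)
                delta.put(entry.getKey(), entry.getValue());
        return delta;
    }
}
```

Passing 0 as the threshold, as sendAll does when the peer's generation is older, selects every state with a positive version, which matches the "version > 0" comment in examineGossiper.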
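At this point the ack message to be sent back to the peer is fully constructed. It contains deltaGossipDigestList (the peer's information is newer, so we need the peer to send us its EndpointState) and deltaEpStateMap (the current node is newer, so we send ours to the peer).
Gossip handles the ack message in GossipDigestAckVerbHandler. There are two main pieces of logic:
1. If deltaEpStateMap has data, the local application state needs updating, so Gossiper.instance.applyStateLocally is executed.
2. If deltaGossipDigestList has data, the peer needs updating, so the EndpointState is built and an ack2 message is sent to the peer.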
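To see how the syn, ack and ack2 messages fit together, here is a deliberately simplified simulation of one exchange. It is a sketch under strong assumptions: generations are left out, each endpoint's state is reduced to a single max version, and HandshakeSketch with its methods is entirely hypothetical rather than a mirror of the Cassandra classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HandshakeSketch {
    /** endpoint -> highest version known locally. */
    final Map<String, Integer> versions = new HashMap<>();

    /** Step 1 (syn): the initiator advertises every endpoint it knows with its max version. */
    Map<String, Integer> buildSyn() {
        return new HashMap<>(versions);
    }

    /** Step 2 (ack): the receiver records what it wants (requests) and what it offers (deltas). */
    void buildAck(Map<String, Integer> syn, List<String> requests, Map<String, Integer> deltas) {
        for (Map.Entry<String, Integer> digest : syn.entrySet()) {
            Integer local = versions.get(digest.getKey());
            if (local == null || digest.getValue() > local)
                requests.add(digest.getKey());      // the initiator is ahead, ask for its state
            else if (digest.getValue() < local)
                deltas.put(digest.getKey(), local); // we are ahead, ship our state
        }
    }

    /** Step 3 (ack2): the initiator answers the requests with its own state. */
    Map<String, Integer> buildAck2(List<String> requests) {
        Map<String, Integer> reply = new HashMap<>();
        for (String endpoint : requests)
            reply.put(endpoint, versions.get(endpoint));
        return reply;
    }

    /** Applies received state, keeping whichever version is higher. */
    void apply(Map<String, Integer> received) {
        for (Map.Entry<String, Integer> entry : received.entrySet())
            versions.merge(entry.getKey(), entry.getValue(), Integer::max);
    }

    /** One full exchange initiated by a towards b. */
    static void exchange(HandshakeSketch a, HandshakeSketch b) {
        List<String> requests = new ArrayList<>();
        Map<String, Integer> deltas = new HashMap<>();
        b.buildAck(a.buildSyn(), requests, deltas); // b handles the syn and builds the ack
        a.apply(deltas);                            // a handles the ack ...
        b.apply(a.buildAck2(requests));             // ... and replies with ack2, which b applies
    }
}
```

After one exchange both sides agree on every endpoint the initiator advertised. Endpoints known only to the receiver are not covered in this sketch, because the receiver reacts only to the digests it was sent.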
GossipDigestAck2VerbHandler handles the ack2 message, and its main logic is also in Gossiper.instance.applyStateLocally. Let's look at what Gossiper.instance.applyStateLocally does:
void applyStateLocally(Map<InetAddress, EndpointState> epStateMap)
{
    for (Entry<InetAddress, EndpointState> entry : epStateMap.entrySet())
    {
        InetAddress ep = entry.getKey();
        if (ep.equals(FBUtilities.getBroadcastAddress()) && !isInShadowRound())
            continue;
        if (justRemovedEndpoints.containsKey(ep))
        {
            if (logger.isTraceEnabled())
                logger.trace("Ignoring gossip for {} because it is quarantined", ep);
            continue;
        }

        EndpointState localEpStatePtr = endpointStateMap.get(ep);
        EndpointState remoteState = entry.getValue();

        /*
            If state does not exist just add it. If it does then add it if the remote generation is greater.
            If there is a generation tie, attempt to break it by heartbeat version.
        */
        if (localEpStatePtr != null)
        {
            int localGeneration = localEpStatePtr.getHeartBeatState().getGeneration();
            int remoteGeneration = remoteState.getHeartBeatState().getGeneration();
            long localTime = System.currentTimeMillis() / 1000;
            if (logger.isTraceEnabled())
                logger.trace("{} local generation {}, remote generation {}", ep, localGeneration, remoteGeneration);

            // We measure generation drift against local time, based on the fact that generation is initialized by time
            if (remoteGeneration > localTime + MAX_GENERATION_DIFFERENCE)
            {
                // assume some peer has corrupted memory and is broadcasting an unbelievable generation about another peer (or itself)
                logger.warn("received an invalid gossip generation for peer {}; local time = {}, received generation = {}", ep, localTime, remoteGeneration);
            }
            else if (remoteGeneration > localGeneration)
            {
                if (logger.isTraceEnabled())
                    logger.trace("Updating heartbeat state generation to {} from {} for {}", remoteGeneration, localGeneration, ep);
                // major state change will handle the update by inserting the remote state directly
                // notify subscribers that the node's state has changed
                handleMajorStateChange(ep, remoteState);
            }
            else if (remoteGeneration == localGeneration) // generation has not changed, apply new states
            {
                /* find maximum state */
                int localMaxVersion = getMaxEndpointStateVersion(localEpStatePtr);
                int remoteMaxVersion = getMaxEndpointStateVersion(remoteState);
                if (remoteMaxVersion > localMaxVersion)
                {
                    // apply states, but do not notify since there is no major change
                    // apply the new states; only the version changed, so no major state change is signalled
                    applyNewStates(ep, localEpStatePtr, remoteState);
                }
                else if (logger.isTraceEnabled())
                    logger.trace("Ignoring remote version {} <= {} for {}", remoteMaxVersion, localMaxVersion, ep);

                if (!localEpStatePtr.isAlive() && !isDeadState(localEpStatePtr)) // unless of course, it was dead
                    markAlive(ep, localEpStatePtr);
            }
            else
            {
                if (logger.isTraceEnabled())
                    logger.trace("Ignoring remote generation {} < {}", remoteGeneration, localGeneration);
            }
        }
        else
        {
            // this is a new node, report it to the FD in case it is the first time we are seeing it AND it's not alive
            FailureDetector.instance.report(ep);
            // notify subscribers that a new node has joined
            handleMajorStateChange(ep, remoteState);
        }
    }

    boolean any30 = anyEndpointOn30();
    if (any30 != anyNodeOn30)
    {
        logger.info(any30
                    ? "There is at least one 3.0 node in the cluster - will store and announce compatible schema version"
                    : "There are no 3.0 nodes in the cluster - will store and announce real schema version");

        anyNodeOn30 = any30;
        executor.submit(Schema.instance::updateVersionAndAnnounce);
    }
}
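The per-endpoint branching in applyStateLocally, for an endpoint that is already known locally, can be restated as a pure function. This is an illustrative sketch: ApplyDecisionSketch, Outcome and classify are hypothetical names, and the value of MAX_GENERATION_DIFFERENCE is not shown in the excerpt, so the function takes it as a parameter rather than assuming one.

```java
final class ApplyDecisionSketch {
    enum Outcome { REJECT_IMPLAUSIBLE_GENERATION, MAJOR_STATE_CHANGE, APPLY_NEW_STATES, IGNORE }

    static Outcome classify(int localGeneration, int remoteGeneration,
                            int localMaxVersion, int remoteMaxVersion,
                            long nowSeconds, long maxGenerationDifference) {
        // Generations are initialised from the clock, so one far in the future is treated as corrupt.
        if (remoteGeneration > nowSeconds + maxGenerationDifference)
            return Outcome.REJECT_IMPLAUSIBLE_GENERATION;
        // A higher generation means the endpoint restarted: replace the local state wholesale.
        if (remoteGeneration > localGeneration)
            return Outcome.MAJOR_STATE_CHANGE;
        // Same generation: only newer versions are worth applying.
        if (remoteGeneration == localGeneration && remoteMaxVersion > localMaxVersion)
            return Outcome.APPLY_NEW_STATES;
        // An older generation, or nothing newer within the same generation.
        return Outcome.IGNORE;
    }
}
```

Passing the current time in as nowSeconds, instead of reading the clock inside the function, keeps the sketch deterministic and easy to test.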
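That completes the analysis of the whole Gossip three-way handshake.
From the ITPUB blog, link: http://blog.itpub.net/30239065/viewspace-2717314/. If you repost this article, please credit the source; otherwise legal liability will be pursued.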