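[Learning Design from Source Code] Deferred Operations in Ant Financial's SOFARegistry
0x00 Abstract
SOFARegistry is a production-grade, low-latency, highly available service registry open-sourced by Ant Financial.
This series focuses on analyzing design and architecture: across multiple articles and from multiple angles, it works backward from the code to summarize the implementation mechanisms and architectural thinking behind DataServer and SOFARegistry, so that readers can learn how Alibaba designs such systems.
This is the seventeenth article; it covers deferred operations in SOFARegistry.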
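0x01 Business Domain
1.1 Business Rationale
Why does AfterWorkingProcess exist?
AfterWorkingProcess implements deferred operations. My guess is that in some situations a business operation cannot be executed right away and can only be made up for at a later point.
The official blog makes a similar point, which supports this reading:
Until data synchronization completes, all reads against the new node are forwarded to the data node that owns the data shard.
Until data synchronization completes, writes to the new node are prohibited, to prevent new data inconsistencies from arising during synchronization.
1.2 What to Learn
This code shows one way to implement this kind of business-level deferred operation.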
0x02 Implementation
2.1 Definition
The interface is defined as follows:
public interface AfterWorkingProcess {
void afterWorkingProcess();
int getOrder();
}
2.2 Configuration
The afterWorkProcessors bean is injected into AfterWorkingProcessHandler as a member variable. It holds the actions to run after certain business logic has finished.
@Bean(name = "afterWorkProcessors")
public List<AfterWorkingProcess> afterWorkingProcessors() {
List<AfterWorkingProcess> list = new ArrayList<>();
list.add(renewDatumHandler());
list.add(datumLeaseManager());
list.add(disconnectEventHandler());
list.add(notifyDataSyncHandler());
return list;
}
@Bean
public AfterWorkingProcessHandler afterWorkingProcessHandler() {
return new AfterWorkingProcessHandler();
}
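2.3 Engine
The usage here is somewhat unusual: AfterWorkingProcessHandler is itself an implementation of AfterWorkingProcess.
Its afterWorkingProcess method calls, one by one, the afterWorkingProcess method of each implementation registered in the afterWorkProcessors bean.
getOrder determines execution priority, with lower values running first; this is a common pattern.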
public class AfterWorkingProcessHandler implements AfterWorkingProcess {
@Resource(name = "afterWorkProcessors")
private List<AfterWorkingProcess> afterWorkingProcessors;
@Override
public void afterWorkingProcess() {
if(afterWorkingProcessors != null){
List<AfterWorkingProcess> list = afterWorkingProcessors.stream().sorted(Comparator.comparing(AfterWorkingProcess::getOrder)).collect(Collectors.toList());
list.forEach(AfterWorkingProcess::afterWorkingProcess);
}
}
@Override
public int getOrder() {
return 0;
}
}
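The ordered fan-out described above can be tried in isolation. The following is a minimal sketch, not SOFARegistry code: Hook, NamedHook, CompositeHook and OrderedHooksDemo are hypothetical names invented for illustration, and Spring injection is replaced by a plain constructor argument.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the AfterWorkingProcess interface.
interface Hook {
    void afterWorkingProcess();

    int getOrder();
}

// Hypothetical hook that appends its name to a shared log when it runs.
class NamedHook implements Hook {
    private final String name;
    private final int order;
    private final List<String> log;

    NamedHook(String name, int order, List<String> log) {
        this.name = name;
        this.order = order;
        this.log = log;
    }

    @Override
    public void afterWorkingProcess() {
        log.add(name);
    }

    @Override
    public int getOrder() {
        return order;
    }
}

// Composite: itself a Hook, it runs the registered hooks in ascending getOrder.
class CompositeHook implements Hook {
    private final List<Hook> hooks;

    CompositeHook(List<Hook> hooks) {
        this.hooks = hooks;
    }

    @Override
    public void afterWorkingProcess() {
        if (hooks == null) {
            return;
        }
        hooks.stream()
            .sorted(Comparator.comparingInt(Hook::getOrder))
            .forEachOrdered(Hook::afterWorkingProcess);
    }

    @Override
    public int getOrder() {
        return 0;
    }
}

class OrderedHooksDemo {
    // Registers hooks out of order and returns the order in which they ran.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        List<Hook> hooks = new ArrayList<>();
        hooks.add(new NamedHook("third", 30, log));
        hooks.add(new NamedHook("first", 10, log));
        hooks.add(new NamedHook("second", 20, log));
        new CompositeHook(hooks).afterWorkingProcess();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the composite implements the same interface as its members, a caller only needs a reference to one object, and registration order no longer matters.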
2.4 Invocation
The only call site is in DataServerCache#updateDataServerStatus:
afterWorkingProcessHandler.afterWorkingProcess();
Within DataServerCache, the following methods all end up calling updateDataServerStatus:
- synced
- notifiedAll
- checkAndUpdateStatus
- addNotWorkingServer
The diagram is as follows:
+------------------------------------------+
| DataServerCache | +----------------------------------------------+
| | | AfterWorkingProcess |
| synced +----------------------+ | | |
| | | +----------------------------+ | +------------------------------------------+ |
| | | | AfterWorkingProcessHandler | | |renewDatumHandler.afterWorkingProcess | |
| | | | | | | | |
| v | | | | |datumLeaseManager.afterWorkingProcess | |
| notifiedAll +--->updateDataServerStatus +------> afterWorkingProcess +------>+ | |
| ^ ^ | | | | |disconnectEventHandler.afterWorkingProcess| |
| | | | +----------------------------+ | | | |
| | | | | |notifyDataSyncHandler.afterWorkingProcess | |
| checkAndUpdateStatus+-----------+ | | | +------------------------------------------+ |
| | | +----------------------------------------------+
| addNotWorkingServer +---------------+ |
| |
+------------------------------------------+
Because the trigger is tied to the business flow, no timer or asynchronous mechanism is needed.
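A minimal sketch of a business-driven trigger follows. The body of updateDataServerStatus is not shown in this article, so the rule used here (run the hook once, on the caller's thread, at the transition into WORKING) is an assumption, and NodeStatus, StatusHolder and StatusTriggerDemo are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical status enum; the real code uses LocalServerStatusEnum.
enum NodeStatus { INITIAL, WORKING }

class StatusHolder {
    private NodeStatus status = NodeStatus.INITIAL;
    private final Runnable afterWorking;

    StatusHolder(Runnable afterWorking) {
        this.afterWorking = afterWorking;
    }

    // Assumption: the hook fires only when the status changes to WORKING.
    // It runs synchronously on the caller's thread, so no timer is involved.
    synchronized boolean update(NodeStatus next) {
        boolean becameWorking = status != NodeStatus.WORKING && next == NodeStatus.WORKING;
        status = next;
        if (becameWorking) {
            afterWorking.run();
        }
        return becameWorking;
    }

    synchronized NodeStatus get() {
        return status;
    }
}

class StatusTriggerDemo {
    // Returns how many times the hook ran across three status updates.
    static int run() {
        AtomicInteger calls = new AtomicInteger();
        StatusHolder holder = new StatusHolder(calls::incrementAndGet);
        holder.update(NodeStatus.INITIAL); // no transition, hook not run
        holder.update(NodeStatus.WORKING); // transition, hook runs once
        holder.update(NodeStatus.WORKING); // already WORKING, hook not run
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```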
2.5 Business Implementations
2.5.1 DisconnectEventHandler
public class DisconnectEventHandler implements InitializingBean, AfterWorkingProcess {
/**
* a DelayQueue that contains client disconnect events
*/
private final DelayQueue<DisconnectEvent> EVENT_QUEUE = new DelayQueue<>();
@Autowired
private SessionServerConnectionFactory sessionServerConnectionFactory;
@Autowired
private DataChangeEventCenter dataChangeEventCenter;
@Autowired
private DataServerConfig dataServerConfig;
@Autowired
private DataNodeStatus dataNodeStatus;
private static final int BLOCK_FOR_ALL_SYNC = 5000;
private static final BlockingQueue<DisconnectEvent> noWorkQueue = new LinkedBlockingQueue<>();
}
During normal processing in receive, if the node's own status is not WORKING, the event is put into the BlockingQueue noWorkQueue instead.
public void receive(DisconnectEvent event) {
if (event.getType() == DisconnectTypeEnum.SESSION_SERVER) {
SessionServerDisconnectEvent sessionServerDisconnectEvent = (SessionServerDisconnectEvent) event;
// (log statement elided in this excerpt; it referenced sessionServerDisconnectEvent.getProcessId())
} else if (event.getType() == DisconnectTypeEnum.CLIENT) {
ClientDisconnectEvent clientDisconnectEvent = (ClientDisconnectEvent) event;
}
if (dataNodeStatus.getStatus() != LocalServerStatusEnum.WORKING) {
noWorkQueue.add(event);
return;
}
EVENT_QUEUE.add(event);
}
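When the time comes, the system calls afterWorkingProcess. After an initial sleep, it loops on noWorkQueue for as long as the queue is not empty, polling with a one-second timeout and handing each event back to receive.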
public void afterWorkingProcess() {
try {
/*
* After the snapshot data is synchronized during startup, it is queued and then placed asynchronously into
* DatumCache. When the notification becomes WORKING, there may be data in the queue that is not executed
* to DatumCache. So it need to sleep for a while.
*/
TimeUnit.MILLISECONDS.sleep(BLOCK_FOR_ALL_SYNC);
while (!noWorkQueue.isEmpty()) {
DisconnectEvent event = noWorkQueue.poll(1, TimeUnit.SECONDS);
if (event != null) {
receive(event);
}
}
} catch (InterruptedException e) {
// error handling elided in this excerpt
}
}
The diagram is as follows:
+----------------------------------------------------------+
| DisconnectEventHandler |
| +-------------------------+ |
| | receive | |
| | | NOT WORKING |
| | dataNodeStatus.getStatus+---------------+ |
| | + | | |
| | | WORKING | | add |
| | | | | |
| | v | | |
| | EVENT_QUEUE.add(event) | | |
| | | +---v---------+ |
| +-------------------------+ | | |
| | noWorkQueue | |
| | | |
| +-----------------------+ +-----+-------+ |
| | afterWorkingProcess | | |
| | | | poll |
| | | NOT isEmpty | |
| | receive(event) <----------------------+ |
| | | |
| | | |
| +-----------------------+ |
+----------------------------------------------------------+
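The park-and-replay flow in the diagram can be reduced to a few lines. This is a simplified sketch under stated assumptions, not the real handler: ParkingHandler and ParkingDemo are hypothetical names, events are plain strings, a boolean flag stands in for DataNodeStatus, and the initial sleep is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ParkingHandler {
    // Stand-in for dataNodeStatus.getStatus() == WORKING.
    private volatile boolean working = false;
    private final BlockingQueue<String> noWorkQueue = new LinkedBlockingQueue<>();
    private final List<String> processed = new ArrayList<>();

    // Parks the event while the node is not working, otherwise processes it.
    void receive(String event) {
        if (!working) {
            noWorkQueue.add(event);
            return;
        }
        processed.add(event);
    }

    void markWorking() {
        working = true;
    }

    // Replays parked events through receive. Call only after markWorking,
    // otherwise receive would park the same events again.
    void afterWorkingProcess() {
        try {
            while (!noWorkQueue.isEmpty()) {
                String event = noWorkQueue.poll(1, TimeUnit.SECONDS);
                if (event != null) {
                    receive(event);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> processed() {
        return processed;
    }
}

class ParkingDemo {
    // Two events arrive early, the node becomes working, then a third arrives.
    static List<String> run() {
        ParkingHandler handler = new ParkingHandler();
        handler.receive("a");
        handler.receive("b");
        handler.markWorking();
        handler.afterWorkingProcess();
        handler.receive("c");
        return handler.processed();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Replaying through receive rather than a separate code path means parked events get exactly the same handling as live ones, which is the choice the original handler makes as well.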
2.5.2 NotifyDataSyncHandler
NotifyDataSyncHandler is implemented much like DisconnectEventHandler.
It relies on a LinkedBlockingQueue as a buffering queue.
public class NotifyDataSyncHandler extends AbstractClientHandler<NotifyDataSyncRequest> implements AfterWorkingProcess {
private static final BlockingQueue<SyncDataRequestForWorking> noWorkQueue = new LinkedBlockingQueue<>();
}
During normal processing in doHandle, if the node's own status is not WORKING, the connection and request are wrapped in a SyncDataRequestForWorking message and put into the LinkedBlockingQueue.
@Override
public Object doHandle(Channel channel, NotifyDataSyncRequest request) {
final Connection connection = ((BoltChannel) channel).getConnection();
if (dataNodeStatus.getStatus() != LocalServerStatusEnum.WORKING) {
noWorkQueue.add(new SyncDataRequestForWorking(connection, request));
return CommonResponse.buildSuccessResponse();
}
executorRequest(connection, request);
return CommonResponse.buildSuccessResponse();
}
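When the time comes, the system calls afterWorkingProcess. It loops on noWorkQueue for as long as the queue is not empty, polling with a one-second timeout and executing each queued request.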
@Override
public void afterWorkingProcess() {
while (!noWorkQueue.isEmpty()) {
SyncDataRequestForWorking event = noWorkQueue.poll(1, TimeUnit.SECONDS);
if (event != null) {
executorRequest(event.getConnection(), event.getRequest());
}
}
}
}
The diagram is as follows:
+----------------------------------------------------------+
| NotifyDataSyncHandler |
| +-------------------------+ |
| | doHandle | |
| | | NOT WORKING |
| | dataNodeStatus.getStatus+---------------+ |
| | + | | |
| | | WORKING | | add |
| | | | | |
| | v | | |
| | executorRequest | | |
| | | +---v---------+ |
| +-------------------------+ | | |
| | noWorkQueue | |
| | | |
| +-----------------------+ +-----+-------+ |
| | afterWorkingProcess | | |
| | | | poll |
| | | NOT isEmpty | |
| | executorRequest <----------------------+ |
| | | |
| | | |
| +-----------------------+ |
+----------------------------------------------------------+
2.5.3 RenewDatumHandler
RenewDatumHandler and DatumLeaseManager are very similar. Neither uses a queue; each just submits a task for a thread to run.
The purpose is spelled out clearly in the code comment:
/*
 * After the snapshot data is synchronized during startup, it is queued and then placed asynchronously into
 * DatumCache. When the notification becomes WORKING, there may be data in the queue that is not executed
 * to DatumCache. So it need to sleep for a while.
 */
The details differ, though. Both classes are by the same author, who I suspect was experimenting with and comparing two different implementation approaches.
RenewDatumHandler is built on ThreadPoolExecutorDataServer.
public class RenewDatumHandler extends AbstractServerHandler<RenewDatumRequest> implements
AfterWorkingProcess {
@Autowired
private ThreadPoolExecutor renewDatumProcessorExecutor;
}
renewDatumProcessorExecutor is a bean, defined as follows. Its work queue is an ArrayBlockingQueue, a bounded blocking queue backed by an array that orders elements FIFO.
@Bean(name = "renewDatumProcessorExecutor")
public ThreadPoolExecutor renewDatumProcessorExecutor(DataServerConfig dataServerConfig) {
return new ThreadPoolExecutorDataServer("RenewDatumProcessorExecutor",
dataServerConfig.getRenewDatumExecutorMinPoolSize(),
dataServerConfig.getRenewDatumExecutorMaxPoolSize(), 300, TimeUnit.SECONDS,
new ArrayBlockingQueue<>(dataServerConfig.getRenewDatumExecutorQueueSize()),
new NamedThreadFactory("DataServer-RenewDatumProcessor-executor", true));
}
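The bounded, FIFO behavior of ArrayBlockingQueue is easy to observe on its own. The sketch below uses only the JDK; BoundedQueueDemo and the capacity of 2 are chosen purely for illustration and have nothing to do with the configured queue size.

```java
import java.util.concurrent.ArrayBlockingQueue;

class BoundedQueueDemo {
    // Fills a capacity-2 queue, tries one more element, then takes the head.
    static String run() {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        boolean first = queue.offer("a");  // accepted
        boolean second = queue.offer("b"); // accepted, queue is now full
        boolean third = queue.offer("c");  // rejected without blocking
        String head = queue.poll();        // oldest element leaves first
        return first + "," + second + "," + third + "," + head + "," + queue.size();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

When such a queue backs a ThreadPoolExecutor, a full queue is what causes the pool to grow toward its maximum size and, beyond that, to reject new tasks.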
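The main code of ThreadPoolExecutorDataServer is as follows. It simply extends ThreadPoolExecutor; presumably new functionality will be added later and for now it is just a placeholder.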
public class ThreadPoolExecutorDataServer extends ThreadPoolExecutor {
@Override
public void execute(Runnable command) {
super.execute(command);
}
}
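afterWorkingProcess simply submits a task to the executor. The task waits for a while (the delay comes from dataServerConfig.getRenewEnableDelaySec()) and then sets renewEnabled to true.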
@Override
public void afterWorkingProcess() {
renewDatumProcessorExecutor.submit(() -> {
TimeUnit.MILLISECONDS.sleep(dataServerConfig.getRenewEnableDelaySec());
renewEnabled.set(true);
});
}
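The same "wait, then flip a flag" idea can be written as a self-contained sketch. DelayedEnable and DelayedEnableDemo are hypothetical names, the 200 ms delay is arbitrary, and a single-thread executor stands in for renewDatumProcessorExecutor; the InterruptedException handling is an addition of this sketch, since the excerpt above omits it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class DelayedEnable {
    private final AtomicBoolean renewEnabled = new AtomicBoolean(false);
    private final ExecutorService executor;
    private final long delayMillis;

    DelayedEnable(ExecutorService executor, long delayMillis) {
        this.executor = executor;
        this.delayMillis = delayMillis;
    }

    // Submits a task that sleeps for the configured delay, then enables renew.
    Future<?> afterWorkingProcess() {
        return executor.submit(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(delayMillis);
                renewEnabled.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    boolean isRenewEnabled() {
        return renewEnabled.get();
    }
}

class DelayedEnableDemo {
    // Returns the flag value before submission and after the task completes.
    static boolean[] run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            DelayedEnable delayed = new DelayedEnable(pool, 200);
            boolean before = delayed.isRenewEnabled();
            Future<?> done = delayed.afterWorkingProcess();
            done.get(5, TimeUnit.SECONDS);
            boolean after = delayed.isRenewEnabled();
            return new boolean[] {before, after};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println(result[0] + " -> " + result[1]);
    }
}
```

Sleeping inside a pooled task keeps the caller of afterWorkingProcess from blocking, at the cost of occupying one worker thread for the duration of the delay.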
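0xFF References
How Ant Financial's Service Registry Implements Smooth DataServer Scale-Out and Scale-In
Ant Financial Service Registry SOFARegistry Explained | The Road to Optimizing Service Discovery
Session Storage Strategy of the Service Registry | SOFARegistry Explained
A Registry for Massive Data - Introduction to the SOFARegistry Architecture
Data Sharding and Synchronization in the Service Registry Explained | SOFARegistry Explained
Ant Financial's Open-Source Communication Framework SOFABolt Explained: Timeout Control and Heartbeat Mechanisms
Analysis of the Data Consistency Scheme in Ant Financial's Service Registry | SOFARegistry Explained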