[Learning Design from Source Code] Delayed Operations in Ant Financial's SOFARegistry

Rossi's Thinking (羅西的思考), published 2021-01-24


0x00 Summary

SOFARegistry is a production-grade service registry open-sourced by Ant Financial. It offers low latency and high availability.

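This series focuses on design and architecture. Across several articles, it looks at DataServer and SOFARegistry from different angles and works backward from the code to summarize their implementation mechanisms and architectural ideas, so that readers can learn how Alibaba designs systems.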

This is the seventeenth article in the series. It covers delayed operations in SOFARegistry.

0x01 Business Domain

1.1 Business Rationale

Why does AfterWorkingProcess exist?

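AfterWorkingProcess performs deferred operations. In some situations, such as when the node is not yet in the WORKING state, business logic cannot run right away. The likely purpose of AfterWorkingProcess is to make up for that work at a later point.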

The official blog contains a similar statement that supports this reading:

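Until data synchronization is complete, all read operations on a new node are forwarded to the data node that owns the relevant data shard.

Until data synchronization is complete, write operations on the new node are forbidden. This prevents new inconsistencies from appearing while data is being synchronized.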

1.2 What to Learn

This article shows one way to implement this kind of business-level delayed operation.

0x02 Implementation

2.1 Definition

The interface is defined as follows:

public interface AfterWorkingProcess {
    void afterWorkingProcess();
    int getOrder();
}

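2.2 Configuration

The afterWorkProcessors bean is a List<AfterWorkingProcess>. It is injected into AfterWorkingProcessHandler as a member variable and holds the follow-up actions to run once certain business logic has finished.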

        @Bean(name = "afterWorkProcessors")
        public List<AfterWorkingProcess> afterWorkingProcessors() {
            List<AfterWorkingProcess> list = new ArrayList<>();
            list.add(renewDatumHandler());
            list.add(datumLeaseManager());
            list.add(disconnectEventHandler());
            list.add(notifyDataSyncHandler());
            return list;
        }

        @Bean
        public AfterWorkingProcessHandler afterWorkingProcessHandler() {
            return new AfterWorkingProcessHandler();
        }

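2.3 Engine

The usage here is somewhat unusual. AfterWorkingProcessHandler is itself an implementation of AfterWorkingProcess, which makes it essentially a composite.

Its afterWorkingProcess method calls afterWorkingProcess on each implementation registered in the afterWorkProcessors bean, one by one.

getOrder specifies the execution priority, which is a common pattern. The list is sorted in ascending order of getOrder, so a smaller value runs earlier.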

public class AfterWorkingProcessHandler implements AfterWorkingProcess {

    @Resource(name = "afterWorkProcessors")
    private List<AfterWorkingProcess> afterWorkingProcessors;

    @Override
    public void afterWorkingProcess() {

        if (afterWorkingProcessors != null) {
            // Sort ascending by getOrder(): a smaller value runs earlier
            List<AfterWorkingProcess> list = afterWorkingProcessors.stream()
                    .sorted(Comparator.comparing(AfterWorkingProcess::getOrder))
                    .collect(Collectors.toList());

            list.forEach(AfterWorkingProcess::afterWorkingProcess);
        }
    }

    @Override
    public int getOrder() {
        return 0;
    }
}
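The sketch below isolates the ordering behavior in a minimal, Spring-free form of the same composite idea. RecordingHook and CompositeHook are hypothetical names. The interface mirrors AfterWorkingProcess above, and plain construction takes the place of the @Resource injection used in the real class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Same shape as the AfterWorkingProcess interface in the article.
interface AfterWorkingProcess {
    void afterWorkingProcess();
    int getOrder();
}

// Hypothetical hook that appends its name to a shared log when it runs.
class RecordingHook implements AfterWorkingProcess {
    private final String name;
    private final int order;
    private final List<String> log;

    RecordingHook(String name, int order, List<String> log) {
        this.name = name;
        this.order = order;
        this.log = log;
    }

    @Override
    public void afterWorkingProcess() {
        log.add(name);
    }

    @Override
    public int getOrder() {
        return order;
    }
}

// Hypothetical composite: is itself a hook and runs its children sorted ascending by getOrder().
class CompositeHook implements AfterWorkingProcess {
    private final List<AfterWorkingProcess> children;

    CompositeHook(List<AfterWorkingProcess> children) {
        this.children = children;
    }

    @Override
    public void afterWorkingProcess() {
        if (children == null) {
            return;
        }
        List<AfterWorkingProcess> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(AfterWorkingProcess::getOrder));
        sorted.forEach(AfterWorkingProcess::afterWorkingProcess);
    }

    @Override
    public int getOrder() {
        return 0;
    }
}
```

List.sort is guaranteed to be stable. In this sketch, hooks that share a getOrder value therefore keep their registration order, so the order of equal-priority entries in the configured list still matters.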

2.4 Invocation

It is invoked in only one place, DataServerCache#updateDataServerStatus:

afterWorkingProcessHandler.afterWorkingProcess();

The following methods in DataServerCache all end up calling updateDataServerStatus:

  • synced
  • notifiedAll
  • checkAndUpdateStatus
  • addNotWorkingServer

The diagram is as follows:

+------------------------------------------+
| DataServerCache                          |                                 +----------------------------------------------+
|                                          |                                 |   AfterWorkingProcess                        |
| synced +----------------------+          |                                 |                                              |
|                               |          | +----------------------------+  | +------------------------------------------+ |
|                               |          | | AfterWorkingProcessHandler |  | |renewDatumHandler.afterWorkingProcess     | |
|                               |          | |                            |  | |                                          | |
|                               v          | |                            |  | |datumLeaseManager.afterWorkingProcess     | |
| notifiedAll +--->updateDataServerStatus +------> afterWorkingProcess +------>+                                          | |
|                                 ^   ^    | |                            |  | |disconnectEventHandler.afterWorkingProcess| |
|                                 |   |    | +----------------------------+  | |                                          | |
|                                 |   |    |                                 | |notifyDataSyncHandler.afterWorkingProcess | |
| checkAndUpdateStatus+-----------+   |    |                                 | +------------------------------------------+ |
|                                     |    |                                 +----------------------------------------------+
| addNotWorkingServer +---------------+    |
|                                          |
+------------------------------------------+


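The trigger is tied to a business event, namely the server status update. No timer or asynchronous scheduling is needed, and the hooks are called directly.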
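The following sketch shows what "no timer, no async" means in practice: the hook runs on the same thread that performs the status change. The article does not show the body of updateDataServerStatus. NodeStatus, StatusHolder, and the rule "fire once when entering WORKING" are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical status values, loosely mirroring LocalServerStatusEnum.
enum NodeStatus { INITIAL, WORKING }

// Hypothetical holder: on the transition into WORKING it runs the hook synchronously.
class StatusHolder {
    private NodeStatus status = NodeStatus.INITIAL;
    private final Runnable afterWorkingHook;

    StatusHolder(Runnable afterWorkingHook) {
        this.afterWorkingHook = afterWorkingHook;
    }

    synchronized NodeStatus getStatus() {
        return status;
    }

    void updateStatus(NodeStatus newStatus) {
        boolean enteredWorking;
        synchronized (this) {
            enteredWorking = status != NodeStatus.WORKING && newStatus == NodeStatus.WORKING;
            status = newStatus;
        }
        if (enteredWorking) {
            // Runs on the caller's thread, outside the lock: no timer, no extra scheduling.
            afterWorkingHook.run();
        }
    }
}
```

The hook runs outside the synchronized block. A slow hook (DisconnectEventHandler sleeps for 5000 ms) therefore does not block other threads that only want to read the status.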

2.5 Business Implementations

2.5.1 DisconnectEventHandler

public class DisconnectEventHandler implements InitializingBean, AfterWorkingProcess {
    /**
     * a DelayQueue that contains client disconnect events
     */
    private final DelayQueue<DisconnectEvent>           EVENT_QUEUE        = new DelayQueue<>();

    @Autowired
    private SessionServerConnectionFactory              sessionServerConnectionFactory;

    @Autowired
    private DataChangeEventCenter                       dataChangeEventCenter;

    @Autowired
    private DataServerConfig                            dataServerConfig;

    @Autowired
    private DataNodeStatus                              dataNodeStatus;

    private static final int                            BLOCK_FOR_ALL_SYNC = 5000;

    private static final BlockingQueue<DisconnectEvent> noWorkQueue        = new LinkedBlockingQueue<>();
}
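EVENT_QUEUE is a java.util.concurrent.DelayQueue, which only hands out elements whose delay has expired. The handler's DisconnectEvent type is not shown in this excerpt, so the sketch below uses a hypothetical DelayedDisconnect class. It shows what an element must implement (getDelay and compareTo from Delayed) and how take() returns events in expiry order.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical event that becomes available from the DelayQueue only after delayMillis.
class DelayedDisconnect implements Delayed {
    final String connectId;
    private final long readyAtNanos;

    DelayedDisconnect(String connectId, long delayMillis) {
        this.connectId = connectId;
        this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // Remaining delay; zero or negative means the element can be taken.
        return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        // Earliest-expiring event sits at the head of the queue.
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

class DelayQueueDemo {
    // Blocks until the next event expires; returns null if the thread is interrupted.
    static DelayedDisconnect takeOrNull(DelayQueue<DelayedDisconnect> queue) {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```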

During normal processing in receive, if the node's own status is not WORKING, the event is put into the BlockingQueue noWorkQueue instead of EVENT_QUEUE.

public void receive(DisconnectEvent event) {
    if (event.getType() == DisconnectTypeEnum.SESSION_SERVER) {
        SessionServerDisconnectEvent sessionServerDisconnectEvent = (SessionServerDisconnectEvent) event;
        // (a call taking sessionServerDisconnectEvent.getProcessId() is truncated in this excerpt)
    } else if (event.getType() == DisconnectTypeEnum.CLIENT) {
        ClientDisconnectEvent clientDisconnectEvent = (ClientDisconnectEvent) event;
    }

    // Not WORKING yet: park the event in noWorkQueue so afterWorkingProcess can replay it later
    if (dataNodeStatus.getStatus() != LocalServerStatusEnum.WORKING) {
        noWorkQueue.add(event);
        return;
    }
    EVENT_QUEUE.add(event);
}

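When the right moment comes, the system calls afterWorkingProcess. It first sleeps for BLOCK_FOR_ALL_SYNC (5000 ms). It then drains noWorkQueue: while the queue is not empty, it polls with a 1-second timeout and passes each event back to receive. Because the node is now WORKING, receive puts each event into EVENT_QUEUE.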

public void afterWorkingProcess() {
    try {
        /*
         * After the snapshot data is synchronized during startup, it is queued and then placed asynchronously into
         * DatumCache. When the notification becomes WORKING, there may be data in the queue that is not executed
         * to DatumCache. So it need to sleep for a while.
         */
        TimeUnit.MILLISECONDS.sleep(BLOCK_FOR_ALL_SYNC);

        while (!noWorkQueue.isEmpty()) {
            DisconnectEvent event = noWorkQueue.poll(1, TimeUnit.SECONDS);
            if (event != null) {
                receive(event);
            }
        }
    } catch (InterruptedException e) {
        // catch body not shown in the excerpt; restoring the interrupt flag is a minimal choice
        Thread.currentThread().interrupt();
    }
}
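The receive/afterWorkingProcess pair above can be condensed into the sketch below. BufferingHandler, markWorking, and the String events are hypothetical stand-ins for dataNodeStatus, DisconnectEvent, and EVENT_QUEUE. The 5000 ms BLOCK_FOR_ALL_SYNC sleep is left out to keep the sketch short. The processed list is not thread-safe, which is acceptable only for the single-threaded usage shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical handler: buffer events while not WORKING, replay them once WORKING.
class BufferingHandler {
    private volatile boolean working = false;
    private final BlockingQueue<String> noWorkQueue = new LinkedBlockingQueue<>();
    // Stands in for EVENT_QUEUE / real processing; not thread-safe.
    final List<String> processed = new ArrayList<>();

    void receive(String event) {
        if (!working) {
            noWorkQueue.add(event); // park until the node is WORKING
            return;
        }
        processed.add(event);
    }

    // Stands in for the external status change to WORKING.
    void markWorking() {
        working = true;
    }

    // Called after the status has changed: re-submit every parked event.
    void afterWorkingProcess() {
        try {
            while (!noWorkQueue.isEmpty()) {
                String event = noWorkQueue.poll(1, TimeUnit.SECONDS);
                if (event != null) {
                    receive(event); // now WORKING, so this goes straight to processing
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

An event that arrives after the status flips but before the drain finishes goes straight to processing. It can therefore overtake older buffered events. The pattern guarantees that every event is eventually handled, but not that events are handled in strict arrival order.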

The diagram is as follows:

+----------------------------------------------------------+
|                                  DisconnectEventHandler  |
|    +-------------------------+                           |
|    | receive                 |                           |
|    |                         |  NOT WORKING              |
|    | dataNodeStatus.getStatus+---------------+           |
|    |            +            |               |           |
|    |            | WORKING    |               | add       |
|    |            |            |               |           |
|    |            v            |               |           |
|    |  EVENT_QUEUE.add(event) |               |           |
|    |                         |           +---v---------+ |
|    +-------------------------+           |             | |
|                                          | noWorkQueue | |
|                                          |             | |
|    +-----------------------+             +-----+-------+ |
|    | afterWorkingProcess   |                   |         |
|    |                       |                   | poll    |
|    |                       |      NOT isEmpty  |         |
|    |     receive(event) <----------------------+         |
|    |                       |                             |
|    |                       |                             |
|    +-----------------------+                             |
+----------------------------------------------------------+

2.5.2 NotifyDataSyncHandler

NotifyDataSyncHandler is implemented in a similar way to DisconnectEventHandler.

It also relies on a LinkedBlockingQueue as a buffer queue.

public class NotifyDataSyncHandler extends AbstractClientHandler<NotifyDataSyncRequest> implements AfterWorkingProcess {
  
  private static final BlockingQueue<SyncDataRequestForWorking> noWorkQueue = new LinkedBlockingQueue<>();
  
}

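During normal processing in doHandle, if the node's own status is not WORKING, the handler wraps the connection and the request into a SyncDataRequestForWorking message and puts it into the LinkedBlockingQueue. It then immediately returns a success response.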

@Override
public Object doHandle(Channel channel, NotifyDataSyncRequest request) {
    final Connection connection = ((BoltChannel) channel).getConnection();
    if (dataNodeStatus.getStatus() != LocalServerStatusEnum.WORKING) {
        // Not WORKING yet: keep the connection and request together for later execution
        noWorkQueue.add(new SyncDataRequestForWorking(connection, request));
        return CommonResponse.buildSuccessResponse();
    }
    executorRequest(connection, request);
    return CommonResponse.buildSuccessResponse();
}

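When the right moment comes, the system calls afterWorkingProcess. Unlike DisconnectEventHandler, it does not sleep first. While noWorkQueue is not empty, it polls with a 1-second timeout and executes each buffered request through executorRequest.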

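@Override
public void afterWorkingProcess() {
    try {
        while (!noWorkQueue.isEmpty()) {
            SyncDataRequestForWorking event = noWorkQueue.poll(1, TimeUnit.SECONDS);
            if (event != null) {
                executorRequest(event.getConnection(), event.getRequest());
            }
        }
    } catch (InterruptedException e) {
        // catch body not shown in the excerpt; restoring the interrupt flag is a minimal choice
        Thread.currentThread().interrupt();
    }
}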
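What NotifyDataSyncHandler adds is the holder object. A deferred request must carry everything needed to run it later, which here means the Connection as well as the request. The generic sketch below models this with hypothetical PendingRequest and RequestDeferrer classes. A BiConsumer stands in for executorRequest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical holder in the spirit of SyncDataRequestForWorking.
final class PendingRequest<C, R> {
    final C connection;
    final R request;

    PendingRequest(C connection, R request) {
        this.connection = connection;
        this.request = request;
    }
}

// Hypothetical deferrer: stores requests while not working, later hands them to the real executor.
class RequestDeferrer<C, R> {
    private final BlockingQueue<PendingRequest<C, R>> noWorkQueue = new LinkedBlockingQueue<>();

    void defer(C connection, R request) {
        noWorkQueue.add(new PendingRequest<>(connection, request));
    }

    // Drains the queue in FIFO order; returns how many requests were executed.
    int drain(BiConsumer<C, R> executorRequest) {
        int executed = 0;
        try {
            while (!noWorkQueue.isEmpty()) {
                PendingRequest<C, R> pending = noWorkQueue.poll(1, TimeUnit.SECONDS);
                if (pending != null) {
                    executorRequest.accept(pending.connection, pending.request);
                    executed++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executed;
    }
}
```

doHandle returns a success response even when it only buffers the request. The holder is what lets the real work happen later with the original connection.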

The diagram is as follows:

+----------------------------------------------------------+
|                                   NotifyDataSyncHandler  |
|    +-------------------------+                           |
|    | doHandle                |                           |
|    |                         |  NOT WORKING              |
|    | dataNodeStatus.getStatus+---------------+           |
|    |            +            |               |           |
|    |            | WORKING    |               | add       |
|    |            |            |               |           |
|    |            v            |               |           |
|    |     executorRequest     |               |           |
|    |                         |           +---v---------+ |
|    +-------------------------+           |             | |
|                                          | noWorkQueue | |
|                                          |             | |
|    +-----------------------+             +-----+-------+ |
|    | afterWorkingProcess   |                   |         |
|    |                       |                   | poll    |
|    |                       |      NOT isEmpty  |         |
|    |   executorRequest  <----------------------+         |
|    |                       |                             |
|    |                       |                             |
|    +-----------------------+                             |
+----------------------------------------------------------+

2.5.3 RenewDatumHandler

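RenewDatumHandler and DatumLeaseManager are very similar to each other. Neither uses a queue. Each simply submits a task to run on another thread.

The purpose is stated clearly in the source comment: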

/*
 * After the snapshot data is synchronized during startup, it is queued and then placed asynchronously into
 * DatumCache. When the notification becomes WORKING, there may be data in the queue that is not executed
 * to DatumCache. So it need to sleep for a while.
 */

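The details differ, however. Both classes were written by the same author, and I suspect they were experimenting with and comparing two different implementation approaches.

RenewDatumHandler is built on ThreadPoolExecutorDataServer.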

public class RenewDatumHandler extends AbstractServerHandler<RenewDatumRequest> implements
                                                                               AfterWorkingProcess {

    @Autowired
    private ThreadPoolExecutor  renewDatumProcessorExecutor;

}

renewDatumProcessorExecutor is a bean defined as follows. Its work queue is an ArrayBlockingQueue, a bounded blocking queue backed by an array that orders elements FIFO (first in, first out).

@Bean(name = "renewDatumProcessorExecutor")
public ThreadPoolExecutor renewDatumProcessorExecutor(DataServerConfig dataServerConfig) {
            return new ThreadPoolExecutorDataServer("RenewDatumProcessorExecutor",
                dataServerConfig.getRenewDatumExecutorMinPoolSize(),
                dataServerConfig.getRenewDatumExecutorMaxPoolSize(), 300, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(dataServerConfig.getRenewDatumExecutorQueueSize()),
                new NamedThreadFactory("DataServer-RenewDatumProcessor-executor", true));
}
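The bounded ArrayBlockingQueue matters when the pool is saturated. The bean above sets no RejectedExecutionHandler. If the constructor passes its arguments straight to ThreadPoolExecutor, the JDK default AbortPolicy applies. The sketch below uses hypothetical sizes (one thread, one queue slot) to show that a third task is then rejected with RejectedExecutionException.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {
    // Returns true if a third task is rejected while 1 thread is busy and the 1-slot queue is full.
    static boolean thirdTaskRejected() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 300, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1)); // no handler given, so AbortPolicy is used
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(blocker); // occupies the only worker thread
            pool.execute(blocker); // waits in the queue (capacity 1)
            try {
                pool.execute(blocker); // no free thread and no free queue slot
                return false;
            } catch (RejectedExecutionException expected) {
                return true;
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

A bounded queue keeps memory use predictable. The trade-off is that a caller of submit such as afterWorkingProcess can receive an exception when the pool is saturated.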

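The main code of ThreadPoolExecutorDataServer is shown below. It simply extends ThreadPoolExecutor. New features will presumably be added here later; for now the class is just a placeholder.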

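public class ThreadPoolExecutorDataServer extends ThreadPoolExecutor {
    private final String executorName; // constructor inferred from the bean call above; field use not shown
    public ThreadPoolExecutorDataServer(String executorName, int corePoolSize, int maximumPoolSize, long keepAliveTime,
                                        TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory) {
        super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, threadFactory);
        this.executorName = executorName;
    }
    @Override
    public void execute(Runnable command) {
        super.execute(command);
    }
}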

afterWorkingProcess simply submits a task to the thread pool. The task waits for a configured delay and then sets renewEnabled to true.

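@Override
public void afterWorkingProcess() {
    renewDatumProcessorExecutor.submit(() -> {
        try {
            // The getter name indicates seconds, so the delay is applied in TimeUnit.SECONDS
            TimeUnit.SECONDS.sleep(dataServerConfig.getRenewEnableDelaySec());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        renewEnabled.set(true);
    });
}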
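When RenewDatumHandler sleeps inside a pool task, it holds one worker thread of renewDatumProcessorExecutor for the whole delay. One alternative, which is not what the source does, is to have a ScheduledExecutorService flip the flag after the delay. DelayedRenewGate and its methods are hypothetical. renewEnabled is assumed to be an AtomicBoolean, which the excerpt does not show.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical gate: renewals stay disabled until the delay after WORKING has elapsed.
class DelayedRenewGate {
    private final AtomicBoolean renewEnabled = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Instead of sleeping in a pool thread, ask the scheduler to flip the flag later.
    ScheduledFuture<?> afterWorkingProcess(long delay, TimeUnit unit) {
        return scheduler.schedule(() -> renewEnabled.set(true), delay, unit);
    }

    boolean isRenewEnabled() {
        return renewEnabled.get();
    }

    // Waits for the scheduled flip, then reports the flag (false if interrupted).
    boolean awaitEnabled(ScheduledFuture<?> pending) {
        try {
            pending.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            // The flag-setting task cannot throw, so this branch is not expected.
        }
        return renewEnabled.get();
    }

    void shutdown() {
        scheduler.shutdown();
    }
}
```

No thread is blocked while the delay runs. The scheduler's single thread is busy only for the moment it sets the flag.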

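0xFF References

How Ant Financial's service registry achieves smooth DataServer scale-out and scale-in

Ant Financial's service registry SOFARegistry explained | The road to optimizing service discovery

Service registry Session storage strategy | SOFARegistry explained

A registry for massive data: an introduction to the SOFARegistry architecture

Data sharding and synchronization in the service registry in detail | SOFARegistry explained

Ant Financial's open-source communication framework SOFABolt: connection management

Ant Financial's open-source communication framework SOFABolt: timeout control and heartbeat mechanisms

Ant Financial's open-source communication framework SOFABolt: the protocol framework

Data consistency in Ant Financial's service registry | SOFARegistry explained

Ant communication framework in practice

sofa-bolt remote invocation

Learning sofa-bolt

SOFABolt design summary: elegant and concise design

SofaBolt source code analysis: from service startup to message processing

SOFABolt source code analysis

SOFABolt source code analysis 9: design of the UserProcessor custom processor

Introduction to SOFARegistry

SOFABolt source code analysis 13: design of the Connection event handling mechanism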
