This article was first published on 泊浮目's Jianshu page: https://www.jianshu.com/u/204...

Version	Date	Notes
1.0	2021.10.8	Initial publication

0. Preface

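A while ago I worked on some monitoring-related development and ran into a few problems along the way, so I read through the relevant parts of the Flink code. While reading, I came across several good design choices, which I have written up in this article.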

The source code discussed here is based on Flink 1.13.2.

1. Pluggable extensions

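On the official website, the Flink community provides a number of ready-made Reporters. If we have a custom Reporter of our own, we can implement it by following the same conventions.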

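Flink's code supports instantiating a MetricReporter through reflection. This requires that the MetricReporter implementation class be public, not be abstract, and have a no-argument constructor.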
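To make the three rules concrete, here is a minimal sketch of reflection-based instantiation using only the JDK. It is not Flink's code: DemoReporter, ConsoleReporter, HalfDoneReporter and instantiate are hypothetical names introduced for illustration.

```java
import java.lang.reflect.Modifier;

final class ReflectiveLoaderSketch {

    /** Hypothetical stand-in for Flink's MetricReporter; not the real interface. */
    public interface DemoReporter {
        String name();
    }

    /** Satisfies all three rules: public, concrete, public no-argument constructor. */
    public static class ConsoleReporter implements DemoReporter {
        public ConsoleReporter() {}

        @Override
        public String name() {
            return "console";
        }
    }

    /** Breaks the "must not be abstract" rule. */
    public abstract static class HalfDoneReporter implements DemoReporter {}

    /** Instantiates the named class reflectively after checking the three rules. */
    static DemoReporter instantiate(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            int modifiers = clazz.getModifiers();
            if (!DemoReporter.class.isAssignableFrom(clazz)
                    || !Modifier.isPublic(modifiers)
                    || Modifier.isAbstract(modifiers)) {
                throw new IllegalArgumentException(
                        className + " must be a public, non-abstract DemoReporter");
            }
            // getConstructor() only returns public constructors, so a missing or
            // non-public no-argument constructor raises NoSuchMethodException.
            return (DemoReporter) clazz.getConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiate(ConsoleReporter.class.getName()).name());
    }
}
```

Each rule maps to one check: the modifier test covers "public" and "not abstract", and getConstructor() covers the no-argument constructor.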

The core code is ReporterSetup#getAllReporterFactories:

    private static Iterator<MetricReporterFactory> getAllReporterFactories(
            @Nullable PluginManager pluginManager) {
        final Iterator<MetricReporterFactory> factoryIteratorSPI =
                ServiceLoader.load(MetricReporterFactory.class).iterator();
        final Iterator<MetricReporterFactory> factoryIteratorPlugins =
                pluginManager != null
                        ? pluginManager.load(MetricReporterFactory.class)
                        : Collections.emptyIterator();

        return Iterators.concat(factoryIteratorPlugins, factoryIteratorSPI);
    }

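This code uses Java's SPI mechanism to discover the MetricReporterFactory implementations that create the reporters. In essence, they are located through the ClassLoader.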

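|-- ReporterSetup
     \-- fromConfiguration // when the cluster starts, reads the monitoring settings from the configuration and initializes the relevant classes
         \-- loadAvailableReporterFactories // loads the available Reporter factories
             \-- getAllReporterFactories // core code: obtains the Reporter factories via SPI and the ClassLoader mechanism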
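The discovery step can be sketched with the JDK alone. The example below is an assumption-laden illustration, not Flink's implementation: DemoReporterFactory and discover are hypothetical names, and a plain iterator stands in for whatever a plugin manager would return. It merges plugin-provided factories with those found by ServiceLoader, plugins first, mirroring the order of the concat call above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.ServiceLoader;

final class FactoryDiscoverySketch {

    /** Hypothetical stand-in for MetricReporterFactory. */
    public interface DemoReporterFactory {
        String reporterName();
    }

    /** Collects factories from plugins first, then from the classpath via SPI. */
    static List<DemoReporterFactory> discover(Iterator<DemoReporterFactory> fromPlugins) {
        // ServiceLoader looks for META-INF/services/<binary name of the interface>
        // files visible to the thread context class loader and instantiates the
        // classes listed there lazily, as the iterator advances.
        Iterator<DemoReporterFactory> fromSpi =
                ServiceLoader.load(DemoReporterFactory.class).iterator();

        List<DemoReporterFactory> all = new ArrayList<>();
        fromPlugins.forEachRemaining(all::add);
        fromSpi.forEachRemaining(all::add);
        return all;
    }

    public static void main(String[] args) {
        // With no plugins and no service file on the classpath, nothing is found.
        System.out.println(discover(Collections.emptyIterator()).size());
    }
}
```

To have ServiceLoader find an implementation, the jar would need a service file named after the interface that lists the implementing class, which is the same convention the MetricReporterFactory Javadoc describes.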

2. Built-in loose coupling

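As mentioned above, the community provides Reporters for some common monitoring systems. In the code, this is essentially an implementation of the factory pattern.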

/**
 * {@link MetricReporter} factory.
 *
 * <p>Reporters that can be instantiated with a factory automatically qualify for being loaded as a
 * plugin, so long as the reporter jar is self-contained (excluding Flink dependencies) and contains
 * a {@code META-INF/services/org.apache.flink.metrics.reporter.MetricReporterFactory} file
 * containing the qualified class name of the factory.
 *
 * <p>Reporters that previously relied on reflection for instantiation can use the {@link
 * InstantiateViaFactory} annotation to redirect reflection-base instantiation attempts to the
 * factory instead.
 */
public interface MetricReporterFactory {

    /**
     * Creates a new metric reporter.
     *
     * @param properties configured properties for the reporter
     * @return created metric reporter
     */
    MetricReporter createMetricReporter(final Properties properties);
}

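To integrate a new monitoring system, you only need to implement the corresponding factory method. The current implementations are: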

org.apache.flink.metrics.graphite.GraphiteReporterFactory
org.apache.flink.metrics.influxdb.InfluxdbReporterFactory
org.apache.flink.metrics.prometheus.PrometheusReporter
org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
org.apache.flink.metrics.statsd.StatsDReporterFactory
org.apache.flink.metrics.datadog.DatadogHttpReporterFactory
org.apache.flink.metrics.slf4j.Slf4jReporterFactory

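Whenever the community needs to add a new Reporter, it only has to implement MetricReporterFactory. The upper layers see nothing but MetricReporter and are independent of any concrete implementation, which makes this a typical anti-corruption design.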
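The decoupling described above can be illustrated with a small, self-contained sketch. All names here (DemoReporter, DemoReporterFactory, LogReporterFactory, the "interval" property and its default) are hypothetical; only the shape of createMetricReporter(Properties) follows the interface quoted earlier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class ReporterFactorySketch {

    /** Hypothetical, simplified stand-in for MetricReporter. */
    interface DemoReporter {
        String describe();
    }

    /** Hypothetical, simplified stand-in for MetricReporterFactory. */
    interface DemoReporterFactory {
        DemoReporter createMetricReporter(Properties properties);
    }

    /** One concrete backend; callers never reference this class directly. */
    static final class LogReporterFactory implements DemoReporterFactory {
        @Override
        public DemoReporter createMetricReporter(Properties properties) {
            // "interval" and its default value are made up for this example.
            String interval = properties.getProperty("interval", "10 SECONDS");
            return () -> "log reporter, interval=" + interval;
        }
    }

    private final Map<String, DemoReporterFactory> factories = new HashMap<>();

    void register(String name, DemoReporterFactory factory) {
        factories.put(name, factory);
    }

    /** The upper layer only ever sees the DemoReporter interface. */
    DemoReporter create(String name, Properties properties) {
        DemoReporterFactory factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No factory registered for " + name);
        }
        return factory.createMetricReporter(properties);
    }

    public static void main(String[] args) {
        ReporterFactorySketch registry = new ReporterFactorySketch();
        registry.register("log", new LogReporterFactory());
        System.out.println(registry.create("log", new Properties()).describe());
    }
}
```

Adding another backend means adding one factory class and one register call; the create method and its callers stay unchanged.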

3. Fail safe

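In a stream-processing job, should a failure in side-path logic such as monitoring be allowed to affect the main processing logic? The answer is that it should not.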

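In MetricRegistryImpl (as the name suggests, all Reporters are registered with this class), the constructor hands the relevant MetricReporters to a thread pool and has them report data periodically.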

|-- MetricRegistryImpl
  \-- constructor
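A minimal sketch of this idea, using only java.util.concurrent, is shown below. It is an illustration under assumptions rather than Flink's actual code: failSafe and survivesFailures are hypothetical helpers, and the 10 ms period is arbitrary. The point is that the report call runs on a separate scheduler thread and every failure is caught, so neither the caller nor later report rounds are affected.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class FailSafeReportingSketch {

    /** Wraps a report action so that a failure is logged instead of propagated. */
    static Runnable failSafe(Runnable report) {
        return () -> {
            try {
                report.run();
            } catch (Throwable t) {
                // Without this catch, scheduleWithFixedDelay would suppress all
                // subsequent executions after the first exception.
                System.err.println("Error while reporting metrics: " + t);
            }
        };
    }

    /** Returns true if a reporter that always throws is still invoked expectedRuns times. */
    static boolean survivesFailures(int expectedRuns) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch invocations = new CountDownLatch(expectedRuns);
        try {
            executor.scheduleWithFixedDelay(
                    failSafe(() -> {
                        invocations.countDown();
                        throw new IllegalStateException("monitoring backend unreachable");
                    }),
                    0, 10, TimeUnit.MILLISECONDS);
            return invocations.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("kept reporting after failures: " + survivesFailures(3));
    }
}
```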

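Thread pools also appear in WebMonitorEndpoint. This class exposes a REST API that makes it easy to query Metrics. Requests to other components are sent asynchronously through Akka, and the replies delivered to those callbacks are processed on a thread pool.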

|-- WebMonitorEndpoint
  \-- start
    \-- initializeHandlers
      \--   new JobConfigHandler
|-- AbstractExecutionGraphHandler
  \-- handleRequest

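This is a typical fail-safe design: the monitoring work runs on its own threads, so a slow or failing monitoring component does not block the main processing path.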
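The asynchronous reply handling can be sketched with CompletableFuture. This is only an analogy for the pattern, not the handler code in Flink: handleRequest here is a hypothetical method, the metric name in the response is made up, and a CompletableFuture stands in for the reply that would arrive through Akka.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncHandlerSketch {

    /** Turns an asynchronous reply into a response string on the given executor. */
    static CompletableFuture<String> handleRequest(
            CompletableFuture<Integer> remoteReply, ExecutorService executor) {
        // handleAsync runs on the executor whether the reply succeeded or failed,
        // so a failed lookup becomes an error response rather than an exception
        // on the thread that issued the request.
        return remoteReply.handleAsync(
                (value, error) ->
                        error == null
                                ? "numRecordsIn=" + value
                                : "error: " + error.getMessage(),
                executor);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Integer> reply = CompletableFuture.completedFuture(42);
            System.out.println(handleRequest(reply, executor).join());
        } finally {
            executor.shutdown();
        }
    }
}
```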

4. More than just Push

In Flink, monitoring data is not limited to Push. Pull is implemented as well, and the implementation is very simple.

MetricQueryService implements MetricQueryServiceGateway, which means it can be invoked remotely.

The code path through which it receives its monitoring data can be traced as follows:

|-- AbstractMetricGroup
  \-- counter
    |-- MetricRegistryImpl
      \-- register
        |-- MetricQueryService
          \-- addMetric

The WebMonitorEndpoint mentioned above works the same way, except that it is built on a REST API. It likewise offers a Pull strategy.
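The Pull side can be illustrated with a short sketch: metrics are registered once, and a caller reads the current values whenever it chooses. The class below is a hypothetical simplification; addMetric borrows its name from the trace above, while queryMetrics and the use of LongSupplier are assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

final class PullQuerySketch {

    // Registered metrics, keyed by name; each value is read lazily on demand.
    private final Map<String, LongSupplier> metrics = new ConcurrentHashMap<>();

    /** Called once at registration time, similar in spirit to the addMetric step. */
    void addMetric(String name, LongSupplier currentValue) {
        metrics.put(name, currentValue);
    }

    /** Pull: the caller decides when to read, and values are sampled at call time. */
    Map<String, Long> queryMetrics() {
        Map<String, Long> snapshot = new TreeMap<>();
        metrics.forEach((name, supplier) -> snapshot.put(name, supplier.getAsLong()));
        return snapshot;
    }

    public static void main(String[] args) {
        PullQuerySketch service = new PullQuerySketch();
        AtomicLong recordsIn = new AtomicLong();
        service.addMetric("numRecordsIn", recordsIn::get);

        recordsIn.addAndGet(5);
        System.out.println(service.queryMetrics());
    }
}
```

With Push, the reporter's schedule determines when values leave the process. With Pull, nothing is computed or sent until a query arrives, which suits on-demand consumers such as a web UI.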
