Storm source code analysis: metrics

Posted by 寒凝雪 on 2017-05-02

First, Storm defines a set of metric-related interfaces: IMetric, IReducer, and ICombiner (backtype.storm.metric.api).

Each task creates a set of builtin metrics (backtype.storm.daemon.builtin-metrics) and registers them in the topology context.

The task then keeps updating these builtin metrics through functions such as spout-acked-tuple!.

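Periodically, the task sends the statistics held in the builtin metrics over METRICS-STREAM to the metric bolt (backtype.storm.metric.MetricsConsumerBolt; this bolt instantiates an object implementing backtype.storm.metric.api.IMetricsConsumer, which processes the metrics). 
So how are these metrics used? 
Because they are builtin metrics, outside code does not read them directly. 
How they are handled depends on _metricsConsumer.handleDataPoints, where _metricsConsumer is set through the topology's configuration. 
For example, with backtype.storm.metric.LoggingMetricsConsumer the metrics are written to the log.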

 

1. backtype.storm.metric.api

IMetric

package backtype.storm.metric.api;
public interface IMetric {
    public Object getValueAndReset(); // return the current value and restore the initial state
}

CountMetric: a counter, zeroed on reset 
AssignableMetric: holds an assigned value, no reset needed 
MultiCountMetric: keeps several counts in a hashmap; on reset, calls getValueAndReset on each count object

public class CountMetric implements IMetric {
    long _value = 0;

    public CountMetric() {
    }
    
    public void incr() {
        _value++;
    }

    public void incrBy(long incrementBy) {
        _value += incrementBy;
    }

    public Object getValueAndReset() {
        long ret = _value;
        _value = 0;
        return ret;
    }
}
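
The source of MultiCountMetric is not listed in this article. As a rough illustration of the description above, here is a minimal Java sketch. Everything in it is a stand-in written for this example rather than Storm's code; only the names scope, incrBy and getValueAndReset come from the text, and the stream names are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-ins, not the Storm source.
class MultiCountSketch {
    interface IMetric {
        Object getValueAndReset();
    }

    static class CountMetric implements IMetric {
        private long value = 0;

        void incrBy(long n) { value += n; }

        public Object getValueAndReset() {
            long ret = value;
            value = 0;
            return ret; // autoboxed to Long
        }
    }

    // One CountMetric per key (for example, per stream id).
    static class MultiCountMetric implements IMetric {
        private final Map<String, CountMetric> counts = new HashMap<>();

        // Return the counter for this key, creating it on first use.
        CountMetric scope(String key) {
            return counts.computeIfAbsent(key, k -> new CountMetric());
        }

        // Snapshot every counter and reset each one individually.
        public Object getValueAndReset() {
            Map<String, Object> ret = new HashMap<>();
            for (Map.Entry<String, CountMetric> e : counts.entrySet()) {
                ret.put(e.getKey(), e.getValue().getValueAndReset());
            }
            return ret;
        }
    }

    public static void main(String[] args) {
        MultiCountMetric acks = new MultiCountMetric();
        acks.scope("default").incrBy(20); // hypothetical stream names
        acks.scope("default").incrBy(20);
        acks.scope("errors").incrBy(1);
        System.out.println(acks.getValueAndReset()); // counts for this bucket
        System.out.println(acks.getValueAndReset()); // same keys, all zero
    }
}
```

Note that in this sketch the keys survive a reset; only the counts go back to zero.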

 

ICombiner

public interface ICombiner<T> {
    public T identity();
    public T combine(T a, T b);
}

CombinedMetric combines ICombiner with IMetric

public class CombinedMetric implements IMetric {
    private final ICombiner _combiner;
    private Object _value;

    public CombinedMetric(ICombiner combiner) {
        _combiner = combiner;
        _value = _combiner.identity();
    }
    
    public void update(Object value) {
        _value = _combiner.combine(_value, value);
    }

    public Object getValueAndReset() {
        Object ret = _value;
        _value = _combiner.identity();
        return ret;
    }
}
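
To make the identity/combine contract concrete, here is a small sketch with a hypothetical MaxCombiner that keeps the largest value seen in a bucket. The types are local stand-ins so the example runs on its own; they are generic here, whereas the class above uses raw types, and MaxCombiner is not something the article or Storm is claimed to provide.

```java
// Illustrative stand-ins; MaxCombiner is a hypothetical example combiner.
class CombinedMetricSketch {
    interface ICombiner<T> {
        T identity();
        T combine(T a, T b);
    }

    // identity() must be neutral for combine(): max(MIN_VALUE, x) == x.
    static class MaxCombiner implements ICombiner<Long> {
        public Long identity() { return Long.MIN_VALUE; }
        public Long combine(Long a, Long b) { return Math.max(a, b); }
    }

    static class CombinedMetric<T> {
        private final ICombiner<T> combiner;
        private T value;

        CombinedMetric(ICombiner<T> combiner) {
            this.combiner = combiner;
            this.value = combiner.identity();
        }

        void update(T v) { value = combiner.combine(value, v); }

        T getValueAndReset() {
            T ret = value;
            value = combiner.identity(); // start the next bucket from the identity
            return ret;
        }
    }

    public static void main(String[] args) {
        CombinedMetric<Long> maxLatency = new CombinedMetric<>(new MaxCombiner());
        maxLatency.update(3L);
        maxLatency.update(9L);
        maxLatency.update(4L);
        System.out.println(maxLatency.getValueAndReset()); // prints 9
        System.out.println(maxLatency.getValueAndReset()); // prints -9223372036854775808 (the identity)
    }
}
```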

 

IReducer

public interface IReducer<T> { 
    T init(); 
    T reduce(T accumulator, Object input); 
    Object extractResult(T accumulator); 
}

MeanReducer implements IReducer to compute a mean: reduce accumulates the sum and the count, and extractResult returns sum/count

class MeanReducerState {
    public int count = 0;
    public double sum = 0.0;
}

public class MeanReducer implements IReducer<MeanReducerState> {
    public MeanReducerState init() {
        return new MeanReducerState();
    }

    public MeanReducerState reduce(MeanReducerState acc, Object input) {
        acc.count++;
        if(input instanceof Double) {
            acc.sum += (Double)input;
        } else if(input instanceof Long) {
            acc.sum += ((Long)input).doubleValue();
        } else if(input instanceof Integer) {
            acc.sum += ((Integer)input).doubleValue();
        } else {
            throw new RuntimeException(
                "MeanReducer::reduce called with unsupported input type `" + input.getClass()
                + "`. Supported types are Double, Long, Integer.");
        }
        return acc;
    }

    public Object extractResult(MeanReducerState acc) {
        if(acc.count > 0) {
            return new Double(acc.sum / (double)acc.count);
        } else {
            return null;
        }
    }
}

 

ReducedMetric

Combines IReducer with IMetric

public class ReducedMetric implements IMetric {
    private final IReducer _reducer;
    private Object _accumulator;

    public ReducedMetric(IReducer reducer) {
        _reducer = reducer;
        _accumulator = _reducer.init();
    }

    public void update(Object value) {
        _accumulator = _reducer.reduce(_accumulator, value);
    }

    public Object getValueAndReset() {
        Object ret = _reducer.extractResult(_accumulator);
        _accumulator = _reducer.init();
        return ret;
    }
}
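
The following sketch walks one bucket through a reduced metric with a mean reducer: three updates, one read, then a read of an empty bucket. The classes are simplified local stand-ins, not the Storm source; in particular the reducer here accepts any Number, which is an assumption made to keep the example short.

```java
// Illustrative stand-ins; the reducer is simplified to accept any Number.
class ReducedMetricSketch {
    interface IReducer<T> {
        T init();
        T reduce(T accumulator, Object input);
        Object extractResult(T accumulator);
    }

    static class MeanState {
        int count = 0;
        double sum = 0.0;
    }

    static class MeanReducer implements IReducer<MeanState> {
        public MeanState init() { return new MeanState(); }

        public MeanState reduce(MeanState acc, Object input) {
            acc.count++;
            acc.sum += ((Number) input).doubleValue();
            return acc;
        }

        // An empty bucket has no mean, so it yields null.
        public Object extractResult(MeanState acc) {
            return acc.count > 0 ? Double.valueOf(acc.sum / acc.count) : null;
        }
    }

    static class ReducedMetric<T> {
        private final IReducer<T> reducer;
        private T accumulator;

        ReducedMetric(IReducer<T> reducer) {
            this.reducer = reducer;
            this.accumulator = reducer.init();
        }

        void update(Object value) { accumulator = reducer.reduce(accumulator, value); }

        Object getValueAndReset() {
            Object ret = reducer.extractResult(accumulator);
            accumulator = reducer.init(); // fresh state for the next bucket
            return ret;
        }
    }

    public static void main(String[] args) {
        ReducedMetric<MeanState> latency = new ReducedMetric<>(new MeanReducer());
        latency.update(10L);
        latency.update(20);
        latency.update(30.0);
        System.out.println(latency.getValueAndReset()); // prints 20.0
        System.out.println(latency.getValueAndReset()); // prints null (empty bucket)
    }
}
```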

 

IMetricsConsumer

This interface nests the TaskInfo and DataPoint classes. 
handleDataPoints is where you add the logic that processes the DataPoints reported by a task

public interface IMetricsConsumer {
    public static class TaskInfo {
        public TaskInfo() {}
        public TaskInfo(String srcWorkerHost, int srcWorkerPort, String srcComponentId, int srcTaskId, long timestamp, int updateIntervalSecs) {
            this.srcWorkerHost = srcWorkerHost;
            this.srcWorkerPort = srcWorkerPort;
            this.srcComponentId = srcComponentId; 
            this.srcTaskId = srcTaskId; 
            this.timestamp = timestamp;
            this.updateIntervalSecs = updateIntervalSecs; 
        }
        public String srcWorkerHost;
        public int srcWorkerPort;
        public String srcComponentId; 
        public int srcTaskId; 
        public long timestamp;
        public int updateIntervalSecs; 
    }
    public static class DataPoint {
        public DataPoint() {}
        public DataPoint(String name, Object value) {
            this.name = name;
            this.value = value;
        }
        @Override
        public String toString() {
            return "[" + name + " = " + value + "]";
        }
        public String name; 
        public Object value;
    }

    void prepare(Map stormConf, Object registrationArgument, TopologyContext context, IErrorReporter errorReporter);
    void handleDataPoints(TaskInfo taskInfo, Collection<DataPoint> dataPoints);
    void cleanup();
}
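
As a rough picture of what a handleDataPoints implementation might do, the sketch below formats each data point into one text line. To stay runnable without Storm on the classpath it uses cut-down local copies of TaskInfo and DataPoint and a hypothetical MetricsHandler interface with only the one method; prepare and cleanup are left out, and the line format and sample values are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Illustrative stand-ins; only handleDataPoints is modelled.
class ConsumerSketch {
    static class TaskInfo {
        final String srcComponentId;
        final int srcTaskId;
        final long timestamp;

        TaskInfo(String srcComponentId, int srcTaskId, long timestamp) {
            this.srcComponentId = srcComponentId;
            this.srcTaskId = srcTaskId;
            this.timestamp = timestamp;
        }
    }

    static class DataPoint {
        final String name;
        final Object value;

        DataPoint(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    // Hypothetical one-method version of the consumer interface.
    interface MetricsHandler {
        void handleDataPoints(TaskInfo taskInfo, Collection<DataPoint> dataPoints);
    }

    // Collects one formatted line per data point; a real consumer might log them.
    static class CollectingConsumer implements MetricsHandler {
        final List<String> lines = new ArrayList<>();

        public void handleDataPoints(TaskInfo taskInfo, Collection<DataPoint> dataPoints) {
            for (DataPoint p : dataPoints) {
                lines.add(taskInfo.timestamp + " " + taskInfo.srcComponentId + ":"
                        + taskInfo.srcTaskId + " " + p.name + "=" + p.value);
            }
        }
    }

    public static void main(String[] args) {
        CollectingConsumer consumer = new CollectingConsumer();
        consumer.handleDataPoints(
                new TaskInfo("word-count", 3, 1375142400L),
                Arrays.asList(new DataPoint("__ack-count", 40L),
                              new DataPoint("__fail-count", 0L)));
        for (String line : consumer.lines) {
            System.out.println(line);
        }
    }
}
```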

 

2. backtype.storm.daemon.builtin-metrics

Defines the metrics that spouts and bolts need, mainly as two records, BuiltinSpoutMetrics and BuiltinBoltMetrics; each record acts as a hashmap of [metric-name, metric-object]

(defrecord BuiltinSpoutMetrics [^MultiCountMetric ack-count                                
                                ^MultiReducedMetric complete-latency
                                ^MultiCountMetric fail-count
                                ^MultiCountMetric emit-count
                                ^MultiCountMetric transfer-count])
(defrecord BuiltinBoltMetrics [^MultiCountMetric ack-count
                               ^MultiReducedMetric process-latency
                               ^MultiCountMetric fail-count
                               ^MultiCountMetric execute-count
                               ^MultiReducedMetric execute-latency
                               ^MultiCountMetric emit-count
                               ^MultiCountMetric transfer-count])

(defn make-data [executor-type]
  (condp = executor-type
    :spout (BuiltinSpoutMetrics. (MultiCountMetric.)
                                 (MultiReducedMetric. (MeanReducer.))
                                 (MultiCountMetric.)
                                 (MultiCountMetric.)
                                 (MultiCountMetric.))
    :bolt (BuiltinBoltMetrics. (MultiCountMetric.)
                               (MultiReducedMetric. (MeanReducer.))
                               (MultiCountMetric.)
                               (MultiCountMetric.)
                               (MultiReducedMetric. (MeanReducer.))
                               (MultiCountMetric.)
                               (MultiCountMetric.))))

(defn register-all [builtin-metrics  storm-conf topology-context]
  (doseq [[kw imetric] builtin-metrics]
    (.registerMetric topology-context (str "__" (name kw)) imetric
                     (int (get storm-conf Config/TOPOLOGY_BUILTIN_METRICS_BUCKET_SIZE_SECS)))))

When mk-task-data runs, it calls make-data to create the corresponding metrics,

:builtin-metrics (builtin-metrics/make-data (:type executor-data))

and in the executor's mk-threads these builtin metrics are registered in the topology context,

(builtin-metrics/register-all (:builtin-metrics task-data) storm-conf (:user-context task-data))

That covers creating and registering the builtin metrics. Next comes a set of functions that update them,

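Take spout-acked-tuple! as an example: it has to update the MultiCountMetric ack-count and the MultiReducedMetric complete-latency. 
.scope fetches the CountMetric for the given stream from the MultiCountMetric, then incrBy adds the rate from stats (stats-rate stats) to that count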

(defn spout-acked-tuple! [^BuiltinSpoutMetrics m stats stream latency-ms]  
  (-> m .ack-count (.scope stream) (.incrBy (stats-rate stats)))
  (-> m .complete-latency (.scope stream) (.update latency-ms)))

 

3. backtype.storm.metric

MetricsConsumerBolt

Creates the object implementing IMetricsConsumer, and calls handleDataPoints from execute

package backtype.storm.metric;
public class MetricsConsumerBolt implements IBolt {
    IMetricsConsumer _metricsConsumer;
    String _consumerClassName;
    OutputCollector _collector;
    Object _registrationArgument;

    public MetricsConsumerBolt(String consumerClassName, Object registrationArgument) {
        _consumerClassName = consumerClassName;
        _registrationArgument = registrationArgument;
    }

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        try {
            _metricsConsumer = (IMetricsConsumer)Class.forName(_consumerClassName).newInstance();
        } catch (Exception e) {
            throw new RuntimeException("Could not instantiate a class listed in config under section " +
                Config.TOPOLOGY_METRICS_CONSUMER_REGISTER + " with fully qualified name " + _consumerClassName, e);
        }
        _metricsConsumer.prepare(stormConf, _registrationArgument, context, (IErrorReporter)collector);
        _collector = collector;
    }
    
    @Override
    public void execute(Tuple input) {
        _metricsConsumer.handleDataPoints((IMetricsConsumer.TaskInfo)input.getValue(0), (Collection)input.getValue(1));
        _collector.ack(input);
    }

    @Override
    public void cleanup() {
        _metricsConsumer.cleanup();
    }
    
}

 

SystemBolt

According to the comments, there is one SystemBolt per worker, with taskid=-1. 
It defines system-related metrics and registers them in the topology context

 

The Java code calls into Clojure, so it needs the following imports

import clojure.lang.AFn;
import clojure.lang.IFn; //function
import clojure.lang.RT;  //run-time

It also uses Java packages for monitoring memory and the JVM

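java.lang.management.MemoryUsage: a MemoryUsage object is a snapshot of memory usage
java.lang.management.GarbageCollectorMXBean: the management interface for the JVM's garbage collection, e.g. the total number of collections and the accumulated collection time
java.lang.management.RuntimeMXBean: the management interface for the JVM's runtime system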

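What is distinctive about this bolt is that only prepare contains logic, and the _prepareWasCalled flag guards against prepare running more than once per worker (outside local mode, a second call throws). 
The logic in prepare mainly defines the metrics and registers them in the TopologyContext through registerMetric. 
The metrics cover JVM uptime, start time, memory usage, and the state of each garbage collector. 
These registered system metrics are also sent to MetricsConsumerBolt for processing. 
This seems like a job for a spout, so why is it implemented as a bolt?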

// There is one task inside one executor for each worker of the topology.
// TaskID is always -1, therefore you can only send-unanchored tuples to co-located SystemBolt.
// This bolt was conceived to export worker stats via metrics api.
public class SystemBolt implements IBolt {
    private static Logger LOG = LoggerFactory.getLogger(SystemBolt.class);
    private static boolean _prepareWasCalled = false;

    private static class MemoryUsageMetric implements IMetric {
        IFn _getUsage;
        public MemoryUsageMetric(IFn getUsage) {
            _getUsage = getUsage;
        }
        @Override
        public Object getValueAndReset() {
            MemoryUsage memUsage = (MemoryUsage)_getUsage.invoke();
            HashMap m = new HashMap();
            m.put("maxBytes", memUsage.getMax());
            m.put("committedBytes", memUsage.getCommitted());
            m.put("initBytes", memUsage.getInit());
            m.put("usedBytes", memUsage.getUsed());
            m.put("virtualFreeBytes", memUsage.getMax() - memUsage.getUsed());
            m.put("unusedBytes", memUsage.getCommitted() - memUsage.getUsed());
            return m;
        }
    }

    // canonically the metrics data exported is time bucketed when doing counts.
    // convert the absolute values here into time buckets.
    private static class GarbageCollectorMetric implements IMetric {
        GarbageCollectorMXBean _gcBean;
        Long _collectionCount;
        Long _collectionTime;
        public GarbageCollectorMetric(GarbageCollectorMXBean gcBean) {
            _gcBean = gcBean;
        }
        @Override
        public Object getValueAndReset() {
            Long collectionCountP = _gcBean.getCollectionCount();
            Long collectionTimeP = _gcBean.getCollectionTime();

            Map ret = null;
            if(_collectionCount!=null && _collectionTime!=null) {
                ret = new HashMap();
                ret.put("count", collectionCountP - _collectionCount);
                ret.put("timeMs", collectionTimeP - _collectionTime);
            }

            _collectionCount = collectionCountP;
            _collectionTime = collectionTimeP;
            return ret;
        }
    }

    @Override
    public void prepare(final Map stormConf, TopologyContext context, OutputCollector collector) {
        if(_prepareWasCalled && stormConf.get(Config.STORM_CLUSTER_MODE) != "local") {
            throw new RuntimeException("A single worker should have 1 SystemBolt instance.");
        }
        _prepareWasCalled = true;

        int bucketSize = RT.intCast(stormConf.get(Config.TOPOLOGY_BUILTIN_METRICS_BUCKET_SIZE_SECS));

        final RuntimeMXBean jvmRT = ManagementFactory.getRuntimeMXBean();

        context.registerMetric("uptimeSecs", new IMetric() {
            @Override
            public Object getValueAndReset() {
                return jvmRT.getUptime()/1000.0;
            }
        }, bucketSize);

        context.registerMetric("startTimeSecs", new IMetric() {
            @Override
            public Object getValueAndReset() {
                return jvmRT.getStartTime()/1000.0;
            }
        }, bucketSize);

        context.registerMetric("newWorkerEvent", new IMetric() {
            boolean doEvent = true;

            @Override
            public Object getValueAndReset() {
                if (doEvent) {
                    doEvent = false;
                    return 1;
                } else return 0;
            }
        }, bucketSize);

        final MemoryMXBean jvmMemRT = ManagementFactory.getMemoryMXBean();

        context.registerMetric("memory/heap", new MemoryUsageMetric(new AFn() {
            public Object invoke() {
                return jvmMemRT.getHeapMemoryUsage();
            }
        }), bucketSize);
        context.registerMetric("memory/nonHeap", new MemoryUsageMetric(new AFn() {
            public Object invoke() {
                return jvmMemRT.getNonHeapMemoryUsage();
            }
        }), bucketSize);

        for(GarbageCollectorMXBean b : ManagementFactory.getGarbageCollectorMXBeans()) {
            context.registerMetric("GC/" + b.getName().replaceAll("\\W", ""), new GarbageCollectorMetric(b), bucketSize);
        }
    }

    @Override
    public void execute(Tuple input) {
        throw new RuntimeException("Non-system tuples should never be sent to __system bolt.");
    }

    @Override
    public void cleanup() {
    }
}
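
The GarbageCollectorMetric above turns a cumulative counter into per-bucket deltas: the first read only records a baseline and returns null, and each later read returns the growth since the previous one. A minimal sketch of that idea in isolation follows; DeltaMetric is a hypothetical name introduced for this example, while the java.lang.management calls are standard JDK API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.LongSupplier;

// Illustrative sketch; DeltaMetric is a hypothetical helper, not a Storm class.
class DeltaMetricSketch {
    static class DeltaMetric {
        private final LongSupplier counter; // source of the cumulative value
        private Long previous;              // null until the first reading

        DeltaMetric(LongSupplier counter) {
            this.counter = counter;
        }

        // Returns the growth since the last call, or null on the first call.
        Long getValueAndReset() {
            long current = counter.getAsLong();
            Long ret = null;
            if (previous != null) {
                ret = current - previous;
            }
            previous = current;
            return ret;
        }
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean b : ManagementFactory.getGarbageCollectorMXBeans()) {
            DeltaMetric collections = new DeltaMetric(b::getCollectionCount);
            collections.getValueAndReset(); // first reading only sets the baseline
            System.out.println(b.getName() + " collections since baseline: "
                    + collections.getValueAndReset());
        }
    }
}
```

Reporting deltas keeps these values consistent with the count metrics, which are also reset at every bucket boundary.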

 

4. system-topology!

Here the metric component (MetricsConsumerBolt) and the system component (SystemBolt), along with the corresponding stream information, are added to the topology dynamically

system-topology! adds the following to the topology

1. acker, covered later 
2. metric-bolt: its input is the METRICS-STREAM sent by the tasks of every component; it has no output 
3. system-bolt: no input; its output is two TICK-STREAMs 
4. every component gets extra output streams: metrics-stream and system-stream

(defn system-topology! [storm-conf ^StormTopology topology]
  (validate-basic! topology)
  (let [ret (.deepCopy topology)]
    (add-acker! storm-conf ret)
    (add-metric-components! storm-conf ret)    
    (add-system-components! storm-conf ret)
    (add-metric-streams! ret)
    (add-system-streams! ret)
    (validate-structure! ret)
    ret
    ))

4.1 Adding components

Looking at the thrift definition, adding a bolt component to a topology simply means adding a [string, Bolt] entry to a hashmap. 
The key point is how thrift/mk-bolt-spec* is used to create the bolt spec

struct StormTopology {
  1: required map<string, SpoutSpec> spouts;
  2: required map<string, Bolt> bolts;
  3: required map<string, StateSpoutSpec> state_spouts;
}
struct Bolt {
  1: required ComponentObject bolt_object;
  2: required ComponentCommon common;
}  
struct ComponentCommon {
  1: required map<GlobalStreamId, Grouping> inputs;
  2: required map<string, StreamInfo> streams; //key is stream id, outputs
  3: optional i32 parallelism_hint; //how many threads across the cluster should be dedicated to this component
  4: optional string json_conf;
}
struct StreamInfo {
  1: required list<string> output_fields;
  2: required bool direct;
}


(defn add-metric-components! [storm-conf ^StormTopology topology]  
  (doseq [[comp-id bolt-spec] (metrics-consumer-bolt-specs storm-conf topology)] ;;metrics-consumer-bolt-specs shows that this bolt takes METRICS-STREAM-ID as input and has no output
    (.put_to_bolts topology comp-id bolt-spec)))

(defn add-system-components! [conf ^StormTopology topology]
  (let [system-bolt-spec (thrift/mk-bolt-spec*
                          {} ;;inputs are empty, the bolt receives nothing
                          (SystemBolt.) ;;object
                          {SYSTEM-TICK-STREAM-ID (thrift/output-fields ["rate_secs"])
                           METRICS-TICK-STREAM-ID (thrift/output-fields ["interval"])} ;;output: two output streams are declared, but nothing in the code emits to them
                          :p 0
                          :conf {TOPOLOGY-TASKS 0})]
    (.put_to_bolts topology SYSTEM-COMPONENT-ID system-bolt-spec)))

 

metric-components

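First, every component in the topology (including the system component) has to send statistics to the metrics bolt, so component-ids-that-emit-metrics is all-components-ids plus SYSTEM-COMPONENT-ID. 
Each such component therefore contributes the following input to the metrics bolt: {[comp-id METRICS-STREAM-ID] :shuffle} (using :shuffle grouping)

Next, thrift/mk-bolt-spec* is used to define mk-bolt-spec, the fn that creates the bolt

Finally, mk-bolt-spec is called to create the spec of the metrics bolt; see the definition above. 
The key point is that creating the MetricsConsumerBolt object requires reading the MetricsConsumer implementation class and its argument from storm-conf. 
This bolt takes the data received from each task and passes it to handleDataPoints to produce the metrics; see the earlier definition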

(defn metrics-consumer-bolt-specs [storm-conf topology]
  (let [component-ids-that-emit-metrics (cons SYSTEM-COMPONENT-ID (keys (all-components topology)))
        inputs (->> (for [comp-id component-ids-that-emit-metrics]
                      {[comp-id METRICS-STREAM-ID] :shuffle})
                    (into {}))
        
        mk-bolt-spec (fn [class arg p]
                       (thrift/mk-bolt-spec*
                        inputs  ;;the set of inputs
                        (backtype.storm.metric.MetricsConsumerBolt. class arg) ;;object
                        {} ;;output is empty
                        :p p :conf {TOPOLOGY-TASKS p}))]
    (map
     (fn [component-id register]
       [component-id (mk-bolt-spec (get register "class")
                                   (get register "argument")
                                   (or (get register "parallelism.hint") 1))])
     (metrics-consumer-register-ids storm-conf)
     (get storm-conf TOPOLOGY-METRICS-CONSUMER-REGISTER))))
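
For readers less used to Clojure, the two data-shaping steps in metrics-consumer-bolt-specs can be sketched in plain Java: building the inputs map (one shuffle-grouped entry per emitting component, plus the system component) and reading the parallelism from a register entry with a default of 1. The string values chosen for the two constants and the use of a List as the map key are assumptions made for this sketch; only the "parallelism.hint" key and the default come from the code above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the data shapes only; it builds no real bolt spec.
class MetricsBoltInputsSketch {
    static final String SYSTEM_COMPONENT_ID = "__system"; // assumed value
    static final String METRICS_STREAM_ID = "__metrics";  // assumed value

    // Mirrors {[comp-id METRICS-STREAM-ID] :shuffle} for every emitting component.
    static Map<List<String>, String> inputs(Collection<String> componentIds) {
        Map<List<String>, String> result = new LinkedHashMap<>();
        result.put(Arrays.asList(SYSTEM_COMPONENT_ID, METRICS_STREAM_ID), "shuffle");
        for (String id : componentIds) {
            result.put(Arrays.asList(id, METRICS_STREAM_ID), "shuffle");
        }
        return result;
    }

    // Mirrors (or (get register "parallelism.hint") 1).
    static int parallelism(Map<String, Object> register) {
        Object hint = register.get("parallelism.hint");
        return hint == null ? 1 : ((Number) hint).intValue();
    }

    public static void main(String[] args) {
        // Hypothetical component ids.
        System.out.println(inputs(Arrays.asList("sentence-spout", "count-bolt")));

        Map<String, Object> register = new HashMap<>();
        register.put("class", "backtype.storm.metric.LoggingMetricsConsumer");
        System.out.println(parallelism(register)); // prints 1 (no hint given)
    }
}
```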

 

4.2 Adding streams

Each component gets two additional output streams 
METRICS-STREAM-ID, sent to the metric bolt, with output-fields ["task-info" "data-points"] 
SYSTEM-STREAM-ID, with output-fields ["event"]

(defn add-metric-streams! [^StormTopology topology]
  (doseq [[_ component] (all-components topology)
          :let [common (.get_common component)]]
    (.put_to_streams common METRICS-STREAM-ID
                     (thrift/output-fields ["task-info" "data-points"]))))

(defn add-system-streams! [^StormTopology topology]
  (doseq [[_ component] (all-components topology)
          :let [common (.get_common component)]]
    (.put_to_streams common SYSTEM-STREAM-ID (thrift/output-fields ["event"]))))

This article is excerpted from cnblogs (部落格園); original publication date: 2013-07-30

