Storm Source Code Analysis - TopologySubmit - Task - TopologyContext (backtype.storm.task)

Published by 寒凝雪 on 2017-05-02

1. GeneralTopologyContext

Records the basic information of the topology: the StormTopology and StormConf, plus what is derived from them, namely the task/component mappings, each component's streams, and the input/output information.

public class GeneralTopologyContext implements JSONAware {
    private StormTopology _topology; 
    private Map<Integer, String> _taskToComponent;
    private Map<String, List<Integer>> _componentToTasks;
    private Map<String, Map<String, Fields>> _componentToStreamToFields; // ComponentCommon.streams, map<string, StreamInfo>
    private String _stormId;   // topology id
    protected Map _stormConf;  

}

StormTopology: the worker reads it from stormcode.ser on disk.

struct StormTopology {
  //ids must be unique across maps
  // #workers to use is in conf
  1: required map<string, SpoutSpec> spouts;
  2: required map<string, Bolt> bolts;
  3: required map<string, StateSpoutSpec> state_spouts;
}

StormConf: the worker reads it from stormconf.ser on disk.

taskToComponent and componentToTasks record the mapping between tasks and components.

componentToStreamToFields records which streams each component declares and which fields each stream carries.
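As a minimal self-contained sketch of how these maps relate (the component names and task ids are hypothetical, and plain java.util maps stand in for the real fields):

```java
import java.util.*;

public class ContextMaps {
    // Build componentToTasks as the inverse of taskToComponent,
    // with task ids sorted per component (mirroring component->sorted-tasks)
    public static Map<String, List<Integer>> invert(Map<Integer, String> taskToComponent) {
        Map<String, List<Integer>> componentToTasks = new TreeMap<>();
        for (Map.Entry<Integer, String> e : taskToComponent.entrySet()) {
            componentToTasks.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                            .add(e.getKey());
        }
        componentToTasks.values().forEach(Collections::sort);
        return componentToTasks;
    }

    public static void main(String[] args) {
        // taskToComponent: each task id maps to the component it runs
        Map<Integer, String> taskToComponent = Map.of(1, "spout", 2, "bolt", 3, "bolt");
        System.out.println(invert(taskToComponent)); // {bolt=[2, 3], spout=[1]}
    }
}
```

componentToTasks is just the inverse of taskToComponent; the context keeps both because lookups are needed in both directions.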

Besides the obvious accessors, the class also provides the following methods for obtaining a component's inputs and outputs:

    /**
     * Gets the declared inputs to the specified component.
     *
     * @return A map from subscribed component/stream to the grouping subscribed with.
     */
    public Map<GlobalStreamId, Grouping> getSources(String componentId) {
        return getComponentCommon(componentId).get_inputs();  //ComponentCommon.inputs,map<GlobalStreamId, Grouping>
    }
    /**
     * Gets information about who is consuming the outputs of the specified component,
     * and how.
     *
     * @return Map from stream id to component id to the Grouping used.
     */
    public Map<String, Map<String, Grouping>> getTargets(String componentId) {
        Map<String, Map<String, Grouping>> ret = new HashMap<String, Map<String, Grouping>>();
        for(String otherComponentId: getComponentIds()) {  // for each component id
            Map<GlobalStreamId, Grouping> inputs = getComponentCommon(otherComponentId).get_inputs();  // fetch that component's inputs
            for(GlobalStreamId id: inputs.keySet()) {  // for each stream id among the inputs
                if(id.get_componentId().equals(componentId)) {    // is this stream's source component the one in question?
                    Map<String, Grouping> curr = ret.get(id.get_streamId());
                    if(curr==null) curr = new HashMap<String, Grouping>();
                    curr.put(otherComponentId, inputs.get(id));
                    ret.put(id.get_streamId(), curr);
                }
            }
        }
        return ret; // [streamid, [target-componentid, grouping]]
    }


The getComponentCommon and getComponentIds used here come from the ThriftTopologyUtils class. Don't be misled: they do not fetch information from nimbus via the thrift API; they only read it out of the StormTopology object. The StormTopology class itself is generated by thrift, and thrift-generated classes carry a metaDataMap, so the implementation looks like this:

    public static Set<String> getComponentIds(StormTopology topology) {
        Set<String> ret = new HashSet<String>();
        for(StormTopology._Fields f: StormTopology.metaDataMap.keySet()) {
            Map<String, Object> componentMap = (Map<String, Object>) topology.getFieldValue(f);
            ret.addAll(componentMap.keySet());
        }
        return ret;
    }

Via metaDataMap it reads which fields StormTopology has (spouts, bolts, state_spouts), then iterates with getFieldValue and returns the key set of each value. The advantage is that this is dynamic: when StormTopology changes, this code does not need to. A plain Java class could not offer this, whereas for a dynamic language like Python it would be trivial. Of course, ThriftTopologyUtils is not strictly necessary here; one could hard-code reads from StormTopology.spouts and so on.
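The real implementation relies on thrift's generated metaDataMap; the same pattern can be sketched with a hand-built field map (everything below is a stand-in for illustration, not a Storm or thrift API):

```java
import java.util.*;

public class MetaMapSketch {
    // Stand-ins for a thrift-generated struct's three component maps (hypothetical ids)
    static Map<String, String> spouts = Map.of("spout-1", "...");
    static Map<String, String> bolts  = Map.of("bolt-1", "...", "bolt-2", "...");
    static Map<String, String> stateSpouts = Map.of();

    // Stand-in for StormTopology.metaDataMap: field name -> field value.
    // Adding a new field here is the only change ever needed, which is
    // what makes the metadata-driven iteration dynamic.
    static Map<String, Map<String, String>> fields() {
        Map<String, Map<String, String>> m = new LinkedHashMap<>();
        m.put("spouts", spouts);
        m.put("bolts", bolts);
        m.put("state_spouts", stateSpouts);
        return m;
    }

    // Same shape as ThriftTopologyUtils.getComponentIds: iterate the metadata,
    // fetch each field's value, and collect every component id
    public static Set<String> getComponentIds() {
        Set<String> ret = new TreeSet<>();
        for (Map<String, String> componentMap : fields().values()) {
            ret.addAll(componentMap.keySet());
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(getComponentIds()); // [bolt-1, bolt-2, spout-1]
    }
}
```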

 

Looking at the definition of ComponentCommon in storm.thrift, the two methods above are easy to understand. The implementation of getTargets is worth a closer look, because it derives the outputs from the inputs: ComponentCommon records only each output stream's id and fields, so there is no way to know which component a stream is sent to. For the inputs, however, the stream id is of type GlobalStreamId, which contains not only the stream id but also the source component's id. So the relation can be inverted: whenever some component's input names the current component as its source, that component must be one of the current component's target components.

struct ComponentCommon {
  1: required map<GlobalStreamId, Grouping> inputs;
  2: required map<string, StreamInfo> streams; //key is stream id, outputs
  3: optional i32 parallelism_hint; //how many threads across the cluster should be dedicated to this component
  4: optional string json_conf;
}

struct SpoutSpec {
  1: required ComponentObject spout_object;
  2: required ComponentCommon common;
  // can force a spout to be non-distributed by overriding the component configuration
  // and setting TOPOLOGY_MAX_TASK_PARALLELISM to 1
}

struct Bolt {
  1: required ComponentObject bolt_object;
  2: required ComponentCommon common;
}
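This inversion can be sketched without the thrift classes; here a GlobalStreamId is flattened to a "sourceComponent/streamId" string and groupings are plain strings (all names are hypothetical):

```java
import java.util.*;

public class Targets {
    // inputs per component: "srcComponent/streamId" -> grouping name (hypothetical topology)
    static Map<String, Map<String, String>> inputs = Map.of(
        "bolt-a", Map.of("spout/default", "shuffle"),
        "bolt-b", Map.of("spout/default", "fields", "bolt-a/ack", "global"));

    // Who consumes componentId's output streams, and with which grouping:
    // walk every component's inputs and keep those whose source matches.
    public static Map<String, Map<String, String>> getTargets(String componentId) {
        Map<String, Map<String, String>> ret = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> comp : inputs.entrySet()) {
            for (Map.Entry<String, String> in : comp.getValue().entrySet()) {
                String src = in.getKey().split("/")[0];
                String stream = in.getKey().split("/")[1];
                if (src.equals(componentId)) {
                    ret.computeIfAbsent(stream, k -> new TreeMap<>())
                       .put(comp.getKey(), in.getValue());
                }
            }
        }
        return ret; // [streamid, [target-componentid, grouping]]
    }

    public static void main(String[] args) {
        // spout's "default" stream is consumed by bolt-a (shuffle) and bolt-b (fields)
        System.out.println(getTargets("spout")); // {default={bolt-a=shuffle, bolt-b=fields}}
    }
}
```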

 

2. WorkerTopologyContext

WorkerTopologyContext adds some worker-related information:

public class WorkerTopologyContext extends GeneralTopologyContext {
    public static final String SHARED_EXECUTOR = "executor";
    
    private Integer _workerPort;         // the worker process's port
    private List<Integer> _workerTasks;  // the task ids contained in this worker
    private String _codeDir;             // the code directory on the supervisor, stormdist/stormid
    private String _pidDir;              // the directory recording the pids of the worker's processes (possibly several), workerid/pids 
    Map<String, Object> _userResources;
    Map<String, Object> _defaultResources;

}

 

3. TopologyContext

As the class comment says, the TopologyContext is passed as the argument to a bolt's prepare (or a spout's open) method; openOrPrepareWasCalled therefore records whether this TopologyContext has already been handed to prepare/open.

registerMetric can be used to register metrics into _registeredMetrics; the registered structure is [timeBucketSizeInSecs, [taskId, [name, metric]]].
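A minimal sketch of that nested structure, with a plain Object standing in for IMetric (names are illustrative, not Storm's API):

```java
import java.util.*;

public class RegisteredMetrics {
    // _registeredMetrics: timeBucketSizeInSecs -> taskId -> metricName -> metric
    // (the real value type is IMetric; a plain Object stands in here)
    static Map<Integer, Map<Integer, Map<String, Object>>> registeredMetrics = new TreeMap<>();

    // Register `metric` under the given bucket/task/name, creating the
    // inner maps on demand
    public static void register(int timeBucketSizeInSecs, int taskId, String name, Object metric) {
        registeredMetrics
            .computeIfAbsent(timeBucketSizeInSecs, k -> new TreeMap<>())
            .computeIfAbsent(taskId, k -> new TreeMap<>())
            .put(name, metric);
    }

    public static void main(String[] args) {
        register(60, 1, "emit-count", "counter");
        register(60, 1, "ack-count", "counter");
        System.out.println(registeredMetrics); // {60={1={ack-count=counter, emit-count=counter}}}
    }
}
```

Grouping by timeBucketSizeInSecs first means every metric that flushes on the same interval can be collected in one pass per bucket.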

_hooks is used to register task hooks.

/**
 * A TopologyContext is given to bolts and spouts in their "prepare" and "open"
 * methods, respectively. This object provides information about the component's
 * place within the topology, such as task ids, inputs and outputs, etc.
 *
 * <p>The TopologyContext is also used to declare ISubscribedState objects to
 * synchronize state with StateSpouts this object is subscribed to.</p>
 */
public class TopologyContext extends WorkerTopologyContext implements IMetricsContext {
    private Integer _taskId;
    private Map<String, Object> _taskData = new HashMap<String, Object>();
    private List<ITaskHook> _hooks = new ArrayList<ITaskHook>();
    private Map<String, Object> _executorData;
    private Map<Integer,Map<Integer, Map<String, IMetric>>> _registeredMetrics;
    private clojure.lang.Atom _openOrPrepareWasCalled;
    public TopologyContext(StormTopology topology, Map stormConf,
            Map<Integer, String> taskToComponent, Map<String, List<Integer>> componentToSortedTasks,
            Map<String, Map<String, Fields>> componentToStreamToFields,
            String stormId, String codeDir, String pidDir, Integer taskId,
            Integer workerPort, List<Integer> workerTasks, Map<String, Object> defaultResources,
            Map<String, Object> userResources, Map<String, Object> executorData, Map registeredMetrics,
            clojure.lang.Atom openOrPrepareWasCalled) {
        super(topology, stormConf, taskToComponent, componentToSortedTasks,
                componentToStreamToFields, stormId, codeDir, pidDir,
                workerPort, workerTasks, defaultResources, userResources);
        _taskId = taskId;
        _executorData = executorData;
        _registeredMetrics = registeredMetrics;
        _openOrPrepareWasCalled = openOrPrepareWasCalled;
    }
}

 

4. Usage

mk-task-data creates the topology context for each task:

user-context (user-topology-context (:worker executor-data) executor-data task-id)
(defn user-topology-context [worker executor-data tid]
  ((mk-topology-context-builder
    worker
    executor-data
    (:topology worker))
   tid))

(defn mk-topology-context-builder [worker executor-data topology]
  (let [conf (:conf worker)]
    #(TopologyContext.
      topology
      (:storm-conf worker)
      (:task->component worker)
      (:component->sorted-tasks worker)
      (:component->stream->fields worker)
      (:storm-id worker)
      (supervisor-storm-resources-path
        (supervisor-stormdist-root conf (:storm-id worker)))
      (worker-pids-root conf (:worker-id worker))
      (int %)
      (:port worker)
      (:task-ids worker)
      (:default-shared-resources worker)
      (:user-shared-resources worker)
      (:shared-executor-data executor-data)
      (:interval->task->metric-registry executor-data)
      (:open-or-prepare-was-called? executor-data))))

This article was excerpted from 部落格園 (cnblogs); the original was published on 2013-07-26.
