Storm Source Code Analysis - TopologySubmit - Task - TopologyContext (backtype.storm.task)
1. GeneralTopologyContext
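Records the basic information about a topology, namely the StormTopology and the StormConf,
plus what has been derived from them: the task/component mapping, each component's streams, and the input/output information.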
public class GeneralTopologyContext implements JSONAware {
    private StormTopology _topology;
    private Map<Integer, String> _taskToComponent;
    private Map<String, List<Integer>> _componentToTasks;
    // Derived from ComponentCommon.streams, a map<string, StreamInfo>
    private Map<String, Map<String, Fields>> _componentToStreamToFields;
    private String _stormId; // topology id
    protected Map _stormConf;
}
StormTopology: the worker reads it from stormcode.ser on disk.
struct StormTopology {
  // ids must be unique across maps
  // #workers to use is in conf
  1: required map<string, SpoutSpec> spouts;
  2: required map<string, Bolt> bolts;
  3: required map<string, StateSpoutSpec> state_spouts;
}
StormConf: the worker reads it from stormconf.ser on disk.
taskToComponent, componentToTasks: the mapping between tasks and components, in both directions.
componentToStreamToFields: which streams each component has, and which fields each stream carries.
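To make the relation between the two task maps concrete, here is a minimal sketch that inverts a task-to-component map into a component-to-sorted-tasks map. `TaskMappingSketch` and `invert` are hypothetical names used only for illustration; in Storm the worker builds both maps and hands them to the context, so this is not the actual construction code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Storm: shows how componentToTasks
// is simply the inverse of taskToComponent.
final class TaskMappingSketch {

    static Map<String, List<Integer>> invert(Map<Integer, String> taskToComponent) {
        Map<String, List<Integer>> componentToTasks = new HashMap<>();
        for (Map.Entry<Integer, String> e : taskToComponent.entrySet()) {
            // Group every task id under the component it belongs to.
            componentToTasks
                .computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                .add(e.getKey());
        }
        // Keep each task list sorted, matching the "sorted tasks" naming.
        for (List<Integer> tasks : componentToTasks.values()) {
            Collections.sort(tasks);
        }
        return componentToTasks;
    }

    public static void main(String[] args) {
        Map<Integer, String> taskToComponent = new HashMap<>();
        taskToComponent.put(3, "count");
        taskToComponent.put(1, "spout");
        taskToComponent.put(2, "count");
        System.out.println(invert(taskToComponent));
    }
}
```

The component names "spout" and "count" are made up for the example.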
Besides the obvious accessors, GeneralTopologyContext has the following methods for obtaining a component's inputs and outputs:
/**
 * Gets the declared inputs to the specified component.
 *
 * @return A map from subscribed component/stream to the grouping subscribed with.
 */
public Map<GlobalStreamId, Grouping> getSources(String componentId) {
    // ComponentCommon.inputs is a map<GlobalStreamId, Grouping>
    return getComponentCommon(componentId).get_inputs();
}
/**
 * Gets information about who is consuming the outputs of the specified component,
 * and how.
 *
 * @return Map from stream id to component id to the Grouping used.
 */
public Map<String, Map<String, Grouping>> getTargets(String componentId) {
    Map<String, Map<String, Grouping>> ret = new HashMap<String, Map<String, Grouping>>();
    for(String otherComponentId: getComponentIds()) { // for every component id in the topology
        // fetch that component's inputs
        Map<GlobalStreamId, Grouping> inputs = getComponentCommon(otherComponentId).get_inputs();
        for(GlobalStreamId id: inputs.keySet()) { // for each stream id among the inputs
            if(id.get_componentId().equals(componentId)) { // is this stream's source the given component?
                Map<String, Grouping> curr = ret.get(id.get_streamId());
                if(curr==null) curr = new HashMap<String, Grouping>();
                curr.put(otherComponentId, inputs.get(id));
                ret.put(id.get_streamId(), curr);
            }
        }
    }
    return ret; // [stream-id, [target-component-id, grouping]]
}
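The getComponentCommon and getComponentIds used here come from the ThriftTopologyUtils class.
Don't be misled: they do not fetch anything from nimbus through the Thrift API. They only read from the StormTopology object, whose class happens to be generated by Thrift.
Thrift-generated classes carry a metaDataMap, so the implementation looks like this: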
public static Set<String> getComponentIds(StormTopology topology) {
    Set<String> ret = new HashSet<String>();
    for(StormTopology._Fields f: StormTopology.metaDataMap.keySet()) {
        Map<String, Object> componentMap = (Map<String, Object>) topology.getFieldValue(f);
        ret.addAll(componentMap.keySet());
    }
    return ret;
}
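metaDataMap tells which fields StormTopology has (spouts, bolts, state_spouts); the code then calls getFieldValue for each field and returns the union of the key sets of the resulting maps.
The benefit is that this is dynamic: when StormTopology gains or loses a field, this code does not have to change. A plain Java class would need reflection to achieve the same thing, whereas in a dynamic language such as Python it is straightforward.
Of course, ThriftTopologyUtils is not strictly necessary here; the code could just as well be hard-coded to read from StormTopology.spouts…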
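The field-driven enumeration described above can be imitated without Thrift. The following is a minimal sketch under stated assumptions: the `Field` enum stands in for `StormTopology._Fields`, `Field.values()` plays the role of `metaDataMap.keySet()`, and the `Topology` class is a hypothetical stand-in for the generated struct, not the real one.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Self-contained imitation of the metaDataMap-driven loop; none of these
// types are Storm or Thrift classes.
final class ComponentIdsSketch {

    // Stand-in for StormTopology._Fields.
    enum Field { SPOUTS, BOLTS, STATE_SPOUTS }

    // Stand-in for a generated struct: every field holds a map keyed by component id.
    static final class Topology {
        private final Map<Field, Map<String, Object>> values = new EnumMap<>(Field.class);

        void setFieldValue(Field f, Map<String, Object> v) { values.put(f, v); }

        Map<String, Object> getFieldValue(Field f) { return values.get(f); }
    }

    static Set<String> getComponentIds(Topology topology) {
        Set<String> ids = new HashSet<>();
        // Iterate over whatever fields exist instead of naming them one by one.
        for (Field f : Field.values()) {
            Map<String, Object> components = topology.getFieldValue(f);
            if (components != null) {
                ids.addAll(components.keySet());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Topology t = new Topology();
        Map<String, Object> spouts = new java.util.HashMap<>();
        spouts.put("words", new Object());
        t.setFieldValue(Field.SPOUTS, spouts);
        System.out.println(getComponentIds(t));
    }
}
```

Adding a new constant to `Field` requires no change to `getComponentIds`, which is the property the text attributes to the metaDataMap approach.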
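Look at the definition of ComponentCommon in storm.thrift and the two functions above become easy to understand.
The implementation of getTargets deserves a closer look, because it derives outputs from inputs.
ComponentCommon records only the stream id and the fields of each output; it does not say which component the stream is sent to.
For inputs, however, the key is a GlobalStreamId, which contains not only the stream id but also the component id of the source component.
So the targets can be derived in reverse: whenever the source component of some component's input is the current component, that component is a target of the current component.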
struct ComponentCommon {
  1: required map<GlobalStreamId, Grouping> inputs;
  2: required map<string, StreamInfo> streams; //key is stream id, outputs
  3: optional i32 parallelism_hint; //how many threads across the cluster should be dedicated to this component
  4: optional string json_conf;
}

struct SpoutSpec {
  1: required ComponentObject spout_object;
  2: required ComponentCommon common;
  // can force a spout to be non-distributed by overriding the component configuration
  // and setting TOPOLOGY_MAX_TASK_PARALLELISM to 1
}

struct Bolt {
  1: required ComponentObject bolt_object;
  2: required ComponentCommon common;
}
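The reverse derivation in getTargets can be tried out without Storm on the classpath. In the sketch below, `StreamKey` is a simplified stand-in for `GlobalStreamId`, groupings are reduced to plain strings, and the component names in `main` are invented; only the loop structure mirrors the original method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Simplified model of "derive outputs from inputs"; not Storm's own types.
final class TargetsSketch {

    /** Stand-in for GlobalStreamId: source component id plus stream id. */
    static final class StreamKey {
        final String componentId;
        final String streamId;

        StreamKey(String componentId, String streamId) {
            this.componentId = componentId;
            this.streamId = streamId;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof StreamKey)) return false;
            StreamKey other = (StreamKey) o;
            return componentId.equals(other.componentId) && streamId.equals(other.streamId);
        }

        @Override public int hashCode() {
            return Objects.hash(componentId, streamId);
        }
    }

    /**
     * inputsByComponent maps each component id to its declared inputs.
     * Returns stream id -> (target component id -> grouping) for componentId.
     */
    static Map<String, Map<String, String>> getTargets(
            String componentId, Map<String, Map<StreamKey, String>> inputsByComponent) {
        Map<String, Map<String, String>> targets = new HashMap<>();
        for (Map.Entry<String, Map<StreamKey, String>> other : inputsByComponent.entrySet()) {
            for (Map.Entry<StreamKey, String> input : other.getValue().entrySet()) {
                // An input whose source is componentId makes "other" a target of it.
                if (input.getKey().componentId.equals(componentId)) {
                    targets.computeIfAbsent(input.getKey().streamId, k -> new HashMap<>())
                           .put(other.getKey(), input.getValue());
                }
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, Map<StreamKey, String>> inputs = new HashMap<>();
        Map<StreamKey, String> countInputs = new HashMap<>();
        countInputs.put(new StreamKey("split", "default"), "fields");
        inputs.put("count", countInputs);
        System.out.println(getTargets("split", inputs));
    }
}
```

A component with no subscribers, such as a terminal bolt, simply yields an empty map.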
2. WorkerTopologyContext
WorkerTopologyContext wraps some worker-related information on top of GeneralTopologyContext.
public class WorkerTopologyContext extends GeneralTopologyContext {
    public static final String SHARED_EXECUTOR = "executor";

    private Integer _workerPort;        // port of the worker process
    private List<Integer> _workerTasks; // task ids contained in this worker
    private String _codeDir;            // code directory on the supervisor, stormdist/stormid
    private String _pidDir;             // directory holding the pids of the worker's processes (possibly several), workid/pids
    Map<String, Object> _userResources;
    Map<String, Object> _defaultResources;
}
3. TopologyContext
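As the Javadoc says, a TopologyContext is passed as the argument to a bolt's prepare method or a spout's open method.
Hence openOrPrepareWasCalled, which indicates whether prepare (or open) has already been called with this TopologyContext.
registerMetric registers metrics into _registeredMetrics.
The registry is structured as [timeBucketSizeInSecs, [taskId, [name, metric]]].
_hooks is used to register task hooks.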
/**
 * A TopologyContext is given to bolts and spouts in their "prepare" and "open"
 * methods, respectively. This object provides information about the component's
 * place within the topology, such as task ids, inputs and outputs, etc.
 *
 * <p>The TopologyContext is also used to declare ISubscribedState objects to
 * synchronize state with StateSpouts this object is subscribed to.</p>
 */
public class TopologyContext extends WorkerTopologyContext implements IMetricsContext {
    private Integer _taskId;
    private Map<String, Object> _taskData = new HashMap<String, Object>();
    private List<ITaskHook> _hooks = new ArrayList<ITaskHook>();
    private Map<String, Object> _executorData;
    private Map<Integer,Map<Integer, Map<String, IMetric>>> _registeredMetrics;
    private clojure.lang.Atom _openOrPrepareWasCalled;

    public TopologyContext(StormTopology topology, Map stormConf,
            Map<Integer, String> taskToComponent,
            Map<String, List<Integer>> componentToSortedTasks,
            Map<String, Map<String, Fields>> componentToStreamToFields,
            String stormId, String codeDir, String pidDir, Integer taskId,
            Integer workerPort, List<Integer> workerTasks,
            Map<String, Object> defaultResources,
            Map<String, Object> userResources,
            Map<String, Object> executorData, Map registeredMetrics,
            clojure.lang.Atom openOrPrepareWasCalled) {
        super(topology, stormConf, taskToComponent, componentToSortedTasks,
                componentToStreamToFields, stormId, codeDir, pidDir,
                workerPort, workerTasks, defaultResources, userResources);
        _taskId = taskId;
        _executorData = executorData;
        _registeredMetrics = registeredMetrics;
        _openOrPrepareWasCalled = openOrPrepareWasCalled;
    }

    // (remaining methods omitted in this excerpt)
}
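The nested registry layout [timeBucketSizeInSecs, [taskId, [name, metric]]] and the role of the open-or-prepare flag can be illustrated with a small stand-alone class. This is a sketch under assumptions, not Storm's code: `MetricRegistrySketch`, `markOpenOrPrepareCalled` and `lookup` are hypothetical names, a plain boolean replaces the Clojure atom, metrics are plain objects rather than `IMetric`, and the two rejection rules (no registration after prepare/open, no duplicate names) are assumed behavior for the purpose of the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of a per-task metric registry; not Storm's TopologyContext.
final class MetricRegistrySketch {

    // timeBucketSizeInSecs -> taskId -> metric name -> metric
    private final Map<Integer, Map<Integer, Map<String, Object>>> registered = new HashMap<>();
    private final int taskId;
    private boolean openOrPrepareWasCalled = false;

    MetricRegistrySketch(int taskId) {
        this.taskId = taskId;
    }

    /** Called by the (hypothetical) executor once prepare/open has returned. */
    void markOpenOrPrepareCalled() {
        openOrPrepareWasCalled = true;
    }

    <T> T registerMetric(String name, T metric, int timeBucketSizeInSecs) {
        if (openOrPrepareWasCalled) {
            // Assumed rule: metrics may only be registered during prepare/open.
            throw new IllegalStateException("registerMetric called after prepare/open");
        }
        Map<String, Object> byName = registered
            .computeIfAbsent(timeBucketSizeInSecs, k -> new HashMap<>())
            .computeIfAbsent(taskId, k -> new HashMap<>());
        if (byName.containsKey(name)) {
            // Assumed rule: a metric name is unique per bucket and task.
            throw new IllegalArgumentException("metric registered twice: " + name);
        }
        byName.put(name, metric);
        return metric;
    }

    Object lookup(int timeBucketSizeInSecs, int task, String name) {
        Map<Integer, Map<String, Object>> byTask = registered.get(timeBucketSizeInSecs);
        if (byTask == null) return null;
        Map<String, Object> byName = byTask.get(task);
        return byName == null ? null : byName.get(name);
    }

    public static void main(String[] args) {
        MetricRegistrySketch ctx = new MetricRegistrySketch(7);
        ctx.registerMetric("emitted", new Object(), 60);
        System.out.println(ctx.lookup(60, 7, "emitted") != null);
    }
}
```

Keying the outer map by bucket size lets a consumer walk all metrics that share a reporting interval in one pass.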
4. Usage
mk-task-data creates the topology context for each task:
user-context (user-topology-context (:worker executor-data) executor-data task-id)
(defn user-topology-context [worker executor-data tid]
  ((mk-topology-context-builder
     worker
     executor-data
     (:topology worker))
   tid))

(defn mk-topology-context-builder [worker executor-data topology]
  (let [conf (:conf worker)]
    #(TopologyContext.
       topology
       (:storm-conf worker)
       (:task->component worker)
       (:component->sorted-tasks worker)
       (:component->stream->fields worker)
       (:storm-id worker)
       (supervisor-storm-resources-path
         (supervisor-stormdist-root conf (:storm-id worker)))
       (worker-pids-root conf (:worker-id worker))
       (int %)
       (:port worker)
       (:task-ids worker)
       (:default-shared-resources worker)
       (:user-shared-resources worker)
       (:shared-executor-data executor-data)
       (:interval->task->metric-registry executor-data)
       (:open-or-prepare-was-called? executor-data))))
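The builder above is a closure: it captures the worker-level and executor-level data once and leaves only the task id open. The same shape in Java might look like the following sketch. `WorkerData`, `TaskContext` and `mkContextBuilder` are hypothetical simplifications that carry just two of the many worker fields; they are not Storm classes.

```java
import java.util.function.IntFunction;

// Java rendering of the "capture shared data, vary only the task id" pattern.
final class ContextBuilderSketch {

    /** Hypothetical stand-in for the worker map (only two fields kept). */
    static final class WorkerData {
        final String stormId;
        final int port;

        WorkerData(String stormId, int port) {
            this.stormId = stormId;
            this.port = port;
        }
    }

    /** Hypothetical stand-in for TopologyContext. */
    static final class TaskContext {
        final String stormId;
        final int port;
        final int taskId;

        TaskContext(String stormId, int port, int taskId) {
            this.stormId = stormId;
            this.port = port;
            this.taskId = taskId;
        }
    }

    /** Returns a function from task id to a context sharing the worker's data. */
    static IntFunction<TaskContext> mkContextBuilder(WorkerData worker) {
        return taskId -> new TaskContext(worker.stormId, worker.port, taskId);
    }

    public static void main(String[] args) {
        IntFunction<TaskContext> builder = mkContextBuilder(new WorkerData("topo-1", 6700));
        TaskContext ctx = builder.apply(3);
        System.out.println(ctx.stormId + " task " + ctx.taskId + " on port " + ctx.port);
    }
}
```

Every task in the same worker thus gets its own context object while sharing the same underlying worker data.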
This article is excerpted from Cnblogs (博客园); original publication date: 2013-07-26