A Simple Walkthrough of Part of the ambari-collector Source Code Flow

Posted by weixin_34148340 on 2018-10-23

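At work I ran into a problem with collecting and displaying Ambari metrics. I needed to add some Python scripts that aggregate and organize the data produced by our platform's scripts, which meant I had to understand part of how ambari-collector works. I read through the relevant code and wrote up this analysis so that I don't forget it later. All of the code below has been trimmed and is for reference only.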

Let's start with main.py.

from core.controller import Controller
def main(argv=None):
  # Allow Ctrl-C
  stop_handler = bind_signal_handlers()
  server_process_main(stop_handler)
def server_process_main(stop_handler, scmStatus=None):
  if scmStatus is not None:
    scmStatus.reportStartPending()
  config = Configuration()
  _init_logging(config)
  controller = Controller(config, stop_handler)
  logger.info('Starting Server RPC Thread: %s' % ' '.join(sys.argv))
  controller.start()
  print "Server out at: " + SERVER_OUT_FILE
  print "Server log at: " + SERVER_LOG_FILE
  save_pid(os.getpid(), PID_OUT_FILE)
  if scmStatus is not None:
    scmStatus.reportStarted()
  #The controller thread finishes when the stop event is signaled
  controller.join()
  remove_file(PID_OUT_FILE)
  pass

As the main.py code shows, server_process_main instantiates the Controller class and calls controller.start(), which starts a thread. Next, let's look at the Controller code (trimmed).

from emitter import Emitter
class Controller(threading.Thread):
  def __init__(self, config, stop_handler):
    # Process initialization code
    threading.Thread.__init__(self)
    self.emitter = Emitter(self.config, self.application_metric_map, stop_handler)
  def run(self):
    self.start_emitter()
    Timer(1, self.addsecond).start()
    while True:
      if (self.event_queue.full()):
        logger.warn('Event Queue full!! Suspending further collections.')
      else:
        self.enqueque_events()
      pass
      if 0 == self._stop_handler.wait(self.sleep_interval):
        logger.info('Shutting down Controller thread')
        break
    if not self._t is None:
      self._t.cancel()
      self._t.join(5)
    self.emitter.join(5)
    pass
  def start_emitter(self):
    self.emitter.start()
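
To make the start/loop/stop pattern in Controller.run() concrete, here is a minimal sketch in Python 3 (the excerpts themselves are Python 2). StopHandler and Worker are hypothetical names made up for this illustration. The sketch assumes, based only on the `0 == self._stop_handler.wait(...)` checks in the excerpt, that wait() returns 0 once a stop has been signalled; the real stop_handler may behave differently.

```python
import threading


class StopHandler:
    """Hypothetical stand-in for the stop_handler passed around in the excerpt."""

    def __init__(self):
        self._event = threading.Event()

    def set_stop(self):
        self._event.set()

    def wait(self, timeout):
        # Assumed convention: 0 means "stop was signalled",
        # any other value means the timeout simply elapsed.
        return 0 if self._event.wait(timeout) else -1


class Worker(threading.Thread):
    """Loops like Controller.run(): do one round of work, then wait or stop."""

    def __init__(self, stop_handler, interval):
        super().__init__()
        self._stop_handler = stop_handler
        self.interval = interval
        self.iterations = 0

    def run(self):
        while True:
            self.iterations += 1  # stands in for one round of collection
            # Wait on the stop handler instead of sleeping blindly.
            if 0 == self._stop_handler.wait(self.interval):
                break


stop_handler = StopHandler()
worker = Worker(stop_handler, interval=0.01)
worker.start()
stop_handler.set_stop()  # what a Ctrl-C handler would trigger
worker.join(5)
```

Waiting on the handler rather than calling time.sleep() is what lets the thread exit promptly when the process is asked to stop.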

Controller.run() calls start_emitter(), which calls self.emitter.start() on the Emitter instance that was already created in __init__. Next, let's look at the Emitter code.

class Emitter(threading.Thread):
  COLLECTOR_URL = "xxxxx"
  RETRY_SLEEP_INTERVAL = 5
  MAX_RETRY_COUNT = 3
  def __init__(self, config, application_metric_map, stop_handler):
    threading.Thread.__init__(self)
    self.lock = threading.Lock()
    self.collector_address = config.get_server_address()
    self.send_interval = config.get_send_interval()
    self._stop_handler = stop_handler
    self.application_metric_map = application_metric_map
  def run(self):
    while True:
      try:
        self.submit_metrics()
      except Exception, e:
        logger.warn('Unable to emit events. %s' % str(e))
      pass
      if 0 == self._stop_handler.wait(self.send_interval):
        logger.info('Shutting down Emitter thread')
        return
    pass
  def submit_metrics(self):
    retry_count = 0
    # This call will acquire lock on the map and clear contents before returning
    # After configured number of retries the data will not be sent to the
    # collector
    json_data = self.application_metric_map.flatten(None, True)
    if json_data is None:
      logger.info("Nothing to emit, resume waiting.")
      return
    pass
    response = None
    while retry_count < self.MAX_RETRY_COUNT:
      try:
        response = self.push_metrics(json_data)
      except Exception, e:
        logger.warn('Error sending metrics to server. %s' % str(e))
      pass
      if response and response.getcode() == 200:
        retry_count = self.MAX_RETRY_COUNT
      else:
        logger.warn("Retrying after {0} ...".format(self.RETRY_SLEEP_INTERVAL))
        retry_count += 1
        #Wait for the service stop event instead of sleeping blindly
        if 0 == self._stop_handler.wait(self.RETRY_SLEEP_INTERVAL):
          return
      pass
    pass
  def push_metrics(self, data):
    headers = {"Content-Type" : "application/json", "Accept" : "*/*"}
    server = self.COLLECTOR_URL.format(self.collector_address.strip())
    logger.info("server: %s" % server)
    logger.debug("message to sent: %s" % data)
    req = urllib2.Request(server, data, headers)
    response = urllib2.urlopen(req, timeout=int(self.send_interval - 10))
    if response:
      logger.debug("POST response from server: retcode = {0}".format(response.getcode()))
      logger.debug(str(response.read()))
    pass
    return response
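
The retry logic in submit_metrics() can be sketched roughly as follows, in Python 3. This is a simplified illustration, not the Ambari code: submit_with_retry and flaky_push are hypothetical names, the push function returns a bare status code instead of a response object, and the retry interval is shortened so the example finishes quickly (the excerpt uses 5 seconds).

```python
import threading

MAX_RETRY_COUNT = 3
RETRY_SLEEP_INTERVAL = 0.01  # seconds; shortened for the example


def submit_with_retry(push, payload, stop_event):
    """Call push(payload) up to MAX_RETRY_COUNT times; return attempts made."""
    attempts = 0
    while attempts < MAX_RETRY_COUNT:
        attempts += 1
        try:
            if push(payload) == 200:
                return attempts  # success, stop retrying
        except Exception as exc:
            print("Error sending metrics to server. %s" % exc)
        # Event.wait() returns True if a stop was requested during the pause.
        if stop_event.wait(RETRY_SLEEP_INTERVAL):
            return attempts
    return attempts  # retries exhausted; the payload is dropped


calls = []


def flaky_push(payload):
    # Hypothetical sender: fails twice, then reports HTTP 200.
    calls.append(payload)
    if len(calls) < 3:
        raise IOError("collector unavailable")
    return 200


attempts = submit_with_retry(flaky_push, '{"metrics": []}', threading.Event())
```

Because flatten() has already cleared the map by the time the retries start, a payload that still fails after the last retry is lost, which matches the comment in the excerpt.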

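As the Emitter code shows, Emitter.run() calls submit_metrics(). Here is the key point: the core of that function is json_data = self.application_metric_map.flatten(None, True). Emitter itself inherits from threading.Thread; application_metric_map is an ApplicationMetricMap instance passed into its constructor, so let's look at the ApplicationMetricMap code.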

  def flatten(self, application_id = None, clear_once_flattened = False):
    with self.lock:
      timeline_metrics = { "metrics" : [] }
      local_metric_map = {}
      if application_id:
        if self.app_metric_map.has_key(application_id):
          local_metric_map = { application_id : self.app_metric_map[application_id] }
        else:
          logger.info("application_id: {0}, not present in the map.".format(application_id))
      else:
        local_metric_map = self.app_metric_map.copy()
      pass
      for appId, metrics in local_metric_map.iteritems():
        for metricId, metricData in dict(metrics).iteritems():
          # Create a timeline metric object
          timeline_metric = {
            "hostname" : self.hostname if appId == "HOST" else "",
            "metricname" : metricId,
            #"appid" : "HOST",
            "appid" : appId,
            "instanceid" : "",
            "starttime" : self.get_start_time(appId, metricId),
            "metrics" : metricData
          }
          timeline_metrics[ "metrics" ].append( timeline_metric )
        pass
      pass
      json_data = json.dumps(timeline_metrics) if len(timeline_metrics[ "metrics" ]) > 0 else None
      if clear_once_flattened:
        self.app_metric_map.clear()
      pass
      return json_data
  pass
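
To see what flatten() produces, here is a stripped-down sketch in Python 3. It is a free function rather than a method, it leaves out the lock, the application_id filter and the starttime field (get_start_time is not shown in the excerpt), and the host name, metric name and values are made-up sample data.

```python
import json


def flatten(app_metric_map, hostname, clear_once_flattened=False):
    """Turn {app_id: {metric_id: {timestamp: value}}} into a JSON payload."""
    timeline_metrics = {"metrics": []}
    for app_id, metrics in app_metric_map.items():
        for metric_id, metric_data in metrics.items():
            timeline_metrics["metrics"].append({
                "hostname": hostname if app_id == "HOST" else "",
                "metricname": metric_id,
                "appid": app_id,
                "instanceid": "",
                "metrics": metric_data,
            })
    # An empty map yields None, so the caller has nothing to send.
    json_data = json.dumps(timeline_metrics) if timeline_metrics["metrics"] else None
    if clear_once_flattened:
        app_metric_map.clear()
    return json_data


metric_map = {"HOST": {"cpu_user": {1540000000000: 12.5}}}
payload = flatten(metric_map, "node1.example.com", clear_once_flattened=True)
second = flatten(metric_map, "node1.example.com", clear_once_flattened=True)
```

The first call returns a JSON string and empties the map; the second call finds nothing and returns None, which is the case where submit_metrics() logs "Nothing to emit".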

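flatten() mainly merges the collected data and reshapes it into a new structure, the JSON payload. However, when the Emitter is first started from Controller via start_emitter(), flatten() has nothing to produce, because self.app_metric_map has not been populated yet: it returns None and submit_metrics() logs "Nothing to emit". Looking back at Controller.run(), there is a call to self.enqueque_events() (spelled that way in the code), which, as the name suggests, puts events on a queue. Following the chain of calls from there, what eventually runs is process_service_collection_event.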

  def process_service_collection_event(self, event):
    startTime = int(round(time() * 1000))
    metrics = None
    path = os.path.abspath('.')
    for root, dirs, files in os.walk("%s/libs/" % path):
      appid = event.get_group_name().split('_')[0]
      metricgroup = event.get_group_name().split('_')[1]
      if ("%s_metrics.sh" % appid) in filter(lambda x: ".sh" in x, files):
        metrics = {appid: self.service_info.get_service_metrics(appid, metricgroup)}
      else:
        logger.warn('have no %s modules' % appid)
    if metrics:
      for item in metrics:
        self.application_metric_map.put_metric(item, metrics[item], startTime)
    pass
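
The lookup that process_service_collection_event performs (split the group name into an app id and a metric group, then check whether a matching `<appid>_metrics.sh` script exists under libs/) can be sketched as below, in Python 3. find_metrics_script is a hypothetical helper, and the group names hdfs_namenode and yarn_resourcemanager are invented sample values; the sketch only locates the script and does not run it.

```python
import os
import tempfile


def find_metrics_script(libs_dir, group_name):
    """Return (app_id, metric_group, script_path or None) for a group name."""
    parts = group_name.split("_")
    app_id, metric_group = parts[0], parts[1]
    wanted = "%s_metrics.sh" % app_id
    # Walk the directory tree and look for the per-service script.
    for root, _dirs, files in os.walk(libs_dir):
        if wanted in files:
            return app_id, metric_group, os.path.join(root, wanted)
    return app_id, metric_group, None


with tempfile.TemporaryDirectory() as libs_dir:
    # Create one sample script so that only the "hdfs" lookup succeeds.
    open(os.path.join(libs_dir, "hdfs_metrics.sh"), "w").close()
    found = find_metrics_script(libs_dir, "hdfs_namenode")
    missing = find_metrics_script(libs_dir, "yarn_resourcemanager")
```

Returning None for a missing script corresponds to the branch where the excerpt logs a warning and leaves metrics unset.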

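process_service_collection_event runs each service's script, aggregates the results into the metrics variable, and then calls self.application_metric_map.put_metric(item, metrics[item], startTime). application_metric_map is in fact an instance of the ApplicationMetricMap class, which has the following method: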

  def put_metric(self, application_id, metric_id_to_value_map, timestamp):
    with self.lock:
      for metric_name, value in metric_id_to_value_map.iteritems():
        metric_map = self.app_metric_map.get(application_id)
        if not metric_map:
          metric_map = { metric_name : { timestamp : value } }
          self.app_metric_map[ application_id ] = metric_map
        else:
          metric_id_map = metric_map.get(metric_name)
          if not metric_id_map:
            metric_id_map = { timestamp : value }
            metric_map[ metric_name ] = metric_id_map
          else:
            metric_map[ metric_name ].update( { timestamp : value } )
          pass
        pass
  pass

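put_metric takes the data gathered from the scripts and builds up app_metric_map. It is called over and over from the Controller loop; it simply had not run yet when start_emitter() was first called. Once data has been gathered from the scripts, the Emitter has something to send: push_metrics posts the payload with urllib2 to port 6188 of the metrics collector, and the data ultimately ends up in HBase.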
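
The nested structure that put_metric builds is easier to see with a small example. The sketch below is Python 3, works on a plain dict passed in as an argument, omits the lock, and uses dict.setdefault in place of the explicit if/else branches; the metric names, values and timestamps are made up.

```python
def put_metric(app_metric_map, application_id, metric_id_to_value_map, timestamp):
    """Store values as app_metric_map[app_id][metric_name][timestamp] = value."""
    for metric_name, value in metric_id_to_value_map.items():
        per_app = app_metric_map.setdefault(application_id, {})
        per_app.setdefault(metric_name, {})[timestamp] = value


store = {}
put_metric(store, "HOST", {"cpu_user": 12.5, "mem_free": 2048}, 1000)
put_metric(store, "HOST", {"cpu_user": 13.0}, 2000)
# store is now:
# {"HOST": {"cpu_user": {1000: 12.5, 2000: 13.0}, "mem_free": {1000: 2048}}}
```

Each collection round adds one timestamped point per metric, and the points keep accumulating until flatten() is called with clear_once_flattened set to True.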
