[Source Code Analysis] Celery, a Parallel Distributed Task Queue: How the Child Process Handles Messages

Posted by 羅西的思考 on 2021-04-25

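0x00 Abstract

Celery is a simple, flexible, and reliable distributed system for processing large volumes of messages. It is an asynchronous task queue focused on real-time processing, and it also supports task scheduling. The previous article introduced Celery's multithreading model; this article describes how a child process handles messages.

After reading this article you should be able to trace the following flow:

  • how the parent process sends a message to a child process;
  • how the child process receives the parent's message;
  • how the child process parses the message step by step, peeling off one layer after another to get the information it needs to run the task;
  • how the child process runs the task once it has the task information;
  • why Celery needs so many layers of wrapping.

0x01 Background

Let us first recap the previous article. In the Celery worker, apply_async calls into the Pool: when a user's task message arrives, Celery hands it to the Pool.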

def apply_async(self, func, args=(), kwds={},...):           
        if self.threads:
            self._taskqueue.put(([(TASK, (result._job, None,
                                func, args, kwds))], None))
        else:
            self._quick_put((TASK, (result._job, None, func, args, kwds)))
        return result

Next, in billiard/pool.py, we can see that Pool uses self._taskqueue as the medium for passing the message to TaskHandler, which in turn hands it to a child process.

class Pool(object):
    '''
    Class which supports an async version of applying functions to arguments.
    '''
    Worker = Worker
    Supervisor = Supervisor
    TaskHandler = TaskHandler
    TimeoutHandler = TimeoutHandler
    ResultHandler = ResultHandler

    def __init__(self, processes=None, initializer=None, initargs=(),...):

        self._task_handler = self.TaskHandler(self._taskqueue,
                                              self._quick_put,
                                              self._outqueue,
                                              self._pool,
                                              self._cache)
        if threads:
            self._task_handler.start()

At this point the logic is as shown in the diagram from the previous article:

                           +
    Consumer               |
                   message |
                           v         strategy  +------------------------------------+
              +------------+------+            | strategies                         |
              | on_task_received  | <--------+ |                                    |
              |                   |            |[myTest.add : task_message_handler] |
              +------------+------+            +------------------------------------+
                           |
                           |
   +------------------------------------------------------------------------------------+
   strategy                |
                           |
                           |
                           v                Request [myTest.add]
              +------------+-------------+                       +---------------------+
              | task_message_handler     | <-------------------+ | create_request_cls  |
              |                          |                       |                     |
              +------------+-------------+                       +---------------------+
                           | _process_task_sem
                           |
  +------------------------------------------------------------------------------------+
   Worker                  | req[{Request} myTest.add]
                           v
                  +--------+-----------+
                  | WorkController     |
                  |                    |
                  |            pool +-------------------------+
                  +--------+-----------+                      |
                           |                                  |
                           |               apply_async        v
               +-----------+----------+                   +---+-------------------+
               |{Request} myTest.add  | +---------------> | TaskPool              |
               +----------------------+                   +----+------------------+
                                          myTest.add           |
                                                               |
+--------------------------------------------------------------------------------------+
                                                               |
                                                               v
                                                          +----+------------------+
                                                          | billiard.pool.Pool    |
                                                          +-------+---------------+
                                                                  |
                                                                  |
 Pool              +---------------------------+                  |
                   | TaskHandler               |                  |
                   |                           |                  |  self._taskqueue.put
                   |              _taskqueue   |  <---------------+
                   |                           |
                   +------------+--------------+
                                |
                                |  put(task)
                                |
+--------------------------------------------------------------------------------------+
                                |
 Sub process                    |
                                v
                            self._inqueue                       

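Following _taskqueue, we arrive at TaskHandler.

0x02 Parent process: TaskHandler

This section describes how the parent process passes the task message to a child process.

We are still in the parent process here. The code is in billiard/pool.py, and the call stack is: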

_send_bytes, connection.py:314
send, connection.py:233
body, pool.py:596
run, pool.py:504
_bootstrap_inner, threading.py:926
_bootstrap, threading.py:890

The variables are:

self = {TaskHandler} <TaskHandler(Thread-16, started daemon 14980)>
 additional_info = {PyDBAdditionalThreadInfo} State:2 Stop:None Cmd: 107 Kill:False
 cache = {dict: 1} {0: <%s: 0 ack:False ready:False>}
 daemon = {bool} True
 name = {str} 'Thread-16'
 outqueue = {SimpleQueue} <billiard.queues.SimpleQueue object at 0x000001E2C07DD6C8>
 pool = {list: 8} [<SpawnProcess(SpawnPoolWorker-1, started daemon)>, <SpawnProcess(SpawnPoolWorker-2, started daemon)>, <SpawnProcess(SpawnPoolWorker-3, started daemon)>, <SpawnProcess(SpawnPoolWorker-4, started daemon)>, <SpawnProcess(SpawnPoolWorker-5, started daemon)>, <SpawnProcess(SpawnPoolWorker-6, started daemon)>, <SpawnProcess(SpawnPoolWorker-7, started daemon)>, <SpawnProcess(SpawnPoolWorker-8, started daemon)>]
 taskqueue = {Queue} <queue.Queue object at 0x000001E2C07DD208>
  _args = {tuple: 0} ()
  _children = {WeakKeyDictionary: 0} <WeakKeyDictionary at 0x1e2c0883448>
  _daemonic = {bool} True
  _kwargs = {dict: 0} {}
  _name = {str} 'Thread-16'
  _parent = {_MainThread} <_MainThread(MainThread, started 13408)>
  _pid = {NoneType} None
  _start_called = {bool} True
  _started = {Event} <threading.Event object at 0x000001E2C0883D88>
  _state = {int} 0
  _stderr = {LoggingProxy} <celery.utils.log.LoggingProxy object at 0x000001E2C07DD188>
  _target = {NoneType} None
  _tstate_lock = {lock} <locked _thread.lock object at 0x000001E2C081FDB0>
  _was_started = {bool} True

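2.1 Sending the message

Once the parent process has received a task message, it calls put(task) to write the message into the pipe between the parent process and the child processes.

Note the earlier assignments: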

self._taskqueue = Queue()

def _setup_queues(self):
        self._inqueue = Queue()
        self._outqueue = Queue()
        self._quick_put = self._inqueue.put
        self._quick_get = self._outqueue.get

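In other words, when TaskHandler receives a message, it sends it to its child processes through self._inqueue.put, the write function of the pipe. self._taskqueue is merely an intermediary.

So the variables at this point are: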

put = {method} <bound method _ConnectionBase.send of <billiard.connection.PipeConnection object at 0x000001E2C07DD2C8>>

self = {TaskHandler} <TaskHandler(Thread-16, started daemon 14980)>

task = {tuple: 2} 
 0 = {int} 2
 1 = {tuple: 5} (0, None, <function _trace_task_ret at 0x000001E2BFCA3438>, ('myTest.add', 'dee72291-5614-4106-a7bf-007023286e9e', {'lang': 'py', 'task': 'myTest.add', 'id': 'dee72291-5614-4106-a7bf-007023286e9e', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'dee72291-5614-4106-a7bf-007023286e9e', 'parent_id': None, 'argsrepr': '(2, 8)', 'kwargsrepr': '{}', 'origin': 'gen17456@DESKTOP-0GO3RPO', 'reply_to': '21660796-c7e7-3736-9d42-e1be6ff7eaa8', 'correlation_id': 'dee72291-5614-4106-a7bf-007023286e9e', 'hostname': 'celery@DESKTOP-0GO3RPO', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': [2, 8], 'kwargs': {}}, b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8'), {})
 __len__ = {int} 2
    
taskqueue = {Queue} <queue.Queue object at 0x000001E2C07DD208>

The code is shown below. body writes each task into the pipe; when the loop ends, tell_others notifies the result handler and the other workers.

class TaskHandler(PoolThread):

    def __init__(self, taskqueue, put, outqueue, pool, cache):
        self.taskqueue = taskqueue
        self.put = put
        self.outqueue = outqueue
        self.pool = pool
        self.cache = cache
        super(TaskHandler, self).__init__()

    def body(self):
        cache = self.cache
        taskqueue = self.taskqueue
        put = self.put

        for taskseq, set_length in iter(taskqueue.get, None):
            task = None
            i = -1
            try:
                for i, task in enumerate(taskseq):
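                    try:
                        put(task)  # write the task into the pipe to the child process
                    # ... (except clauses omitted in this excerpt)
                # ... (rest of the loop body omitted in this excerpt)
                break
            # ... (outer except clause omitted in this excerpt)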


        self.tell_others()
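
The hand-off described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not billiard's implementation: forward_tasks is a hypothetical stand-in for TaskHandler.body, a plain multiprocessing.Pipe stands in for the in-queue, queue items are bare task lists rather than (taskseq, set_length) pairs, and both ends of the pipe live in one process so the example stays self-contained.

```python
import queue
import threading
from multiprocessing import Pipe

TASK = 2  # the numeric tag that appears in the debugger dumps


def forward_tasks(taskqueue, put):
    """Hypothetical stand-in for TaskHandler.body: drain the in-process
    queue and forward every task through the pipe's send function."""
    for taskseq in iter(taskqueue.get, None):  # None is the stop sentinel
        for task in taskseq:
            put(task)


def demo():
    reader, writer = Pipe(duplex=False)  # stands in for the in-queue
    taskqueue = queue.Queue()            # stands in for self._taskqueue
    handler = threading.Thread(
        target=forward_tasks, args=(taskqueue, writer.send), daemon=True)
    handler.start()

    taskqueue.put([(TASK, (0, None, "add", (2, 8), {}))])
    taskqueue.put(None)                  # ask the handler thread to stop
    handler.join()
    return reader.recv()                 # what a child process would read
```

The producer only ever touches the in-process queue; the handler thread is the single writer to the pipe, which is why the queue can be described as a mere intermediary.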

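2.2 Notifying the others

tell_others puts a None sentinel on the out queue to tell the result handler to finish, and writes one None into the pipe per worker to signal that there is no more work.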

def tell_others(self):
    outqueue = self.outqueue
    put = self.put
    pool = self.pool

    try:
        # tell result handler to finish when cache is empty
        outqueue.put(None)

        # tell workers there is no more work
        for p in pool:
            put(None)
    # ... (except clause omitted in this excerpt)
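
The sentinel pattern used by tell_others can be tried in isolation. The sketch below is an assumption-laden illustration: in-memory queue.Queue objects replace the real pipes, and drain_until_sentinel is a hypothetical consumer, not a billiard function.

```python
import queue


def tell_others(outqueue, put, pool):
    """Send shutdown sentinels, mirroring the excerpt above."""
    outqueue.put(None)   # result handler: finish once the cache is empty
    for _ in pool:
        put(None)        # one sentinel per worker


def drain_until_sentinel(q):
    """Hypothetical consumer loop: collect items until a None arrives."""
    return list(iter(q.get, None))


def demo():
    inqueue, outqueue = queue.Queue(), queue.Queue()
    pool = ["worker-1", "worker-2", "worker-3"]
    inqueue.put("job-a")
    inqueue.put("job-b")
    tell_others(outqueue, inqueue.put, pool)
    # A worker sharing the queue stops at the first sentinel it sees;
    # the remaining sentinels are left for the other workers.
    first_worker_saw = drain_until_sentinel(inqueue)
    return first_worker_saw, inqueue.qsize(), outqueue.get()
```

Writing one sentinel per worker matters because each worker consumes exactly one None before it exits.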

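0x03 Child process: Worker

This section describes how the Worker child process receives a task and runs it.

Now that the task message has been sent through the pipe, execution moves to the child process. Note that self is now a billiard.pool.Worker.

3.1 The child process loop

In the worker, the message loop (which unpacks the message more than once) works as follows:

  • call wait_for_job to wait for a message that the parent process writes into the pipe;
  • once the request req arrives, unpack it: type_, args_ = req;
  • send an ACK back to the parent process;
  • unpack args_ again: job, i, fun, args, kwargs = args_, which yields the job, the function the child process must run, its arguments, and so on;
  • if wait_for_syn is required, handle it;
  • call the user-defined function indirectly through fun: result = (True, prepare_result(fun(*args, **kwargs))). Note that fun here is _trace_task_ret; the user-defined function is called inside _trace_task_ret;
  • do the follow-up work, such as sending READY to the parent process.

The code is as follows: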

def workloop(self, debug=debug, now=monotonic, pid=None):
    pid = pid or os.getpid()
    put = self.outq.put
    inqW_fd = self.inqW_fd
    synqW_fd = self.synqW_fd
    maxtasks = self.maxtasks
    prepare_result = self.prepare_result

    wait_for_job = self.wait_for_job
    _wait_for_syn = self.wait_for_syn

    def wait_for_syn(jid):
        i = 0
        while 1:
            if i > 60:
                error('!!!WAIT FOR ACK TIMEOUT: job:%r fd:%r!!!',
                      jid, self.synq._reader.fileno(), exc_info=1)
            req = _wait_for_syn()
            if req:
                type_, args = req  # unpack the message received from the parent
                if type_ == NACK:
                    return False
                assert type_ == ACK
                return True
            i += 1

    completed = 0
    try:
        while maxtasks is None or (maxtasks and completed < maxtasks):
            req = wait_for_job()
            if req:
                type_, args_ = req
                assert type_ == TASK
                # Unpack again. fun is `_trace_task_ret`, which calls the
                # user-defined function internally.
                job, i, fun, args, kwargs = args_
                put((ACK, (job, i, now(), pid, synqW_fd)))
                if _wait_for_syn:
                    confirm = wait_for_syn(job)
                    if not confirm:
                        continue  # received NACK

                result = (True, prepare_result(fun(*args, **kwargs)))

                put((READY, (job, i, result, inqW_fd)))

                completed += 1
                if max_memory_per_child > 0:
                    used_kb = mem_rss()
                    if used_kb > 0 and used_kb > max_memory_per_child:
                        warning(MAXMEM_USED_FMT.format(
                            used_kb, max_memory_per_child))
                        return EX_RECYCLE

        if maxtasks:
            return EX_RECYCLE if completed == maxtasks else EX_FAILURE
        return EX_OK
    finally:
        self._ensure_messages_consumed(completed=completed)

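The variables at this point are shown below. req is the message the parent process sent through the pipe; the child process first unpacks it into type_ and args_.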

prepare_result = {method} <bound method Worker.prepare_result of <billiard.pool.Worker object at 0x000001BFAE5AE308>>
    
put = {method} <bound method _SimpleQueue.put of <billiard.queues.SimpleQueue object at 0x000001BFAE1BE7C8>>
    
type_ = 2 // TASK = 2 is defined in pool.py
  
req = {tuple: 2} (2, (6, None, <function _trace_task_ret at 0x000001BFAE53EA68>, ('myTest.add', '2c6d431f-a86a-4972-886b-472662401d20', {'lang': 'py', 'task': 'myTest.add', 'id': '2c6d431f-a86a-4972-886b-472662401d20', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '2c6d431f-a86a-4972-886b-472662401d20', 'parent_id': None, 'argsrepr': '(2, 8)', 'kwargsrepr': '{}', 'origin': 'gen14656@DESKTOP-0GO3RPO', 'reply_to': '3c9cc3a7-65d6-349b-ba66-399dc47b7cad', 'correlation_id': '2c6d431f-a86a-4972-886b-472662401d20', 'hostname': 'DESKTOP-0GO3RPO', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': [2, 8], 'kwargs': {}, 'is_eager': False, 'callbacks': None, 'errbacks': None, 'chain': None, 'chord': None}, b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8'), {}))

self = {Worker} <billiard.pool.Worker object at 0x000001BFAE5AE308>
    
kwargs = {dict: 0} {}

args_ = (6, None, <function _trace_task_ret at 0x000001BFAE53EA68>, ('myTest.add', '2c6d431f-a86a-4972-886b-472662401d20', {'lang': 'py', 'task': 'myTest.add', 'id': '2c6d431f-a86a-4972-886b-472662401d20', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '2c6d431f-a86a-4972-886b-472662401d20', 'parent_id': None, 'argsrepr': '(2, 8)', 'kwargsrepr': '{}', 'origin': 'gen14656@DESKTOP-0GO3RPO', 'reply_to': '3c9cc3a7-65d6-349b-ba66-399dc47b7cad', 'correlation_id': '2c6d431f-a86a-4972-886b-472662401d20', 'hostname': 'DESKTOP-0GO3RPO', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': [2, 8], 'kwargs': {}, 'is_eager': False, 'callbacks': None, 'errbacks': None, 'chain': None, 'chord': None}, b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8'), {}))
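
The loop can be condensed into a short sketch. Everything here is a simplification under stated assumptions: in-memory queues replace the pipes, the syn/NACK handshake and memory checks are left out, and only TASK = 2 is taken from the dump above; the ACK and READY values are illustrative.

```python
import os
import queue
import time

# TASK = 2 matches the dump above; the other two values are illustrative.
ACK, READY, TASK = 0, 1, 2


def workloop(inq, outq, maxtasks=None):
    """Simplified loop: read a request, ACK it, run it, report READY."""
    pid = os.getpid()
    completed = 0
    while maxtasks is None or completed < maxtasks:
        req = inq.get()
        if req is None:                    # sentinel: no more work
            break
        type_, args_ = req                 # first unpack: tag + payload
        assert type_ == TASK
        job, i, fun, args, kwargs = args_  # second unpack: job details
        outq.put((ACK, (job, i, time.monotonic(), pid)))
        try:
            result = (True, fun(*args, **kwargs))
        except Exception as exc:
            result = (False, repr(exc))
        outq.put((READY, (job, i, result)))
        completed += 1
    return completed


def demo():
    inq, outq = queue.Queue(), queue.Queue()
    inq.put((TASK, (6, None, lambda x, y: x + y, (2, 8), {})))
    inq.put(None)
    done = workloop(inq, outq)
    ack = outq.get()
    ready = outq.get()
    return done, ack[0], ready
```

The parent therefore sees two messages per job: an ACK as soon as the child picks the job up, and a READY carrying the (success, value) pair once the function returns.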

Extending the earlier logic diagram downward gives:

                                                               +
                                                               |
                                                               |
                                                               v
                                                          +----+------------------+
                                                          | billiard.pool.Pool    |
                                                          +-------+---------------+
                                                                  |
                                                                  |
 Pool              +---------------------------+                  |
                   | TaskHandler               |                  |
                   |                           |                  |  self._taskqueue.put
                   |              _taskqueue   |  <---------------+
                   |                           |
                   +------------+--------------+
                                |
                                |  put(task)
                                |
+--------------------------------------------------------------------------------------+
                                |
 billiard.pool.Worker           |  get                             Sub process
                                v
                     +----------+-----------------------------+
                     |  workloop                              |
                     |                                        |
                     |                                        |
                     |          wait_for_job                  |
                     |                                        |
                     +----------------------------------------+

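3.2 Receiving the parent process's message

wait_for_job eventually ends up in the function built by _make_recv_method, which reads from the pipe using the read function of the connection conn.

What it reads is the message req sent by the parent process, shown above.

Recall what the parent process wrote: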

put = {method} <bound method _ConnectionBase.send of <billiard.connection.PipeConnection object at 0x000001E2C07DD2C8>>

self = {TaskHandler} <TaskHandler(Thread-16, started daemon 14980)>

task = {tuple: 2} 
 0 = {int} 2
 1 = {tuple: 5} (0, None, <function _trace_task_ret at 0x000001E2BFCA3438>, ('myTest.add', 'dee72291-5614-4106-a7bf-007023286e9e', {'lang': 'py', 'task': 'myTest.add', 'id': 'dee72291-5614-4106-a7bf-007023286e9e', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'dee72291-5614-4106-a7bf-007023286e9e', 'parent_id': None, 'argsrepr': '(2, 8)', 'kwargsrepr': '{}', 'origin': 'gen17456@DESKTOP-0GO3RPO', 'reply_to': '21660796-c7e7-3736-9d42-e1be6ff7eaa8', 'correlation_id': 'dee72291-5614-4106-a7bf-007023286e9e', 'hostname': 'celery@DESKTOP-0GO3RPO', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': [2, 8], 'kwargs': {}}, b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8'), {})
 __len__ = {int} 2

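As you can see, what the parent process writes is what the child process reads back. The dumps in this article come from different debugging runs, so the job number and task id differ, but the structure is the same. The child process reads the message through _make_recv_method, that is, through the read function of the pipe connection conn.

We are now in the child process.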

    def _make_recv_method(self, conn):
        get = conn.get

        if hasattr(conn, '_reader'):
            _poll = conn._reader.poll
            if hasattr(conn, 'get_payload') and conn.get_payload:
                get_payload = conn.get_payload

                def _recv(timeout, loads=pickle_loads):
                    return True, loads(get_payload())
            else:
                def _recv(timeout):  # noqa
                    if _poll(timeout):
                        return True, get()
                    return False, None
        else:
            def _recv(timeout):  # noqa
                try:
                    return True, get(timeout=timeout)
                except Queue.Empty:
                    return False, None
        return _recv
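
The poll-then-read branch can be reproduced with the standard library alone. This is a sketch, assuming a multiprocessing.Pipe connection in place of billiard's queue object; make_recv is a hypothetical name chosen for the example.

```python
from multiprocessing import Pipe


def make_recv(conn):
    """Return recv(timeout) -> (got_message, message), polling the pipe."""
    def _recv(timeout):
        if conn.poll(timeout):        # wait up to `timeout` seconds for data
            return True, conn.recv()  # unpickle the object sent by the peer
        return False, None
    return _recv


def demo():
    reader, writer = Pipe(duplex=False)
    recv = make_recv(reader)
    empty = recv(0.01)                # nothing has been written yet
    writer.send((2, (6, None, "fun", (2, 8), {})))
    full = recv(1.0)
    return empty, full
```

Returning a (flag, message) pair lets the caller tell a timeout apart from a message whose value happens to be None, such as the shutdown sentinel.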

3.3 Parsing the message

After reading the message, the child process parses it: job, i, fun, args, kwargs = args_

This simply unpacks the contents of args_ one by one.

args_ = (6, None, <function _trace_task_ret at 0x000001BFAE53EA68>, ('myTest.add', '2c6d431f-a86a-4972-886b-472662401d20', {'lang': 'py', 'task': 'myTest.add', 'id': '2c6d431f-a86a-4972-886b-472662401d20', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '2c6d431f-a86a-4972-886b-472662401d20', 'parent_id': None, 'argsrepr': '(2, 8)', 'kwargsrepr': '{}', 'origin': 'gen14656@DESKTOP-0GO3RPO', 'reply_to': '3c9cc3a7-65d6-349b-ba66-399dc47b7cad', 'correlation_id': '2c6d431f-a86a-4972-886b-472662401d20', 'hostname': 'DESKTOP-0GO3RPO', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': [2, 8], 'kwargs': {}, 'is_eager': False, 'callbacks': None, 'errbacks': None, 'chain': None, 'chord': None}, b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8'), {}))

So we get:

job = {int} 6

i = {NoneType} None

fun = {function} <function _trace_task_ret at 0x000001BFAE53EA68>

kwargs = {dict: 0} {}

args = {tuple: 6} 
 0 = {str} 'myTest.add'
 1 = {str} '2c6d431f-a86a-4972-886b-472662401d20'
 2 = {dict: 26} {'lang': 'py', 'task': 'myTest.add', 'id': '2c6d431f-a86a-4972-886b-472662401d20', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '2c6d431f-a86a-4972-886b-472662401d20',
 3 = {bytes: 81} b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]'
 4 = {str} 'application/json'
 5 = {str} 'utf-8'
 __len__ = {int} 6

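Now the child process knows which function to call (here myTest.add) and with which arguments (here (2, 8)).

Let us summarize how the message is read and parsed:

  • the parent process writes task;
  • the child process reads it as req;
  • the child process unpacks req into type_ and args_;
  • the child process unpacks args_ into job, i, fun, args, kwargs. Here fun is _trace_task_ret, which calls the user-defined function internally;
  • only args contains the user-defined function and its arguments.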
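
The layers listed above can be peeled apart in a few lines. The sketch below uses values copied from the dump, with two assumptions: the string "_trace_task_ret" stands in for the real function object, and the body is decoded with json.loads because its content type is application/json (Celery itself goes through its serializer registry).

```python
import json

TASK = 2


def unpack_request(req):
    """Peel the message apart layer by layer, as the child process does."""
    type_, args_ = req                                  # layer 1: tag + payload
    assert type_ == TASK
    job, i, fun, args, kwargs = args_                   # layer 2: job details
    name, uuid, request, body, ctype, encoding = args   # layer 3: task info
    # layer 4: the serialized body holds the user's positional and keyword
    # arguments plus the embedded callbacks/errbacks/chain/chord options
    task_args, task_kwargs, embed = json.loads(body.decode(encoding))
    return {"job": job, "fun": fun, "name": name,
            "task_args": task_args, "task_kwargs": task_kwargs}


def demo():
    body = (b'[[2, 8], {}, {"callbacks": null, "errbacks": null, '
            b'"chain": null, "chord": null}]')
    req = (TASK, (6, None, "_trace_task_ret",
                  ("myTest.add", "2c6d431f-a86a-4972-886b-472662401d20",
                   {"task": "myTest.add"}, body,
                   "application/json", "utf-8"),
                  {}))
    return unpack_request(req)
```

Only at the fourth layer do the user's arguments [2, 8] appear, which is why the text describes the parsing as peeling off one layer after another.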

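3.3.1 Where the callback is configured in the parent process

As just mentioned, the fun obtained from unpacking is _trace_task_ret, and the user-defined function is called inside _trace_task_ret.

We need to see where the callback fun is configured in the parent process.

From the previous article we know that when a task arrives, task_message_handler uses the Request class to hand it to the process pool.

Note: the Worker scope in this diagram is celery/apps/worker.py. It belongs to Celery's own logic and is not a child-process concept. Celery has several classes with the same name, which is easy to trip over.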

                         +
  Consumer               |
                 message |
                         v         strategy  +------------------------------------+
            +------------+------+            | strategies                         |
            | on_task_received  | <--------+ |                                    |
            |                   |            |[myTest.add : task_message_handler] |
            +------------+------+            +------------------------------------+
                         |
                         |
 +------------------------------------------------------------------------------------+
 strategy                |
                         |
                         |
                         v                Request [myTest.add]
            +------------+-------------+                       +---------------------+
            | task_message_handler     | <-------------------+ | create_request_cls  |
            |                          |                       |                     |
            +------------+-------------+                       +---------------------+
                         | _process_task_sem
                         |
+--------------------------------------------------------------------------------------+
 Worker                  | req[{Request} myTest.add]
                         v
                +--------+-----------+
                | WorkController     |
                |                    |       apply_async
                |            pool +-------------------------+
                +--------+-----------+                      |
                         |                                  |
                         |                                  v
             +-----------+----------+                   +---+-------+
             |{Request} myTest.add  | +---------------> | TaskPool  |
             +----------------------+                   +-----------+
                                        myTest.add

The apply_async called here is in fact pool.apply_async.

In execute_using_pool of the Request class, we see that the first argument to pool.apply_async is trace_task_ret, so trace_task_ret must be the function that the parent process passes along.

class Request:
    """A request for task execution."""
    
    def execute_using_pool(self, pool, **kwargs):
        """Used by the worker to send this task to the pool.
        """

        result = pool.apply_async(
            trace_task_ret,  # this is where the wrapper is passed in
            args=(self._type, task_id, self._request_dict, self._body,
                  self._content_type, self._content_encoding),  # only these args identify the user-defined function
            accept_callback=self.on_accepted,
            timeout_callback=self.on_timeout,
            callback=self.on_success,
            error_callback=self.on_failure,
            soft_timeout=soft_time_limit or task.soft_time_limit,
            timeout=time_limit or task.time_limit,
            correlation_id=task_id,
        )
        # cannot create weakref to None
        self._apply_result = maybe(ref, result)
        return result    

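3.4 Calling the function

From the above we know that the function Pool calls is _trace_task_ret. In other words, _trace_task_ret is a uniform outer wrapper around user functions: Pool only needs to call _trace_task_ret, and _trace_task_ret calls the user function internally.

Why not call the user function myTest.add directly instead of wrapping it in _trace_task_ret? As the word trace in the name suggests, the wrapper is a compromise between extensibility, debugging, tracing, and execution speed.

There are two key pieces of code:

3.4.1 Getting the Celery application

The first key point is getting the Celery application that was set up in the child process beforehand. The code is: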

app = app or current_app._get_current_object()

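This raises a question: the Celery application lives in the parent process, so how does the child process get it?

In some multiprocessing mechanisms the parent's variables are copied into the child process, but that is not guaranteed, so there must be a mechanism by which the parent process sets the Celery application in the child process.

For a detailed walkthrough of how the parent process configures the Celery application for the child process, and how the child process obtains it, see the previous article.

3.4.2 Getting the task

The second key point is how to get the task that was registered beforehand. The code is: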

R, I, T, Rstr = trace_task(app.tasks[name], uuid, args, kwargs, request, app=app)

Here app.tasks is a registry populated in advance. It holds every task in Celery, both built-in tasks and user tasks.

So app.tasks[name] looks up the task itself by its name.

app.tasks = {TaskRegistry: 9} 
 NotRegistered = {type} <class 'celery.exceptions.NotRegistered'>
 'celery.starmap' = {xstarmap} <@task: celery.starmap of myTest at 0x1bfae596d48>
 'celery.chord' = {chord} <@task: celery.chord of myTest at 0x1bfae596d48>
 'celery.accumulate' = {accumulate} <@task: celery.accumulate of myTest at 0x1bfae596d48>
 'celery.chunks' = {chunks} <@task: celery.chunks of myTest at 0x1bfae596d48>
 'celery.chord_unlock' = {unlock_chord} <@task: celery.chord_unlock of myTest at 0x1bfae596d48>
 'celery.group' = {group} <@task: celery.group of myTest at 0x1bfae596d48>
 'celery.map' = {xmap} <@task: celery.map of myTest at 0x1bfae596d48>
 'celery.chain' = {chain} <@task: celery.chain of myTest at 0x1bfae596d48>
 'celery.backend_cleanup' = {backend_cleanup} <@task: celery.backend_cleanup of myTest at 0x1bfae596d48>
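
Looking a task up by name is the reason the message only has to carry a string. The sketch below is a hypothetical, minimal registry written for illustration; it is not Celery's TaskRegistry, and run_by_name is an invented helper.

```python
class NotRegistered(KeyError):
    """Raised when a task name is not in the registry."""


class TaskRegistry(dict):
    """Hypothetical, minimal name -> callable registry."""

    def __missing__(self, key):
        raise NotRegistered(key)

    def task(self, name):
        """Decorator that registers a function under `name`."""
        def decorator(fun):
            self[name] = fun
            return fun
        return decorator


tasks = TaskRegistry()


@tasks.task("myTest.add")
def add(x, y):
    return x + y


def run_by_name(name, args, kwargs):
    """Resolve the task by name, then call it, as the child process does."""
    return tasks[name](*args, **kwargs)
```

Because both processes build the same registry when they load the application, the parent never has to send the function object itself across the pipe.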

The logic at this point is:

                                                                   +
                                                                   |
                                                                   |
                                                                   v
                                                           +-------+---------------+
                                                           | billiard.pool.Pool    |
                                                           +-------+---------------+
                                                                   |
                                                                   |
    +---------------------------+                                  |
    | TaskHandler               |                                  |
    |                           |                                  | self._taskqueue.put
    |              _taskqueue   |  <-------------------------------+
    |                           |
    +------------+--------------+
                 |
                 |  put(task)                                                     Pool
                 |
 +-------------------------------------------------------------------------------------+
                 |
                 |  get                               billiard.pool.Worker   Sub process
                 v
+----------------+------+           +--------------------------------------------------+
|  workloop             |           | app.tasks                                        |
|                       |           |                                                  |
|       wait_for_job    |           |'celery.chord' =  @task: celery.chord of myTest   |
|                       |           |'celery.chunks' =  @task: celery.chunks of myTest |
|     app.tasks[name] <-------------+'celery.group' =   @task: celery.group of myTest> |
|                       |           | ......                                           |
|                       |           |                                                  |
+-----------------------+           +--------------------------------------------------+

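3.4.3 Calling the task

Now that we know which task to call, let us see how it is called.

3.4.3.1 Getting the task

As shown above, the callback was passed in from the parent process: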

fun = {function} <function _trace_task_ret at 0x000001BFAE53EA68>

_trace_task_ret is defined in celery/app/trace.py.

Its logic is:

  • get the Celery application into app;

  • extract the message content and update the request, for example:

    • request = {dict: 26} 
       'lang' = {str} 'py'
       'task' = {str} 'myTest.add'
       'id' = {str} 'a8928c1e-1e56-4502-9929-80a01b1bbfd8'
       'shadow' = {NoneType} None
       'eta' = {NoneType} None
       'expires' = {NoneType} None
       'group' = {NoneType} None
       'group_index' = {NoneType} None
       'retries' = {int} 0
       'timelimit' = {list: 2} [None, None]
       'root_id' = {str} 'a8928c1e-1e56-4502-9929-80a01b1bbfd8'
       'parent_id' = {NoneType} None
       'argsrepr' = {str} '(2, 8)'
       'kwargsrepr' = {str} '{}'
       'origin' = {str} 'gen17060@DESKTOP-0GO3RPO'
       'reply_to' = {str} '5a520373-7712-3326-9ce8-325df14aa2ad'
       'correlation_id' = {str} 'a8928c1e-1e56-4502-9929-80a01b1bbfd8'
       'hostname' = {str} 'DESKTOP-0GO3RPO'
       'delivery_info' = {dict: 4} {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}
       'args' = {list: 2} [2, 8]
       'kwargs' = {dict: 0} {}
       'is_eager' = {bool} False
       'callbacks' = {NoneType} None
       'errbacks' = {NoneType} None
       'chain' = {NoneType} None
       'chord' = {NoneType} None
       __len__ = {int} 26
      
  • get the user task from the task name;

  • call the user task with the request.

The code is as follows:

def trace_task(task, uuid, args, kwargs, request={}, **opts):
    """Trace task execution."""
    try:
        if task.__trace__ is None:
            task.__trace__ = build_tracer(task.name, task, **opts)
        return task.__trace__(uuid, args, kwargs, request)  # call the tracer installed when the strategies were updated


def _trace_task_ret(name, uuid, request, body, content_type,
                    content_encoding, loads=loads_message, app=None,
                    **extra_request):
    
    app = app or current_app._get_current_object()    # get the Celery application
    
    embed = None
    if content_type:
        accept = prepare_accept_content(app.conf.accept_content)
        args, kwargs, embed = loads(
            body, content_type, content_encoding, accept=accept,
        )
    else:
        args, kwargs, embed = body
    
    request.update({
        'args': args, 'kwargs': kwargs,
        'hostname': hostname, 'is_eager': False,
    }, **embed or {})
    
    R, I, T, Rstr = trace_task(app.tasks[name],
                        uuid, args, kwargs, request, app=app)    # call trace_task to run the task
    
    return (1, R, T) if I else (0, Rstr, T)

trace_task_ret = _trace_task_ret

The variables at this point are:

accept = {set: 1} {'application/json'}
app = {Celery} <Celery myTest at 0x1bfae596d48>
args = {list: 2} [2, 8]
body = {bytes: 81} b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]'
content_encoding = {str} 'utf-8'
content_type = {str} 'application/json'
embed = {dict: 4} {'callbacks': None, 'errbacks': None, 'chain': None, 'chord': None}
extra_request = {dict: 0} {}
kwargs = {dict: 0} {}
loads = {method} <bound method SerializerRegistry.loads of <kombu.serialization.SerializerRegistry object at 0x000001BFAE329408>>
name = {str} 'myTest.add'
request = {dict: 26} {'lang': 'py', 'task': 'myTest.add', 'id': '2c6d431f-a86a-4972-886b-472662401d20', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '2c6d431f-a86a-4972-886b-472662401d20',
uuid = {str} '2c6d431f-a86a-4972-886b-472662401d20'

3.4.3.2 Calling the task

The call goes through trace_task, which is defined as follows:

def trace_task(task, uuid, args, kwargs, request=None, **opts):
    """Trace task execution."""
    request = {} if not request else request
    try:
        if task.__trace__ is None:
            task.__trace__ = build_tracer(task.name, task, **opts)
        return task.__trace__(uuid, args, kwargs, request)

The tracer is installed in update_strategies:

task.__trace__ = build_tracer(name, task, loader, self.hostname,
                                          app=self.app) 

Part of build_tracer is shown below:

def build_tracer(name, task, loader=None, hostname=None, store_errors=True,
                 Info=TraceInfo, eager=False, propagate=False, app=None,
                 monotonic=monotonic, truncate=truncate,
                 trace_ok_t=trace_ok_t, IGNORE_STATES=IGNORE_STATES):
  
    fun = task if task_has_custom(task, '__call__') else task.run   # get the task's run function

    ...
    def trace_task(uuid, args, kwargs, request=None):
        # R      - is the possibly prepared return value.
        # I      - is the Info object.
        # T      - runtime
        # Rstr   - textual representation of return value
        # retval - is the always unmodified return value.
        # state  - is the resulting task state.

        # This function is very long because we've unrolled all the calls
        # for performance reasons, and because the function is so long
        # we want the main variables (I, and R) to stand out visually from the
        # the rest of the variables, so breaking PEP8 is worth it ;)
        
        R = I = T = Rstr = retval = state = None
        task_request = None
        time_start = monotonic()
        ...
        # -*- TRACE -*-
            try:
                R = retval = fun(*args, **kwargs) # run the task's function
                state = SUCCESS
            except Reject as exc:
                    ...
    return trace_task

The fun called here is the function the task was actually meant to run (myTest.add). This is where the task runs and its return value is obtained.
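
The closure idea behind build_tracer can be shown with a heavily reduced sketch. The signatures below are simplified assumptions made for illustration: the real tracer also handles signals, state, retries, and result backends, and it builds a TraceInfo object where this sketch keeps the raw exception.

```python
import time


def build_tracer(name, fun):
    """Hypothetical, reduced tracer factory.

    Returns a closure that runs `fun` and reports (R, I, T, Rstr):
    return value, error info (None on success), runtime, and repr.
    """
    def trace_task(uuid, args, kwargs, request=None):
        R = I = T = Rstr = None
        start = time.monotonic()
        try:
            R = fun(*args, **kwargs)   # the user's function finally runs here
            Rstr = repr(R)
        except Exception as exc:
            I = R = exc                # this sketch keeps the raw exception
            Rstr = repr(exc)
        T = time.monotonic() - start
        return R, I, T, Rstr
    return trace_task


def trace_task_ret(tracer, uuid, args, kwargs):
    """Same return convention as above: (1, R, T) on error, else (0, Rstr, T)."""
    R, I, T, Rstr = tracer(uuid, args, kwargs)
    return (1, R, T) if I else (0, Rstr, T)
```

Building the closure once per task and caching it, as task.__trace__ does, keeps per-call overhead low: each invocation only pays for the function call and the bookkeeping around it.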

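This completes one consume cycle.

Starting with the next article, we will look at some of Celery's supporting features, such as load balancing and fault tolerance.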
