[Source code analysis] Celery, the parallel distributed task queue: the multiprocessing model
0x00 Abstract
Celery is a simple, flexible, and reliable distributed system for processing large volumes of messages: an asynchronous task queue focused on real-time processing that also supports task scheduling. Because Celery uses multiple processes to improve execution efficiency, this article gives a first look at Celery's multiprocessing architecture and model.
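To keep the discussion concrete, here is a minimal, hypothetical example app (the task name myTest.add matches the traces shown later in this article; the broker URL is an assumption):

# tasks.py -- hypothetical minimal app; the broker URL is an assumption
from celery import Celery

app = Celery('myTest', broker='redis://localhost:6379/0')

@app.task
def add(x, y):
    return x + y

Started with celery -A tasks worker --pool=prefork --concurrency=4, such a worker forks the four pool child processes this article traces.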
Through this article, you can see what Celery had to consider, and which abstractions it made, in order to build a multiprocessing architecture, for example:
- how Celery, as a whole system, integrates the multiprocessing model to obtain a process pool;
- how different multiprocessing implementations are instantiated for different operating systems;
- how the communication channels between parent and child processes are built, and how reads and writes are separated;
- how child processes are spawned, what their work logic is, and how they are abstracted;
- how child processes are supervised;
- how tasks are dispatched to child processes;
- how results returned by child processes are handled.
We first give a rough picture to build intuition for what follows. In the diagram below, note:
- TaskHandler is the parent-process logic that dispatches tasks to child processes;
- ResultHandler is the parent-process logic that handles results returned by child processes;
- Supervisor is the auxiliary management handler;
- Worker is the business logic that runs inside a child process; _pool is the process pool; ForkProcess (i.e., WorkerProcess) is the abstraction of a child process, and each child-process abstraction runs a Worker.
These logical concepts must be kept clearly apart.
+--------------------------+
| AsynPool |
| |
| |
| ResultHandler +-------> celery.concurrency.asynpool.ResultHandler
| |
| Supervisor +-------> billiard.pool.Supervisor
| |
| TaskHandler +-------> billiard.pool.TaskHandler
| |
| TimeoutHandler +-------> billiard.pool.TimeoutHandler
| |
| Worker +-------> celery.concurrency.asynpool.Worker
| |
| _pool +-----------------+---> <ForkProcess(ForkPoolWorker-1, started daemon)>
+--------------------------+ |
+---> <ForkProcess(ForkPoolWorker-2, started daemon)>
|
+---> <ForkProcess(ForkPoolWorker-3, started daemon)>
|
+---> <ForkProcess(ForkPoolWorker-4, started daemon)>
The multiprocessing entry point is the Consumer's pool bootstep, so we start from the startup of the Consumer component.
0x01 The Consumer component's Pool bootstep
First, the Consumer's Pool starts from the bootsteps machinery. This bootstep is the worker's real execution engine.
The Pool bootstep adds a further layer of wrapping here because it needs to configure a scaling value, the so-called autoscaler. Once the various pools are built here, later tasks can simply be dropped into the Pool.
1.1 bootsteps
The code is in celery/worker/components.py. This is the entry point, and its purpose is to:
- apply various configuration;
- bring in TaskPool, which is the entry point of the worker's multiprocessing.
class Pool(bootsteps.StartStopStep):

    def __init__(self, w, autoscale=None, **kwargs):
        w.pool = None
        if isinstance(autoscale, str):
            max_c, _, min_c = autoscale.partition(',')
            autoscale = [int(max_c), min_c and int(min_c) or 0]
        w.autoscale = autoscale
        if w.autoscale:
            w.max_concurrency, w.min_concurrency = w.autoscale
        super().__init__(w, **kwargs)

    def create(self, w):
        semaphore = None
        max_restarts = None
        if w.app.conf.worker_pool in GREEN_POOLS:  # pragma: no cover
            # worker_pool is 'eventlet'/'gevent'; the default pool is prefork
            warnings.warn(UserWarning(W_POOL_SETTING))
        threaded = not w.use_eventloop or IS_WINDOWS  # use threads when there is no event loop, or on Windows
        procs = w.min_concurrency  # minimum pool size, 4 by default
        w.process_task = w._process_task  # bind the worker's _process_task to process_task
        if not threaded:  # when not using threads
            semaphore = w.semaphore = LaxBoundedSemaphore(procs)  # LaxBoundedSemaphore gives atomic acquire/release, built on a queue
            w._quick_acquire = w.semaphore.acquire  # expose the semaphore operations on the worker
            w._quick_release = w.semaphore.release
            max_restarts = 100  # maximum number of restarts
            if w.pool_putlocks and w.pool_cls.uses_semaphore:  # the pool class config decides whether to swap process_task
                w.process_task = w._process_task_sem  # with the default config, process_task becomes _process_task_sem
        allow_restart = w.pool_restarts  # whether pool restarts are allowed
        pool = w.pool = self.instantiate(
            w.pool_cls, w.min_concurrency,  # w.pool_cls defaults to prefork.TaskPool
            initargs=(w.app, w.hostname),
            maxtasksperchild=w.max_tasks_per_child,
            max_memory_per_child=w.max_memory_per_child,
            timeout=w.time_limit,
            soft_timeout=w.soft_time_limit,
            putlocks=w.pool_putlocks and threaded,
            lost_worker_timeout=w.worker_lost_wait,
            threads=threaded,
            max_restarts=max_restarts,
            allow_restart=allow_restart,
            forking_enable=True,
            semaphore=semaphore,
            sched_strategy=self.optimization,
            app=w.app,
        )
        _set_task_join_will_block(pool.task_join_will_block)
        return pool
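As a standalone sketch of the autoscale parsing done in __init__ above (parse_autoscale is my name, not Celery's):

# sketch: parse a "max,min" autoscale string the way Pool.__init__ does
def parse_autoscale(value):
    max_c, _, min_c = value.partition(',')
    return [int(max_c), min_c and int(min_c) or 0]

print(parse_autoscale('10,3'))  # [10, 3]
print(parse_autoscale('8'))     # [8, 0] -- no comma, so min defaults to 0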
Here w.pool_cls is <class 'celery.concurrency.prefork.TaskPool'>. The logic is as follows:
+-------------------------------+
| Pool(bootsteps.StartStopStep) |
| |
| |
| celery/worker/components.py |
+---------------+---------------+
|
|
|
v
__init__
+
|
|
|
v
create
+
|
|
v
+--------+----------+
| TaskPool |
| |
| Pool +-------> celery.concurrency.asynpool.AsynPool
| |
| app +-------> Celery
| |
+-------------------+
0x02 The process pool entry point -- TaskPool
TaskPool is the entry point to multiprocessing, and it is uniform across operating systems.
Because w.pool_cls is <class 'celery.concurrency.prefork.TaskPool'>, execution arrives at TaskPool. During initialization, instantiate first reaches the base class BasePool, located in celery/concurrency/base.py.
2.1 Pool initialization
What to note in __init__ is self.app = app: the Celery application itself is passed in at initialization.
class BasePool:
    """Task pool."""

    Timer = timer2.Timer

    _state = None
    _pool = None

    def __init__(self, limit=None, putlocks=True, forking_enable=True,
                 callbacks_propagate=(), app=None, **options):
        self.limit = limit
        self.putlocks = putlocks
        self.options = options
        self.forking_enable = forking_enable
        self.callbacks_propagate = callbacks_propagate
        self.app = app
2.2 Pool startup: start
The Blueprint calls start.
class Blueprint:

    def start(self, parent):
        self.state = RUN
        if self.on_start:
            self.on_start()
        for i, step in enumerate(s for s in parent.steps if s is not None):
            self.started = i + 1
            step.start(parent)
Since TaskPool does not declare a start function, the call lands on the one defined in its parent class BasePool:
class BasePool(object):
    """Task pool."""

    def start(self):
        self._does_debug = logger.isEnabledFor(logging.DEBUG)
        self.on_start()
        self._state = self.RUN
start in turn calls on_start. Because the subclasses override this function, the subclass's on_start runs instead. Taking TaskPool as the example again, on_start is defined as follows:
class TaskPool(BasePool):
    """Multiprocessing Pool implementation."""

    Pool = AsynPool
    BlockingPool = BlockingPool

    uses_semaphore = True
    write_stats = None

    def on_start(self):
        forking_enable(self.forking_enable)
        Pool = (self.BlockingPool if self.options.get('threads', True)
                else self.Pool)  # BlockingPool when threaded, otherwise AsynPool
        P = self._pool = Pool(processes=self.limit,
                              initializer=process_initializer,
                              on_process_exit=process_destructor,
                              enable_timeouts=True,
                              synack=False,
                              **self.options)  # create the pool

        # Create proxy methods
        self.on_apply = P.apply_async  # expose the underlying pool's methods on TaskPool
        self.maintain_pool = P.maintain_pool
        self.terminate_job = P.terminate_job
        self.grow = P.grow
        self.shrink = P.shrink
        self.flush = getattr(P, 'flush', None)  # FIXME add to billiard
As you can see, on_start does three main things:
- decide, from the option arguments, whether to use BlockingPool or AsynPool (billiard.pool.Pool and celery.concurrency.asynpool.AsynPool respectively);
- create the Pool;
- create the proxy methods.
On Windows the resulting _pool is <class 'billiard.pool.Pool'>; on macOS it is AsynPool.
The concrete logic at this point:
+------------------------------+
| Pool(bootsteps.StartStopStep)|
+-------------+--------------+
|
|
|
1 | instantiate
| 2 on_start
| +--------------------------------+
v | |
+-------+--------+--+ |
| TaskPool | |
| | +------+ |
| app +----------> |celery| |
| | +------+ |
| | |
| | +-----------+ |
| _pool +----------> | AsynPool | |
| | +-----------+ |
+----------------+--+ |
^ |
| |
+--------------------------------+
0x03 The process pool implementation -- AsynPool
_pool is implemented differently per operating system. You can see that the pipes, files, queues, and the real underlying pool are all configured here.
Assume the system is macOS, so we arrive at the AsynPool process pool, located in celery/concurrency/asynpool.py.
3.1 Instantiation
The main work is instantiating the pool; this instantiation is the concrete prefork implementation, and the Pool here is in fact AsynPool.
Its concrete steps are:
- configure the scheduling strategy;
- determine the number of child processes to fork for this machine: by default the CPU core count, or the value of the -c argument when it is given on the command line;
- create a set of read and write pipes. Depending on direction, and on whether the process is the parent or a child, the corresponding end of each pipe is closed later: for example, the parent closes the write end and the child closes the read end. The pipes are wrapped in an abstract data structure for easier management; each instance of that structure provides bidirectional data transfer between the parent and the soon-to-be-forked child, and one pipe group is created per child process;
- call the base class constructor; this is where the fork happens;
- based on the result of creating the children, map file descriptors to queues.
The code:
class AsynPool(_pool.Pool):
    """AsyncIO Pool (no threads)."""

    ResultHandler = ResultHandler
    Worker = Worker

    def WorkerProcess(self, worker):
        worker = super().WorkerProcess(worker)
        worker.dead = False
        return worker

    def __init__(self, processes=None, synack=False,
                 sched_strategy=None, proc_alive_timeout=None,
                 *args, **kwargs):
        self.sched_strategy = SCHED_STRATEGIES.get(sched_strategy,
                                                   sched_strategy)
        # number of children to fork: CPU count by default, or the -c value when given
        processes = self.cpu_count() if processes is None else processes
        self.synack = synack
        # create queue-pairs for all our processes in advance.
        self._queues = {
            self.create_process_queues(): None for _ in range(processes)  # create the read/write pipes
        }

        # inqueue fileno -> process mapping
        self._fileno_to_inq = {}
        # outqueue fileno -> process mapping
        self._fileno_to_outq = {}
        # synqueue fileno -> process mapping
        self._fileno_to_synq = {}

        # We keep track of processes that haven't yet
        # sent a WORKER_UP message.  If a process fails to send
        # this message within _proc_alive_timeout we terminate it
        # and hope the next process will recover.
        self._proc_alive_timeout = (
            PROC_ALIVE_TIMEOUT if proc_alive_timeout is None
            else proc_alive_timeout
        )
        self._waiting_to_start = set()

        # denormalized set of all inqueues.
        self._all_inqueues = set()

        # Set of fds being written to (busy)
        self._active_writes = set()

        # Set of active co-routines currently writing jobs.
        self._active_writers = set()

        # Set of fds that are busy (executing task)
        self._busy_workers = set()
        self._mark_worker_as_available = self._busy_workers.discard

        # Holds jobs waiting to be written to child processes.
        self.outbound_buffer = deque()

        self.write_stats = Counter()

        super().__init__(processes, *args, **kwargs)  # call the base class constructor

        for proc in self._pool:
            # create initial mappings, these will be updated
            # as processes are recycled, or found lost elsewhere.
            self._fileno_to_outq[proc.outqR_fd] = proc
            self._fileno_to_synq[proc.synqW_fd] = proc

        self.on_soft_timeout = getattr(
            self._timeout_handler, 'on_soft_timeout', noop,
        )
        self.on_hard_timeout = getattr(
            self._timeout_handler, 'on_hard_timeout', noop,
        )
3.2 Creating the communication mechanism: the queues
Within the instantiation code, the creation of the queues deserves special attention, because the parent and its children communicate through these queues.
The code:
self._queues = {
    self.create_process_queues(): None for _ in range(processes)
}
This creates the read and write pipes. The process count here is 4, so 4 pipe groups are created, each consisting of two _SimpleQueue objects:
self._queues = {dict: 4}
(<billiard.queues._SimpleQueue object>, <billiard.queues._SimpleQueue object at 0x = {NoneType}
(<billiard.queues._SimpleQueue object>, <billiard.queues._SimpleQueue object at 0x = {NoneType}
(<billiard.queues._SimpleQueue object>, <billiard.queues._SimpleQueue object at 0x = {NoneType}
(<billiard.queues._SimpleQueue object>, <billiard.queues._SimpleQueue object at 0x = {NoneType}
__len__ = {int} 4
The queue-creation method is as follows; it creates inq, outq, and synq:
def create_process_queues(self):
    """Create new in, out, etc. queues, returned as a tuple."""
    # NOTE: Pipes must be set O_NONBLOCK at creation time (the original
    # fd), otherwise it won't be possible to change the flags until
    # there's an actual reader/writer on the other side.
    inq = _SimpleQueue(wnonblock=True)
    outq = _SimpleQueue(rnonblock=True)
    synq = None
    if self.synack:
        synq = _SimpleQueue(wnonblock=True)
    return inq, outq, synq
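The NOTE above is worth internalizing. A minimal stdlib-only sketch of the same idea, setting O_NONBLOCK on a fresh pipe fd (Unix only; the explicit fcntl calls are mine, billiard wraps this internally):

import fcntl
import os

r_fd, w_fd = os.pipe()
flags = fcntl.fcntl(w_fd, fcntl.F_GETFL)
fcntl.fcntl(w_fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)  # make the write end non-blocking

os.write(w_fd, b'ping')
print(os.read(r_fd, 4))  # b'ping'
os.close(r_fd)
os.close(w_fd)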
3.2.1 _SimpleQueue
_SimpleQueue is a locked pipe. It is defined as follows:
class _SimpleQueue(object):
    '''
    Simplified Queue type -- really just a locked pipe
    '''

    def __init__(self, rnonblock=False, wnonblock=False, ctx=None):
        self._reader, self._writer = connection.Pipe(
            duplex=False, rnonblock=rnonblock, wnonblock=wnonblock,
        )
        self._poll = self._reader.poll
        self._rlock = self._wlock = None
Sample variable values:
self._poll = {method} <bound method _ConnectionBase.poll of <billiard.connection.Connection self = {_SimpleQueue} <billiard.queues._SimpleQueue object at 0x7fc46ae049e8>
_reader = {Connection} <billiard.connection.Connection object at 0x7fc46ae68c18>
_writer = {Connection} <billiard.connection.Connection object at 0x7fc46ae726a0>
3.2.2 Pipe
In the code above, _SimpleQueue's self._reader and self._writer are of pipe type, so we need a look at Pipe.
Pipe is defined below.
It simply creates two Connection objects and returns them to _SimpleQueue: one Connection abstracts the read end, the other the write end.
if sys.platform != 'win32':

    def Pipe(duplex=True, rnonblock=False, wnonblock=False):
        '''
        Returns pair of connection objects at either end of a pipe
        '''
        if duplex:
            s1, s2 = socket.socketpair()
            s1.setblocking(not rnonblock)
            s2.setblocking(not wnonblock)
            c1 = Connection(detach(s1))
            c2 = Connection(detach(s2))
        else:
            fd1, fd2 = os.pipe()
            if rnonblock:
                setblocking(fd1, 0)
            if wnonblock:
                setblocking(fd2, 0)
            c1 = Connection(fd1, writable=False)
            c2 = Connection(fd2, readable=False)
        return c1, c2
3.2.3 Connection
The code above brings in Connection. Note that this is not Kombu's connection, but the multiprocessing machinery's own Connection.
class Connection(_ConnectionBase):
    """
    Connection class based on an arbitrary file descriptor (Unix only), or
    a socket handle (Windows).
    """
Connection is a connection class based on a file descriptor.
class _ConnectionBase(object):
    _handle = None

    def __init__(self, handle, readable=True, writable=True):
        if isinstance(handle, _SocketContainer):
            self._socket = handle.sock  # keep ref so not collected
            handle = handle.sock.fileno()
        handle = handle.__index__()
        self._handle = handle
        self._readable = readable
        self._writable = writable
The variables are now:
c1 = {Connection} <billiard.connection.Connection object at 0x7fc46ae68c18>
c2 = {Connection} <billiard.connection.Connection object at 0x7fc46ae726a0>
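For intuition, the same one-way shape can be reproduced with the stdlib multiprocessing.Pipe, whose interface billiard mirrors (a sketch, not Celery code):

from multiprocessing import Pipe

reader, writer = Pipe(duplex=False)  # one read end, one write end, as in _SimpleQueue
writer.send({'job': 1})
print(reader.recv())    # {'job': 1}
print(reader.fileno())  # the kind of fd the pool records in its fileno maps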
AsynPool therefore ends up as follows:
+------------------------------+ +----------------+
| Pool(bootsteps.StartStopStep)| +-----------------+ | Connection |
+-------------+--------------+ | _SimpleQueue | | |
| | | | _write |
| | _reader +---------> | _read |
| | | | _send |
1 | instantiate | | | _recv |
| | | | _handle |
2 on_start | | | +----------------+
| | _poll +---------> _ConnectionBase.poll
+-------------+ | | |
| | | | | +------------+
| | v | _writer +---------> | Connection |
| +---+---+-----------+ | | +------------+
| | TaskPool | +-------+---------+
| | | +------+ ^
| | app +----------> |celery| |
| | | +------+ |
| | | +
| | | +--------------------------+ +----> (<_SimpleQueue>, <_SimpleQueue>)
| | _pool +----------> | AsynPool | |
| | | | | |
| +---+---------------+ | _queues +------->-----> (<_SimpleQueue>, <_SimpleQueue>)
| ^ | | |
| | | | |
| | +--------------------------+ +----> (<_SimpleQueue>, <_SimpleQueue>)
+-------------+ |
|
+----> (<_SimpleQueue>, <_SimpleQueue>)
3.3 The pool base class constructor
Let us now discuss AsynPool's base class. This is the Celery authors' own modification and wrapper of Python multiprocessing, made specifically for Celery. It creates the various message handlers and spawns the child processes.
Location: billiard/pool.py
The key steps here are:
- set self._Process = self._ctx.Process, which is <class 'billiard.context.ForkProcess'>;
- create the child processes, calling _create_worker_process(i) once per process;
- create self._worker_handler = self.Supervisor(self);
- create the task-dispatching TaskHandler;
- create the TimeoutHandler;
- create the ResultHandler.
The code:
class Pool(object):
    '''
    Class which supports an async version of applying functions to arguments.
    '''

    def __init__(self, processes=None, initializer=None, initargs=(), ..., **kwargs):
        self._ctx = context or get_context()
        self._setup_queues()
        self._taskqueue = Queue()
        self._cache = {}
        self._state = RUN
        .....
        self.readers = {}

        self._processes = self.cpu_count() if processes is None else processes
        self.max_restarts = max_restarts or round(self._processes * 100)
        self.restart_state = restart_state(max_restarts, max_restart_freq or 1)

        self._Process = self._ctx.Process
        self._pool = []
        self._poolctrl = {}
        self._on_ready_counters = {}
        self.putlocks = putlocks
        self._putlock = semaphore or LaxBoundedSemaphore(self._processes)
        for i in range(self._processes):
            self._create_worker_process(i)

        self._worker_handler = self.Supervisor(self)
        if threads:
            self._worker_handler.start()

        self._task_handler = self.TaskHandler(self._taskqueue,
                                              self._quick_put,
                                              self._outqueue,
                                              self._pool,
                                              self._cache)
        if threads:
            self._task_handler.start()

        # Thread killing timedout jobs.
        if self.enable_timeouts:
            self._timeout_handler = self.TimeoutHandler(
                self._pool, self._cache,
                self.soft_timeout, self.timeout,
            )
            self._timeout_handler_mutex = Lock()
            self._timeout_handler_started = False
            self._start_timeout_handler()

            # If running without threads, we need to check for timeouts
            # while waiting for unfinished work at shutdown.
            if not threads:
                self.check_timeouts = self._timeout_handler.handle_event

        # Thread processing results in the outqueue.
        self._result_handler = self.create_result_handler()
        self.handle_result_event = self._result_handler.handle_event

        if threads:
            self._result_handler.start()

        self._terminate = Finalize(
            self, self._terminate_pool,
            args=(self._taskqueue, self._inqueue, self._outqueue,
                  self._pool, self._worker_handler, self._task_handler,
                  self._result_handler, self._cache,
                  self._timeout_handler,
                  self._help_stuff_finish_args()),
            exitpriority=15,
        )
We now analyze these one by one.
3.3.1 Creating the child processes
The following code creates the child processes.
for i in range(self._processes):
    self._create_worker_process(i)
_create_worker_process does the following:
- inq, outq, synq = self.get_process_queues() fetches an abstraction over a read pipe and a write pipe. These pipes were created in advance (by self.create_process_queues() above) and serve the child about to be forked: the child listens for read events on this abstract pipe structure, and can also write data to the write pipe;
- w, an instance of self.WorkerProcess, is an abstract wrapper around the forked child. It makes managing children convenient and abstracts them into a process pool; w records the forked child's metadata, such as its pid and the read/write fds of its pipes, and is registered in the parent, which uses it to dispatch tasks;
- the WorkerProcess instance is recorded in self._pool. This is important: it is through this variable that the parent knows which children it has;
- w.start() contains the actual fork.
The code:
def _create_worker_process(self, i):
    sentinel = self._ctx.Event() if self.allow_restart else None
    inq, outq, synq = self.get_process_queues()
    on_ready_counter = self._ctx.Value('i')
    w = self.WorkerProcess(self.Worker(
        inq, outq, synq, self._initializer, self._initargs,
        self._maxtasksperchild, sentinel, self._on_process_exit,
        # Need to handle all signals if using the ipc semaphore,
        # to make sure the semaphore is released.
        sigprotection=self.threads,
        wrap_exception=self._wrap_exception,
        max_memory_per_child=self._max_memory_per_child,
        on_ready_counter=on_ready_counter,
    ))
    self._pool.append(w)
    self._process_register_queues(w, (inq, outq, synq))
    w.name = w.name.replace('Process', 'PoolWorker')
    w.daemon = True
    w.index = i
    w.start()
    self._poolctrl[w.pid] = sentinel
    self._on_ready_counters[w.pid] = on_ready_counter
    if self.on_process_up:
        self.on_process_up(w)
    return w
Since self.WorkerProcess(self.Worker(...)) has come up, we introduce WorkerProcess and Worker separately.
The simplified logic so far:
+----------------+
| StartStopStep |
+-------+--------+
|
| start
|
v
+-----------+-------------------+
| BasePool |
| celery/concurrency/base.py |
+-----------+-------------------+
|
| start
|
v
+-----------+-------------------+
| TaskPool |
| celery/concurrency/prefork.py |
+-----------+-------------------+
|
| on_start
|
v
+-----------+--------------------+
| AsynPool |
| celery/concurrency/asynpool.py |
+-----------+--------------------+
|
|
v
+--------+------------+
| class Pool(object) |
| billiard/pool.py |
+--------+------------+
|
+----+------+
| |
v v +----------------------+
__init__ _create_worker_process +---> | class Worker(object) |
+----------------------+
3.3.1.1 The child process work code
Worker is the code a child process runs. It too comes in several implementations; celery.concurrency.asynpool.Worker and billiard.pool.Worker are both child-process work loops.
Let us take billiard.pool.Worker as the example.
Worker's __init__ mainly configures the various fds.
Here obj.inqW_fd = self.inq._writer.fileno() obtains the fd from the corresponding Connection of the queues:
class _ConnectionBase(object):
    _handle = None

    def fileno(self):
        """File descriptor or handle of the connection"""
        self._check_closed()
        return self._handle
The full Worker definition:
class Worker(object):

    def __init__(self, inq, outq, synq=None, initializer=None, initargs=(), ...):
        ......
        self.max_memory_per_child = max_memory_per_child
        self._shutdown = sentinel
        self.inq, self.outq, self.synq = inq, outq, synq
        self.contribute_to_object(self)

    def contribute_to_object(self, obj):
        obj.inq, obj.outq, obj.synq = self.inq, self.outq, self.synq
        obj.inqW_fd = self.inq._writer.fileno()    # inqueue write fd
        obj.outqR_fd = self.outq._reader.fileno()  # outqueue read fd
        if self.synq:
            obj.synqR_fd = self.synq._reader.fileno()  # synqueue read fd
            obj.synqW_fd = self.synq._writer.fileno()  # synqueue write fd
            obj.send_syn_offset = _get_send_offset(self.synq._writer)
        else:
            obj.synqR_fd = obj.synqW_fd = obj._send_syn_offset = None
        obj._quick_put = self.inq._writer.send
        obj._quick_get = self.outq._reader.recv
        obj.send_job_offset = _get_send_offset(self.inq._writer)
        return obj
The variables:
self = {Worker}
initargs = {tuple: 2} (<Celery tasks at 0x7f8a0a70dd30>, )
inq = {_SimpleQueue} <billiard.queues._SimpleQueue object at 0x7f8a0b66aba8>
inqW_fd = {int} 7
max_memory_per_child = {NoneType} None
maxtasks = {NoneType} None
on_ready_counter = {Synchronized} <Synchronized wrapper for c_int(0)>
outq = {_SimpleQueue} <billiard.queues._SimpleQueue object at 0x7f8a0b6844a8>
outqR_fd = {int} 8
sigprotection = {bool} False
synq = {NoneType} None
synqR_fd = {NoneType} None
synqW_fd = {NoneType} None
wrap_exception = {bool} True
The simplified AsynPool logic follows.
Note again, in the diagram below: Worker is the child-process business logic, _pool is the process pool, and ForkProcess (i.e., WorkerProcess) is the child-process abstraction; each child-process abstraction runs a Worker. Keep these logical concepts clearly apart.
+--------------------------+
| AsynPool |
| |
| |
| ResultHandler +-------> celery.concurrency.asynpool.ResultHandler
| |
| Supervisor +-------> billiard.pool.Supervisor
| |
| TaskHandler +-------> billiard.pool.TaskHandler
| |
| TimeoutHandler +-------> billiard.pool.TimeoutHandler
| |
| Worker +-------> celery.concurrency.asynpool.Worker
| |
| _pool +-----------------+---> <ForkProcess(ForkPoolWorker-1, started daemon)>
+--------------------------+ |
+---> <ForkProcess(ForkPoolWorker-2, started daemon)>
|
+---> <ForkProcess(ForkPoolWorker-3, started daemon)>
|
+---> <ForkProcess(ForkPoolWorker-4, started daemon)>
The detailed logic:
+------------------------------+ +----------------+
| Pool(bootsteps.StartStopStep)| +-----------------+ | Connection |
+-------------+--------------+ | _SimpleQueue | | |
| | | | _write |
| | _reader +---------> | _read |
| | | | _send |
1 | instantiate | | | _recv |
| | | | _handle+---> {int} 8 <-+
2 on_start | | | +----------------+ |
| | _poll +---------> _ConnectionBase.poll |
+-------------+ | | | |
| | | | | +----------------+ |
| | v | _writer +---------> | Connection | |
| +---+---+-----------+ | | | | |
| | TaskPool | +-------+---------+ | _handle+----> {int} 7 |
| | | +------+ ^ | | |
| | app +----------> |celery| | +----------------+ ^ |
| | | +------+ | | |
| | | + | |
| | | +--------------------------+ +----> (<_SimpleQueue>, <_SimpleQueue>) | |
| | _pool +----------> | AsynPool | | | |
| | | | | | | |
| +---+---------------+ | _queues +------->-----> (<_SimpleQueue>, <_SimpleQueue>) | |
| ^ | | | | |
| | | | | | |
| | | | +----> (<_SimpleQueue>, <_SimpleQueue>) | |
+-------------+ | | | | |
| | | | |
+--------------------------+ +----> (<_SimpleQueue>, <_SimpleQueue>) | |
| |
+----------------------+ | |
| | | |
| Worker inq | | |
| | | |
| outq | | |
| | | |
| synq | | |
| | | |
| inqW_fd +-----------------------------------------+ |
| | |
| outqR_fd +------------------------------------------------+
| |
| workloop |
| |
| after_fork |
| |
+----------------------+
3.3.1.2 The child process abstraction -- WorkerProcess
WorkerProcess is an abstract wrapper around the forked child. It makes managing children convenient and abstracts them into a process pool; it records the forked child's metadata, such as its pid and pipe fds, and is registered in the parent so that the parent can dispatch tasks through it.
WorkerProcess wraps ForkProcess, which is defined as:
class ForkProcess(process.BaseProcess):
    _start_method = 'fork'

    @staticmethod
    def _Popen(process_obj):
        from .popen_fork import Popen
        return Popen(process_obj)
3.3.1.2.1 How WorkerProcess executes
WorkerProcess executes as follows:
def WorkerProcess(self, worker):
    worker = super().WorkerProcess(worker)
    worker.dead = False
    return worker
The base class code runs first, so a ForkProcess is ultimately returned:
def Process(self, *args, **kwds):
    return self._Process(*args, **kwds)

def WorkerProcess(self, worker):
    return worker.contribute_to_object(self.Process(target=worker))
In the self._Process(*args, **kwds) call, the relevant variables are:
self._Process = {type} <class 'billiard.context.ForkProcess'>
args = {tuple: 0} ()
kwds = {dict: 1} {'target': <celery.concurrency.asynpool.Worker object at 0x7f9c306326a0>}
self = {AsynPool} <celery.concurrency.asynpool.AsynPool object at 0x7f9c30604da0>
This calls into the ForkProcess(process.BaseProcess) base class.
3.3.1.2.2 The base class BaseProcess
The BaseProcess base class is below. Note that run here is the child's loop, and _target is the code the child executes:
_target = {Worker} <celery.concurrency.asynpool.Worker object at 0x7f9ad358b240>
Definition:
class BaseProcess(object):
    '''
    Process objects represent activity that is run in a separate process

    The class is analogous to `threading.Thread`
    '''

    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}, daemon=None, **_kw):
        count = next(_process_counter)
        self._identity = _current_process._identity + (count, )
        self._config = _current_process._config.copy()
        self._parent_pid = os.getpid()
        self._popen = None
        self._target = target
        self._args = tuple(args)
        self._kwargs = dict(kwargs)
        self._name = (
            name or type(self).__name__ + '-' +
            ':'.join(str(i) for i in self._identity)
        )
        if daemon is not None:
            self.daemon = daemon
        if _dangling is not None:
            _dangling.add(self)
        self._controlled_termination = False

    def run(self):
        '''
        Method to be run in sub-process; can be overridden in sub-class
        '''
        if self._target:
            self._target(*self._args, **self._kwargs)
With the base class done, we have a ForkProcess:
self = {ForkProcess} <ForkProcess(ForkProcess-1, initial)>
authkey = {AuthenticationString: 32} b''
daemon = {bool} False
exitcode = {NoneType} None
ident = {NoneType} None
name = {str} 'ForkProcess-1'
pid = {NoneType} None
_args = {tuple: 0} ()
_authkey = {AuthenticationString: 32}
_children = {set: 0} set()
_config = {dict: 2} {'authkey': b'', 'semprefix': '/mp'}
_counter = {count} count(2)
_daemonic = {bool} False
_identity = {tuple: 1} 1
_kwargs = {dict: 0} {}
_name = {str} 'ForkProcess-1'
_parent_pid = {int} 14747
_popen = {NoneType} None
_start_method = {str} 'fork'
_target = {Worker} <celery.concurrency.asynpool.Worker object at 0x7f9ad358b240>
_tempdir = {NoneType} None
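The key point is that _target is a callable Worker object, so run() simply invokes it in the child. The same shape with stdlib multiprocessing (a sketch; CallableWorker is my name):

import multiprocessing as mp

class CallableWorker:
    def __call__(self):
        # this is what run() ends up invoking in the child after the fork
        print('working in child pid', mp.current_process().pid)

if __name__ == '__main__':
    p = mp.Process(target=CallableWorker(), daemon=True)
    p.start()
    p.join()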
3.3.1.2.3 Joining the process list
After the child is created, self._pool.append(w) adds it to the parent's list of children, and the queues are registered.
def _create_worker_process(self, i):
    sentinel = self._ctx.Event() if self.allow_restart else None
    inq, outq, synq = self.get_process_queues()
    on_ready_counter = self._ctx.Value('i')
    w = self.WorkerProcess(self.Worker(
        inq, outq, synq, self._initializer, self._initargs,
        self._maxtasksperchild, sentinel, self._on_process_exit,
        # Need to handle all signals if using the ipc semaphore,
        # to make sure the semaphore is released.
        sigprotection=self.threads,
        wrap_exception=self._wrap_exception,
        max_memory_per_child=self._max_memory_per_child,
        on_ready_counter=on_ready_counter,
    ))
    self._pool.append(w)  # execution has reached here
    self._process_register_queues(w, (inq, outq, synq))  # and then here
The variables at this point:
self = {AsynPool} <celery.concurrency.asynpool.AsynPool object at 0x7f9ad36680f0>
ResultHandler = {type} <class 'celery.concurrency.asynpool.ResultHandler'>
SoftTimeLimitExceeded = {type} <class 'billiard.exceptions.SoftTimeLimitExceeded'>
Supervisor = {type} <class 'billiard.pool.Supervisor'>
TaskHandler = {type} <class 'billiard.pool.TaskHandler'>
TimeoutHandler = {type} <class 'billiard.pool.TimeoutHandler'>
Worker = {type} <class 'celery.concurrency.asynpool.Worker'>
......
outbound_buffer = {deque: 0} deque([])
readers = {dict: 0} {}
restart_state = {restart_state} <billiard.common.restart_state object at 0x7f9ad3668e80>
sched_strategy = {int} 4
timers = {dict: 1} {<bound method Pool.maintain_pool of <celery.concurrency.asynpool.AsynPool object at 0x7f9ad36680f0>>: 5.0}
write_stats = {Counter: 0} Counter()
_Process = {type} <class 'billiard.context.ForkProcess'>
_active_writers = {set: 0} set()
_active_writes = {set: 0} set()
_all_inqueues = {set: 0} set()
_busy_workers = {set: 0} set()
_cache = {dict: 0} {}
_ctx = {ForkContext} <billiard.context.ForkContext object at 0x7f9ad27ad828>
_fileno_to_inq = {dict: 0} {}
_fileno_to_outq = {dict: 0} {}
_fileno_to_synq = {dict: 0} {}
_initargs = {tuple: 2} (<Celery myTest at 0x7f9ad270c128>, 'celery@me2koreademini')
_inqueue = {NoneType} None
_max_memory_per_child = {NoneType} None
_maxtasksperchild = {NoneType} None
_on_ready_counters = {dict: 0} {}
_outqueue = {NoneType} None
_poll_result = {NoneType} None
_pool = {list: 1} [<ForkProcess(ForkPoolWorker-1, initial daemon)>]
_poolctrl = {dict: 0} {}
_proc_alive_timeout = {float} 4.0
_processes = {int} 4
_putlock = {LaxBoundedSemaphore} <LaxBoundedSemaphore at 0x7f9ad354db70 value:4 waiting:0>
_queues = {dict: 4} {(<billiard.queues._SimpleQueue object at 0x7f9ad35acef0>, <billiard.queues._SimpleQueue object at 0x7f9ad3668160>, None): <ForkProcess(ForkPoolWorker-1, initial daemon)>, (<billiard.queues._SimpleQueue object at 0x7f9ad36684a8>, <billiard.queues._SimpleQu
_quick_get = {NoneType} None
_quick_put = {NoneType} None
_state = {int} 0
_taskqueue = {Queue} <queue.Queue object at 0x7f9ad2a30908>
_waiting_to_start = {set: 0} set()
_wrap_exception = {bool} True
sentinel = {NoneType} None
synq = {NoneType} None
3.3.1.3 The fork procedure
w.start() contains the actual fork.
def _create_worker_process(self, i):
    sentinel = self._ctx.Event() if self.allow_restart else None
    inq, outq, synq = self.get_process_queues()
    on_ready_counter = self._ctx.Value('i')
    w = self.WorkerProcess(self.Worker(
        inq, outq, synq, self._initializer, self._initargs,
        self._maxtasksperchild, sentinel, self._on_process_exit,
        # Need to handle all signals if using the ipc semaphore,
        # to make sure the semaphore is released.
        sigprotection=self.threads,
        wrap_exception=self._wrap_exception,
        max_memory_per_child=self._max_memory_per_child,
        on_ready_counter=on_ready_counter,
    ))
    self._pool.append(w)
    self._process_register_queues(w, (inq, outq, synq))
    w.name = w.name.replace('Process', 'PoolWorker')
    w.daemon = True
    w.index = i
    w.start()  # we are here; the fork is about to happen
    self._poolctrl[w.pid] = sentinel
    self._on_ready_counters[w.pid] = on_ready_counter
    if self.on_process_up:
        self.on_process_up(w)
    return w
The code:
class BaseProcess(object):
    '''
    Process objects represent activity that is run in a separate process

    The class is analogous to `threading.Thread`
    '''

    def start(self):
        '''
        Start child process
        '''
        assert self._popen is None, 'cannot start a process twice'
        assert self._parent_pid == os.getpid(), \
            'can only start a process object created by current process'
        _cleanup()
        self._popen = self._Popen(self)
        self._sentinel = self._popen.sentinel
        _children.add(self)
The important part is self._popen = self._Popen(self); let us look at _launch in Popen's source:
class ForkProcess(process.BaseProcess):
    _start_method = 'fork'

    @staticmethod
    def _Popen(process_obj):
        from .popen_fork import Popen
        return Popen(process_obj)
The code is in billiard/popen_fork.py.
At this point the picture should be clear. When the _launch method runs, os.fork() spawns a child process and os.pipe() creates a pair of read/write pipes; then, by checking whether self.pid is 0, different logic runs in the parent and the child:
- the child closes the read pipe and then runs the process_obj._bootstrap() method;
- the parent closes the write pipe and records the read pipe's fd.
class Popen(object):
    method = 'fork'
    sentinel = None

    def __init__(self, process_obj):
        sys.stdout.flush()
        sys.stderr.flush()
        self.returncode = None
        self._launch(process_obj)

    def duplicate_for_child(self, fd):
        return fd

    def poll(self, flag=os.WNOHANG):
        if self.returncode is None:
            while True:
                try:
                    pid, sts = os.waitpid(self.pid, flag)
                except OSError as e:
                    if e.errno == errno.EINTR:
                        continue
                    # Child process not yet created. See #1731717
                    # e.errno == errno.ECHILD == 10
                    return None
                else:
                    break
            if pid == self.pid:
                if os.WIFSIGNALED(sts):
                    self.returncode = -os.WTERMSIG(sts)
                else:
                    assert os.WIFEXITED(sts)
                    self.returncode = os.WEXITSTATUS(sts)
        return self.returncode

    def _launch(self, process_obj):
        code = 1
        parent_r, child_w = os.pipe()
        self.pid = os.fork()
        if self.pid == 0:
            try:
                os.close(parent_r)
                if 'random' in sys.modules:
                    import random
                    random.seed()
                code = process_obj._bootstrap()
            finally:
                os._exit(code)
        else:
            os.close(child_w)
            self.sentinel = parent_r
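Stripped of the pool machinery, the fork-plus-pipe pattern used by _launch looks like this (Unix-only sketch; writing a payload is illustrative, the real code keeps the read fd open as a sentinel rather than reading data from it):

import os

parent_r, child_w = os.pipe()
pid = os.fork()
if pid == 0:                     # child: close the read end, do work, exit
    os.close(parent_r)
    os.write(child_w, b'done')
    os._exit(0)
else:                            # parent: close the write end, keep the read fd
    os.close(child_w)
    print(os.read(parent_r, 4))  # b'done'; EOF here would also signal child exit
    os.waitpid(pid, 0)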
3.3.2 Auxiliary management -- Supervisor
The Supervisor class periodically maintains the process pool, for example deciding whether dynamic scaling is needed.
class Supervisor(PoolThread):

    def __init__(self, pool):
        self.pool = pool
        super(Supervisor, self).__init__()

    def body(self):
        time.sleep(0.8)
        pool = self.pool

        try:
            # do a burst at startup to verify that we can start
            # our pool processes, and in that time we lower
            # the max restart frequency.
            prev_state = pool.restart_state
            pool.restart_state = restart_state(10 * pool._processes, 1)
            for _ in range(10):
                if self._state == RUN and pool._state == RUN:
                    pool._maintain_pool()
                    time.sleep(0.1)

            # Keep maintaining workers until the cache gets drained, unless
            # the pool is terminated
            pool.restart_state = prev_state
            while self._state == RUN and pool._state == RUN:
                pool._maintain_pool()
                time.sleep(0.8)
        except RestartFreqExceeded:
            pool.close()
            pool.join()
            raise
3.3.3 Dispatching tasks to children -- TaskHandler
This class carries the actual business: it is where task messages are passed from the parent process to the children.
When the TaskHandler was created earlier, the important points were:
- self._taskqueue is passed in, and task messages are delivered through it from then on; this _taskqueue is a simple data structure used as a message buffer between the Celery Consumer worker and the pool;
- self._quick_put is passed in and assigned to put, i.e., put points at self._inqueue.put;
- the TaskHandler thus sends messages over _inqueue, the parent-child pipe, via put(task). In other words, inside the TaskHandler, when the parent receives a message it sends it to its own children through the pipe function self._inqueue.put; self._taskqueue is merely an intermediate buffer.
The various queues come from:
self._taskqueue = Queue()

def _setup_queues(self):
    self._inqueue = Queue()
    self._outqueue = Queue()
    self._quick_put = self._inqueue.put
    self._quick_get = self._outqueue.get

self._task_handler = self.TaskHandler(self._taskqueue,
                                      self._quick_put,
                                      self._outqueue,
                                      self._pool,
                                      self._cache)
So at initialization the variables are:
outqueue = {SimpleQueue} <billiard.queues.SimpleQueue object at 0x000001B55131AE88>
pool = {list: 8} [<SpawnProcess(SpawnPoolWorker-1, started daemon)>, <SpawnProcess(SpawnPoolWorker-2, started daemon)>, <SpawnProcess(SpawnPoolWorker-3, started daemon)>, <SpawnProcess(SpawnPoolWorker-4, started daemon)>, <SpawnProcess(SpawnPoolWorker-5, started daemon)>, <SpawnProcess(SpawnPoolWorker-6, started daemon)>, <SpawnProcess(SpawnPoolWorker-7, started daemon)>, <SpawnProcess(SpawnPoolWorker-8, started daemon)>]
put = {method} <bound method _ConnectionBase.send of <billiard.connection.PipeConnection object at 0x000001B55131AF08>>
self = {TaskHandler} Unable to get repr for <class 'billiard.pool.TaskHandler'>
taskqueue = {Queue} <queue.Queue object at 0x000001B551334308>
An abridged TaskHandler:
class TaskHandler(PoolThread):

    def __init__(self, taskqueue, put, outqueue, pool, cache):
        self.taskqueue = taskqueue
        self.put = put
        self.outqueue = outqueue
        self.pool = pool
        self.cache = cache
        super(TaskHandler, self).__init__()

    def body(self):
        cache = self.cache
        taskqueue = self.taskqueue
        put = self.put

        for taskseq, set_length in iter(taskqueue.get, None):
            task = None
            i = -1
            try:
                for i, task in enumerate(taskseq):
                    if self._state:
                        break
                    put(task)
                else:
                    if set_length:
                        set_length(i + 1)
                    continue
                break
            except Exception:  # error handling elided in this excerpt
                break

        self.tell_others()

    def tell_others(self):
        outqueue = self.outqueue
        put = self.put
        pool = self.pool

        try:
            # tell result handler to finish when cache is empty
            outqueue.put(None)

            # tell workers there is no more work
            for p in pool:
                put(None)
        except IOError:  # logging elided in this excerpt
            pass

    def on_stop_not_started(self):
        self.tell_others()
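The dispatch loop in body() leans on the iter(callable, sentinel) idiom: taskqueue.get is called repeatedly until it returns None. A minimal demo:

from queue import Queue

q = Queue()
q.put('job-1')
q.put('job-2')
q.put(None)                     # the sentinel that ends the iteration

for task in iter(q.get, None):  # keeps calling q.get() until it returns None
    print('dispatching', task)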
The logic is now as follows.
Note: the Worker scope in this figure is celery/apps/worker.py, which belongs to Celery's own logical domain and is not a child-process concept (the same holds for the figures below). Celery has several classes sharing the same name, which is genuinely confusing.
+
Consumer |
message |
v strategy +------------------------------------+
+------------+------+ | strategies |
| on_task_received | <--------+ | |
| | |[myTest.add : task_message_handler] |
+------------+------+ +------------------------------------+
|
|
+------------------------------------------------------------------------------------+
strategy |
|
|
v Request [myTest.add]
+------------+-------------+ +---------------------+
| task_message_handler | <-------------------+ | create_request_cls |
| | | |
+------------+-------------+ +---------------------+
| _process_task_sem
|
+------------------------------------------------------------------------------------+
Worker | req[{Request} myTest.add]
v
+--------+-----------+
| WorkController |
| |
| pool +-------------------------+
+--------+-----------+ |
| |
| apply_async v
+-----------+----------+ +---+-------------------+
|{Request} myTest.add | +---------------> | TaskPool |
+----------------------+ +----+------------------+
myTest.add |
|
+--------------------------------------------------------------------------------------+
|
v
+----+------------------+
| billiard.pool.Pool |
+-------+---------------+
|
|
Pool +---------------------------+ |
| TaskHandler | |
| | | self._taskqueue.put
| _taskqueue | <---------------+
| |
+---------------------------+
3.3.4 Handling child results -- ResultHandler
The parent process uses a ResultHandler to handle the results children send back.
def create_result_handler(self):
    return super().create_result_handler(
        fileno_to_outq=self._fileno_to_outq,
        on_process_alive=self.on_process_alive,
    )
class ResultHandler(_pool.ResultHandler):
    """Handles messages from the pool processes."""

    def __init__(self, *args, **kwargs):
        self.fileno_to_outq = kwargs.pop('fileno_to_outq')
        self.on_process_alive = kwargs.pop('on_process_alive')
        super().__init__(*args, **kwargs)
        # add our custom message handler
        self.state_handlers[WORKER_UP] = self.on_process_alive
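The last line registers a handler keyed on message type; incoming messages are dispatched through this table. A sketch of the idea with a hypothetical message code (the real WORKER_UP constant lives in celery.concurrency.asynpool):

WORKER_UP = 15  # hypothetical value, for illustration only

def on_process_alive(pid):
    print('worker ready:', pid)

state_handlers = {WORKER_UP: on_process_alive}

msg = (WORKER_UP, 4321)         # pretend this arrived on the out-queue
state_handlers[msg[0]](msg[1])  # dispatch on the message type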
The concrete variables:
ResultHandler = {type} <class 'celery.concurrency.asynpool.ResultHandler'>
daemon = {property} <property object at 0x7f847454d638>
fdel = {NoneType} None
exitcode = {property} <property object at 0x7f8475c9e8b8>
fdel = {NoneType} None
fset = {NoneType} None
ident = {property} <property object at 0x7f847454d4f8>
fdel = {NoneType} None
fset = {NoneType} None
name = {property} <property object at 0x7f847454d598>
fdel = {NoneType} None
_initialized = {bool} False
The code is below; note that it uses poll to block waiting for messages.
def _process_result(self, timeout=1.0):
    poll = self.poll
    on_state_change = self.on_state_change

    while 1:
        try:
            ready, task = poll(timeout)
            if ready:
                on_state_change(task)
                if timeout != 0:  # blocking
                    break
            else:
                break
        except (IOError, EOFError):  # error handling elided in this excerpt
            raise CoroStop()
        yield

def handle_event(self, fileno=None, events=None):
    if self._state == RUN:
        if self._it is None:
            self._it = self._process_result(0)  # non-blocking
        try:
            next(self._it)
        except (StopIteration, CoroStop):
            self._it = None
The poll here corresponds to _poll_result, i.e., self._outqueue._reader.poll(timeout).
So it blocks on the outqueue, the pipe endpoint on which children send data out.
def _setup_queues(self):
    self._inqueue = self._ctx.SimpleQueue()
    self._outqueue = self._ctx.SimpleQueue()
    self._quick_put = self._inqueue._writer.send
    self._quick_get = self._outqueue._reader.recv

    def _poll_result(timeout):
        if self._outqueue._reader.poll(timeout):
            return True, self._quick_get()
        return False, None
    self._poll_result = _poll_result
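The poll-then-recv pattern of _poll_result can be demonstrated standalone with the stdlib, whose Pipe matches billiard's interface here (a sketch):

from multiprocessing import Pipe

reader, writer = Pipe(duplex=False)
writer.send(('job-1', 'ok'))  # pretend a child wrote back a result

if reader.poll(1.0):          # wait up to 1s, like _outqueue._reader.poll(timeout)
    print(True, reader.recv())
else:
    print(False, None)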
So the logic is now:
+
Consumer |
message |
v strategy +------------------------------------+
+------------+------+ | strategies |
| on_task_received | <--------+ | |
| | |[myTest.add : task_message_handler] |
+------------+------+ +------------------------------------+
|
|
+------------------------------------------------------------------------------------+
strategy |
|
|
v Request [myTest.add]
+------------+-------------+ +---------------------+
| task_message_handler | <-------------------+ | create_request_cls |
| | | |
+------------+-------------+ +---------------------+
| _process_task_sem
|
+------------------------------------------------------------------------------------+
Worker | req[{Request} myTest.add]
v
+--------+-----------+
| WorkController |
| |
| pool +-------------------------+
+--------+-----------+ |
| |
| apply_async v
+-----------+----------+ +---+-------------------+
|{Request} myTest.add | +---------------> | TaskPool |
+----------------------+ +----+------------------+
myTest.add |
|
+--------------------------------------------------------------------------------------+
|
v
+----+------------------+
| billiard.pool.Pool |
+-------+---------------+
|
|
Pool +---------------------------+ |
| TaskHandler | |
| | | self._taskqueue.put
| _taskqueue | <---------------+
| |
+------------+--------------+
|
| put(task)
|
| +------------------+
| | ResultHandler |
| +------------------+
|
| ^
| |
| |
+--------------------------------------------------------------------------------------+
| |
Sub process | |
v +
self._inqueue self._outqueue
3.5 Mapping file descriptors to queues
Finally, based on the result of creating the children, the fd-to-queue relationships are configured.
As you can see, the outq and synq mappings are set up here, i.e., which child process each of these queues points at.
The code:
class AsynPool(_pool.Pool):
    """AsyncIO Pool (no threads)."""

    def __init__(self, processes=None, synack=False,
                 sched_strategy=None, proc_alive_timeout=None,
                 *args, **kwargs):
        ......
        super().__init__(processes, *args, **kwargs)

        for proc in self._pool:
            # create initial mappings, these will be updated
            # as processes are recycled, or found lost elsewhere.
            self._fileno_to_outq[proc.outqR_fd] = proc
            self._fileno_to_synq[proc.synqW_fd] = proc
After the fds are configured:
self._fileno_to_outq = {dict: 4}
8 = {ForkProcess} <ForkProcess(ForkPoolWorker-1, started daemon)>
12 = {ForkProcess} <ForkProcess(ForkPoolWorker-2, started daemon)>
16 = {ForkProcess} <ForkProcess(ForkPoolWorker-3, started daemon)>
20 = {ForkProcess} <ForkProcess(ForkPoolWorker-4, started daemon)>
__len__ = {int} 4
self._fileno_to_synq = {dict: 1} {None: <ForkProcess(ForkPoolWorker-4, started daemon)>}
3.6 The overall AsynPool result
The final AsynPool is shown below. You can see all the internal variables and map them back to the preceding discussion:
self = {AsynPool} <celery.concurrency.asynpool.AsynPool object at 0x7fe44f664128>
ResultHandler = {type} <class 'celery.concurrency.asynpool.ResultHandler'>
SoftTimeLimitExceeded = {type} <class 'billiard.exceptions.SoftTimeLimitExceeded'>
Supervisor = {type} <class 'billiard.pool.Supervisor'>
TaskHandler = {type} <class 'billiard.pool.TaskHandler'>
TimeoutHandler = {type} <class 'billiard.pool.TimeoutHandler'>
Worker = {type} <class 'celery.concurrency.asynpool.Worker'>
allow_restart = {bool} False
enable_timeouts = {bool} True
lost_worker_timeout = {float} 10.0
max_restarts = {int} 100
on_process_down = {NoneType} None
on_process_up = {NoneType} None
on_timeout_cancel = {NoneType} None
on_timeout_set = {NoneType} None
outbound_buffer = {deque: 0} deque([])
process_sentinels = {list: 4} [25, 27, 29, 31]
putlocks = {bool} False
readers = {dict: 0} {}
restart_state = {restart_state} <billiard.common.restart_state object at 0x7fe44f6644a8>
sched_strategy = {int} 4
soft_timeout = {NoneType} None
synack = {bool} False
threads = {bool} False
timeout = {NoneType} None
timers = {dict: 1} {<bound method Pool.maintain_pool of <celery.concurrency.asynpool.AsynPool object at 0x7fe44f664128>>: 5.0}
write_stats = {Counter: 0} Counter()
_Process = {type} <class 'billiard.context.ForkProcess'>
_active_writers = {set: 0} set()
_active_writes = {set: 0} set()
_all_inqueues = {set: 0} set()
_busy_workers = {set: 0} set()
_cache = {dict: 0} {}
_ctx = {ForkContext} <billiard.context.ForkContext object at 0x7fe44e7ac7f0>
_fileno_to_inq = {dict: 0} {}
_fileno_to_outq = {dict: 4} {8: <ForkProcess(ForkPoolWorker-1, started daemon)>, 12: <ForkProcess(ForkPoolWorker-2, started daemon)>, 16: <ForkProcess(ForkPoolWorker-3, started daemon)>, 20: <ForkProcess(ForkPoolWorker-4, stopped[SIGABRT] daemon)>}
_fileno_to_synq = {dict: 1} {None: <ForkProcess(ForkPoolWorker-4, stopped[SIGABRT] daemon)>}
_initargs = {tuple: 2} (<Celery myTest at 0x7fe44e61cb38>, 'celery@me2koreademini')
_inqueue = {NoneType} None
_max_memory_per_child = {NoneType} None
_maxtasksperchild = {NoneType} None
_on_ready_counters = {dict: 4} {14802: <Synchronized wrapper for c_int(0)>, 14803: <Synchronized wrapper for c_int(0)>, 14804: <Synchronized wrapper for c_int(0)>, 14806: <Synchronized wrapper for c_int(0)>}
_outqueue = {NoneType} None
_poll_result = {NoneType} None
_pool = {list: 4} [<ForkProcess(ForkPoolWorker-1, started daemon)>, <ForkProcess(ForkPoolWorker-2, started daemon)>, <ForkProcess(ForkPoolWorker-3, started daemon)>, <ForkProcess(ForkPoolWorker-4, stopped[SIGABRT] daemon)>]
_poolctrl = {dict: 4} {14802: None, 14803: None, 14804: None, 14806: None}
_proc_alive_timeout = {float} 4.0
_processes = {int} 4
_putlock = {LaxBoundedSemaphore} <LaxBoundedSemaphore at 0x7fe44f54bf98 value:4 waiting:0>
_queues = {dict: 4} {(<billiard.queues._SimpleQueue object at 0x7fe44f664160>, <billiard.queues._SimpleQueue object at 0x7fe44f664240>, None): <ForkProcess(ForkPoolWorker-1, started daemon)>, (<billiard.queues._SimpleQueue object at 0x7fe44f664550>, <billiard.queues._SimpleQu
_quick_get = {NoneType} None
_quick_put = {NoneType} None
_result_handler = {ResultHandler} <ResultHandler(Thread-170, initial daemon)>
_state = {int} 0
_task_handler = {TaskHandler} <TaskHandler(Thread-168, initial daemon)>
_taskqueue = {Queue} <queue.Queue object at 0x7fe44f664978>
_terminate = {Finalize} <Finalize object, callback=_terminate_pool, args=(<queue.Queue object at 0x7fe44f664978>, None, None, [<ForkProcess(ForkPoolWorker-1, started daemon)>, <ForkProcess(ForkPoolWorker-2, started daemon)>, <ForkProcess(ForkPoolWorker-3, started daemon)>, <ForkP
_timeout_handler = {TimeoutHandler} <TimeoutHandler(Thread-169, initial daemon)>
_timeout_handler_mutex = {DummyLock} <kombu.asynchronous.semaphore.DummyLock object at 0x7fe44f6cb7b8>
_timeout_handler_started = {bool} False
_waiting_to_start = {set: 0} set()
_worker_handler = {Supervisor} <Supervisor(Thread-151, initial daemon)>
_wrap_exception = {bool} True
The final diagram of this article is therefore as follows, where Worker is the child-process work code and ForkProcess is the child-process abstraction (only one is shown):
+------------------------------+ +----------------+
| Pool(bootsteps.StartStopStep)| +-----------------+ | Connection |
+-------------+--------------+ | _SimpleQueue | | |
| | | | _write |
| | _reader +---------> | _read |
| | | | _send |
1 | instantiate | | | _recv |
| | | | _handle+---> {int} 8 <-+
2 on_start | | | +----------------+ |
| | _poll +---------> _ConnectionBase.poll |
+-------------+ | | | |
| | | | | +----------------+ |
| | v | _writer +---------> | Connection | |
| +---+---+-----------+ | | | | |
| | TaskPool | +-------+---------+ | _handle+----> {int} 7 |
| | | +------+ ^ | | |
| | app +----------> |celery| | +----------------+ ^ |
| | | +------+ | | |
| | | + | |
| | | +--------------------------+ +----> (<_SimpleQueue>, <_SimpleQueue>) | |
| | _pool +----------> | AsynPool | | | |
| | | | | | | |
| +---+---------------+ | _queues +------->-----> (<_SimpleQueue>, <_SimpleQueue>) | |
| ^ | | | | |
| | | _fileno_to_inq | | | |
| | | | +----> (<_SimpleQueue>, <_SimpleQueue>) | |
+-------------+ | _fileno_to_outq +--+ | | |
| | | | | |
| _queues[queues] | | +----> (<_SimpleQueue>, <_SimpleQueue>) | |
| + | | | |
| _pool | | | +----------------------+ | |
| + | | | | | | |
+--------------------------+ | | Worker inq | | |
| | | | | | |
| | | | outq | | |
2.1 append(w) | | | | | | |
| | | | synq | | |
v | | | | | |
+-------------+--+ | | | inqW_fd +-----------------------------------------+ |
| | <-+ | | | |
| ForkProcess | | | outqR_fd +------------------------------------------------+
| | <------+ | |
| | | workloop |
| _target +------------> | |
| | | after_fork |
| | | |
+----------------+ +----------------------+