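Parallelism in One Line of Python -- A New Approach to Everyday Threading Tasks

Published 2015-07-22

Python has a somewhat bad reputation when it comes to parallelizing programs. Technical issues aside, such as the thread implementation and the GIL, I think poor teaching material is the main problem. The classic Python threading and multiprocessing tutorials tend to be heavy, and they often only scratch the surface without digging into what is most useful in day-to-day work.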

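The traditional example

A quick search for "Python threading tutorial" shows that nearly every tutorial offers an example built around classes and queues: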

#Example.py
'''
Standard Producer/Consumer Threading Pattern
'''

import time 
import threading 
import Queue 

class Consumer(threading.Thread): 
    def __init__(self, queue): 
        threading.Thread.__init__(self)
        self._queue = queue 

    def run(self):
        while True: 
            # queue.get() blocks the current thread until 
            # an item is retrieved. 
            msg = self._queue.get() 
            # Checks if the current message is 
            # the "Poison Pill"
            if isinstance(msg, str) and msg == 'quit':
                # if so, exits the loop
                break
            # "Processes" (or in our case, prints) the queue item   
            print "I'm a thread, and I received %s!!" % msg
        # Always be friendly! 
        print 'Bye byes!'

def Producer():
    # Queue is used to share items between
    # the threads.
    queue = Queue.Queue()

    # Create an instance of the worker
    worker = Consumer(queue)
    # start calls the internal run() method to 
    # kick off the thread
    worker.start() 

    # variable to keep track of when we started
    start_time = time.time() 
    # While under 5 seconds.. 
    while time.time() - start_time < 5:
        # "Produce" a piece of work and stick it in 
        # the queue for the Consumer to process
        queue.put('something at %s' % time.time())
        # Sleep a bit just to avoid an absurd number of messages
        time.sleep(1)

    # This is the "poison pill" method of killing a thread.
    queue.put('quit')
    # wait for the thread to close down
    worker.join()

if __name__ == '__main__':
    Producer()

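Ha, looks a bit like Java, doesn't it?

I'm not saying that using the producer/consumer model for threading or multiprocessing tasks is wrong (in fact, it has its place). It's just that for everyday scripting tasks we can use a more efficient model.

The problem is…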

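First, you need a boilerplate class;
second, you need a queue to pass objects through;
and on top of that, you need to build methods at both ends of the channel to help it do its work (and if you want two-way communication or need to store results, you have to bring in yet another queue).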
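
To make the "yet another queue" point concrete, here is a minimal sketch, assuming Python 3 (where the module is named queue rather than Queue). The names worker, square_all and SENTINEL are hypothetical, made up for this illustration; they are not from the tutorials discussed here.

```python
import queue
import threading

SENTINEL = None  # hypothetical "poison pill" marker for this sketch


def worker(tasks, results):
    """Pull numbers from `tasks` and push their squares onto `results`."""
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        results.put(item * item)


def square_all(numbers, n_workers=2):
    # One queue carries work in, a second queue carries results out.
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in threads:  # one poison pill per worker
        tasks.put(SENTINEL)
    for t in threads:
        t.join()
    # Completion order is not guaranteed, so sort before returning.
    return sorted(results.get() for _ in numbers)
```

Even for something as trivial as squaring numbers, the plumbing (two queues, sentinels, joins) is longer than the work itself.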

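The more workers, the more problems

Following this line of thinking, you now need a pool of worker threads. Below is an example from a classic IBM tutorial: using multiple threads to speed up fetching web pages.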

#Example2.py
'''
A more realistic thread pool example 
'''

import time 
import threading 
import Queue 
import urllib2 

class Consumer(threading.Thread): 
    def __init__(self, queue): 
        threading.Thread.__init__(self)
        self._queue = queue 

    def run(self):
        while True: 
            content = self._queue.get() 
            if isinstance(content, str) and content == 'quit':
                break
            response = urllib2.urlopen(content)
        print 'Bye byes!'

def Producer():
    urls = [
        'http://www.python.org', 'http://www.yahoo.com',
        'http://www.scala.org', 'http://www.google.com',
        # etc.. 
    ]
    queue = Queue.Queue()
    worker_threads = build_worker_pool(queue, 4)
    start_time = time.time()

    # Add the urls to process
    for url in urls: 
        queue.put(url)  
    # Add the poison pill
    for worker in worker_threads:
        queue.put('quit')
    for worker in worker_threads:
        worker.join()

    print 'Done! Time taken: {}'.format(time.time() - start_time)

def build_worker_pool(queue, size):
    workers = []
    for _ in range(size):
        worker = Consumer(queue)
        worker.start() 
        workers.append(worker)
    return workers

if __name__ == '__main__':
    Producer()

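This code runs correctly, but look closely at what we have to do: build different methods, keep track of a series of threads, and, to deal with the annoying deadlock problem, perform a series of join operations. And that's only the beginning…

So far we've reviewed the classic threading tutorial. It feels a bit hollow, doesn't it? It's boilerplate-heavy and error-prone, and a style that takes twice the effort for half the result is clearly not well suited to everyday use. Fortunately, there's a better way.

Why not try map

map, a small and elegant function, is the key to parallelizing Python programs with little effort. map comes from functional languages such as Lisp. It applies a function to every element of a sequence.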

urls = ['http://www.yahoo.com', 'http://www.reddit.com']
results = map(urllib2.urlopen, urls)

These two lines pass each element of the urls sequence as an argument to urlopen and store all the results in the list results. The outcome is roughly equivalent to:

results = []
for url in urls: 
    results.append(urllib2.urlopen(url))

map takes care of iterating over the sequence, passing the arguments, and storing the results all by itself.
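
The equivalence can be checked without touching the network. This is a small sketch, assuming Python 3, where the built-in map returns a lazy iterator and needs list() around it to get a list back. The function shout and the word list are made up for illustration.

```python
def shout(word):
    """Toy stand-in for any one-argument function such as urlopen."""
    return word.upper() + "!"


words = ["map", "is", "neat"]

# Explicit loop: iterate, call, store.
looped = []
for w in words:
    looped.append(shout(w))

# map does the same three steps; list() forces the lazy iterator.
mapped = list(map(shout, words))

print(mapped == looped)
```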

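Why does this matter? Because with the right library, map makes parallelization easy.

In Python, two libraries provide a parallel map function: multiprocessing, and its little-known submodule multiprocessing.dummy.

A short digression here: multiprocessing.dummy? A thread-based clone of the multiprocessing library? What on earth is that? Even the official multiprocessing documentation has only a single sentence about this submodule, and in plain language that sentence basically says: "Well, this thing exists, just so you know." Trust me, this library is seriously underrated!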

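dummy is a complete clone of the multiprocessing module. The only difference is that multiprocessing works with processes, while the dummy module works with threads (and therefore comes with all the usual Python threading limitations).
So swapping one library for the other is remarkably easy: you can pick the thread version for IO-bound tasks and the process version for CPU-bound tasks.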
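
Because both pools expose the same map method, the swap can be reduced to passing a different class. Here is a minimal sketch, assuming Python 3. The helper parallel_sqrt is a hypothetical name for this illustration, and math.sqrt is used only because it is easy to check and can be sent to worker processes.

```python
import math
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool


def parallel_sqrt(pool_cls, numbers, workers=2):
    """Run math.sqrt over `numbers` using whichever pool class is given."""
    pool = pool_cls(workers)
    try:
        return pool.map(math.sqrt, numbers)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    data = [1, 4, 9, 16]
    print(parallel_sqrt(ThreadPool, data))  # threads
    print(parallel_sqrt(Pool, data))        # processes, same call
```

The guard around the process-based call matters on platforms that start worker processes by re-importing the main module.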

Getting started

Use the following two lines to import the libraries that contain the parallel map function:

from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

Instantiate a Pool object:

pool = ThreadPool()

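This simple statement replaces the 7 lines of the build_worker_pool function in example2.py. It creates a set of worker threads, initializes them, and stores them in a variable for easy access.

The Pool object takes a few parameters, but the only one we need to care about here is the first: processes. It sets the number of workers in the pool, and it defaults to the number of CPU cores on the current machine.

Generally speaking, for CPU-bound tasks, the more cores you use the faster things go. For network-bound tasks, however, things are harder to predict, and it's wise to determine the pool size by experiment.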

pool = ThreadPool(4) # Sets the pool size to 4

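With too many threads, the time spent switching between them can even exceed the time spent doing actual work. For different jobs, it's a good idea to find the optimal pool size by trial.

Once the Pool object is created, the parallel program is almost ready. Let's look at the rewritten example2.py: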
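
One way to run such an experiment is sketched below, assuming Python 3. The task fake_io is a made-up stand-in that just sleeps to imitate waiting on the network, and time_pool is a hypothetical helper; real workloads will behave differently, so treat the numbers it prints as illustrative only.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def fake_io(delay):
    """Pretend to wait on the network for `delay` seconds."""
    time.sleep(delay)
    return delay


def time_pool(size, n_tasks=8, delay=0.05):
    """Return the seconds needed to run n_tasks fake requests on `size` threads."""
    pool = ThreadPool(size)
    start = time.perf_counter()
    pool.map(fake_io, [delay] * n_tasks)
    elapsed = time.perf_counter() - start
    pool.close()
    pool.join()
    return elapsed


if __name__ == "__main__":
    for size in (1, 2, 4, 8):
        print("%2d threads: %.2f s" % (size, time_pool(size)))
```

Swapping fake_io for the real task and widening the range of sizes gives a quick picture of where the gains level off on a given machine.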

import urllib2 
from multiprocessing.dummy import Pool as ThreadPool 

urls = [
    'http://www.python.org', 
    'http://www.python.org/about/',
    'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
    'http://www.python.org/doc/',
    'http://www.python.org/download/',
    'http://www.python.org/getit/',
    'http://www.python.org/community/',
    'https://wiki.python.org/moin/',
    'http://planet.python.org/',
    'https://wiki.python.org/moin/LocalUserGroups',
    'http://www.python.org/psf/',
    'http://docs.python.org/devguide/',
    'http://www.python.org/community/awards/'
    # etc.. 
    ]

# Make the Pool of workers
pool = ThreadPool(4) 
# Open the urls in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)
#close the pool and wait for the work to finish 
pool.close() 
pool.join()
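
For readers on Python 3, where urllib2.urlopen lives at urllib.request.urlopen, the pool can also be used as a context manager. The sketch below assumes Python 3 and replaces the download with a made-up stand-in, fetch_length, so that it runs without network access. Note that leaving the with block terminates the pool rather than calling close() and join(); that is fine here because map only returns once every result is in.

```python
from multiprocessing.dummy import Pool as ThreadPool


def fetch_length(url):
    """Stand-in for a real download: just measure the URL string."""
    return len(url)


urls = [
    'http://www.python.org',
    'http://www.python.org/about/',
]

# The pool is cleaned up automatically when the block exits.
with ThreadPool(4) as pool:
    lengths = pool.map(fetch_length, urls)

print(lengths)
```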

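Only 4 lines of code do the actual work, and only one of them is the key. The map function effortlessly replaces the 40-plus-line example from earlier. To make things more interesting, I timed the different approaches with different pool sizes.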

# results = [] 
# for url in urls:
#   result = urllib2.urlopen(url)
#   results.append(result)

# # ------- VERSUS ------- # 

# # ------- 4 Pool ------- # 
# pool = ThreadPool(4) 
# results = pool.map(urllib2.urlopen, urls)

# # ------- 8 Pool ------- # 

# pool = ThreadPool(8) 
# results = pool.map(urllib2.urlopen, urls)

# # ------- 13 Pool ------- # 

# pool = ThreadPool(13) 
# results = pool.map(urllib2.urlopen, urls)

Results:

#        Single thread:  14.4 Seconds 
#               4 Pool:   3.1 Seconds
#               8 Pool:   1.4 Seconds
#              13 Pool:   1.3 Seconds

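Great results, aren't they? They also show why the pool size should be determined by experiment. On my machine, pool sizes above 9 brought only limited gains.

Another real-world example

Generating thumbnails for thousands of images
This is a CPU-bound task, and one that is very well suited to parallelization.

Basic single-process version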

import os 
import PIL 

from multiprocessing import Pool 
from PIL import Image

SIZE = (75,75)
SAVE_DIRECTORY = 'thumbs'

def get_image_paths(folder):
    return (os.path.join(folder, f) 
            for f in os.listdir(folder) 
            if 'jpeg' in f)

def create_thumbnail(filename): 
    im = Image.open(filename)
    im.thumbnail(SIZE, Image.ANTIALIAS)
    base, fname = os.path.split(filename) 
    save_path = os.path.join(base, SAVE_DIRECTORY, fname)
    im.save(save_path)

if __name__ == '__main__':
    folder = os.path.abspath(
        '11_18_2013_R000_IQM_Big_Sur_Mon__e10d1958e7b766c3e840')
    os.mkdir(os.path.join(folder, SAVE_DIRECTORY))

    images = get_image_paths(folder)

    for image in images:
        create_thumbnail(image)

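The main job of the code above is to walk through the image files in the given folder, generate a thumbnail for each one, and save the thumbnails to a specific folder.

On my machine, this program takes 27.9 seconds to process 6000 images.

If we use the map function in place of the for loop: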

import os 
import PIL 

from multiprocessing import Pool 
from PIL import Image

SIZE = (75,75)
SAVE_DIRECTORY = 'thumbs'

def get_image_paths(folder):
    return (os.path.join(folder, f) 
            for f in os.listdir(folder) 
            if 'jpeg' in f)

def create_thumbnail(filename): 
    im = Image.open(filename)
    im.thumbnail(SIZE, Image.ANTIALIAS)
    base, fname = os.path.split(filename) 
    save_path = os.path.join(base, SAVE_DIRECTORY, fname)
    im.save(save_path)

if __name__ == '__main__':
    folder = os.path.abspath(
        '11_18_2013_R000_IQM_Big_Sur_Mon__e10d1958e7b766c3e840')
    os.mkdir(os.path.join(folder, SAVE_DIRECTORY))

    images = get_image_paths(folder)

    pool = Pool()
    pool.map(create_thumbnail, images)
    pool.close()
    pool.join()

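5.6 seconds!

Although we changed only a few lines of code, we noticeably sped up the program. In a production environment, we can go further by choosing the multiprocessing library for CPU-bound tasks and the threading library for IO-bound tasks, which is also a good remedy for deadlock problems. Besides, because the map function does not support manual thread management, the related debugging work actually becomes remarkably simple.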
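
One detail that may help with debugging (this is my own illustration, not something the article states): an exception raised inside a worker is re-raised by pool.map in the calling thread, so an ordinary try/except around the call is enough. A minimal sketch, assuming Python 3, with the hypothetical helpers parse_positive and parse_all:

```python
from multiprocessing.dummy import Pool as ThreadPool


def parse_positive(text):
    """Convert text to a positive int, or raise ValueError."""
    value = int(text)
    if value <= 0:
        raise ValueError("expected a positive number, got %r" % text)
    return value


def parse_all(texts):
    """Parse every item on a small thread pool; worker errors propagate."""
    pool = ThreadPool(2)
    try:
        return pool.map(parse_positive, texts)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(parse_all(["1", "2", "3"]))
    try:
        parse_all(["1", "-5"])
    except ValueError as exc:
        print("worker failed:", exc)
```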

And with that, we have (basically) achieved parallelism in one line of Python.
