Python Concurrency 3: Writing Servers with asyncio

Posted by goodspeed on 2017-07-01

asyncio
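The previous post introduced the asyncio package and showed how asynchronous programming can handle high concurrency in network applications. This post walks through two examples of programming with asyncio.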

The async/await syntax

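Let's start with the async/await syntax. Otherwise you may finish this post wondering why the previous one used the asyncio.coroutine decorator and yield from, while this one uses async and await throughout.

The previous post in question: Python Concurrency 2: Handling Concurrency with asyncio

async/await is syntax introduced in Python 3.5. It looks like this: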

async def read_data(db):
    pass

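async is the keyword that explicitly declares a function as a coroutine. Even if the body contains no await expression, calling the function returns a coroutine object.
Inside a coroutine function, you can put the await keyword in front of an expression to suspend the coroutine until some other coroutine completes: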

async def read_data(db):
    data = await db.fetch('SELECT ...')

Written with the asyncio.coroutine decorator, the same code looks like this:

@asyncio.coroutine
def read_data(db):
    data = yield from db.fetch('SELECT ...')

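The two versions behave the same way. In other words, asyncio.coroutine can be replaced with async, and yield from with await. (The asyncio.coroutine decorator was deprecated in Python 3.8 and removed in Python 3.11, so on current versions only the async/await form works.)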

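What are the benefits of the new syntax?

It makes generators and coroutines easier to tell apart, because they no longer share the same syntax.
It eliminates an unobvious class of errors. In the old style, whether a function is a coroutine depends on a yield or yield from appearing in its body, so accidentally removing it during refactoring silently turns the coroutine into an ordinary function.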
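The second point can be checked with the standard inspect module. This is only an illustrative sketch: the function names old_style, old_style_refactored, and new_style are hypothetical and do not come from the original post.

```python
import inspect


def old_style():
    # Generator-based style: the yield is what makes this a generator function
    yield


def old_style_refactored():
    # The yield was removed by accident: this is now an ordinary function
    pass


async def new_style():
    # No await in the body, yet async def still declares a coroutine function
    pass


print(inspect.isgeneratorfunction(old_style))             # True
print(inspect.isgeneratorfunction(old_style_refactored))  # False
print(inspect.iscoroutinefunction(new_style))             # True

coro = new_style()                 # calling it runs nothing; it returns a coroutine object
is_coro = inspect.iscoroutine(coro)
print(is_coro)                     # True
coro.close()                       # close it to avoid a "never awaited" warning
```

With async def, the declaration itself carries the intent, so editing the body cannot change what kind of function it is.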

Writing a server with the asyncio package

This example uses the asyncio package and the unicodedata module to look up Unicode characters by their canonical names.
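As a quick reminder of what the unicodedata module offers, here is a minimal sketch using only standard-library calls. The characters chosen are arbitrary examples.

```python
import unicodedata

rook_name = unicodedata.name('♜')                   # character -> canonical name
rook_char = unicodedata.lookup('BLACK CHESS ROOK')  # canonical name -> character
print(rook_name)   # BLACK CHESS ROOK
print(rook_char)   # ♜

# Characters without a name raise ValueError unless a default is supplied
unnamed = unicodedata.name('\x00', 'no name')
print(unnamed)     # no name
```

lookup needs the full canonical name, which is why the module below builds a word index over all names.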

Here is the code:

# charfinder.py
import sys
import re
import unicodedata
import pickle
import warnings
import itertools
import functools
from collections import namedtuple

RE_WORD = re.compile(r'\w+')
RE_UNICODE_NAME = re.compile('^[A-Z0-9 -]+$')
RE_CODEPOINT = re.compile(r'U\+[0-9A-F]{4,6}')  # no space inside {4,6}, or it is not a quantifier

INDEX_NAME = 'charfinder_index.pickle'
MINIMUM_SAVE_LEN = 10000
CJK_UNI_PREFIX = 'CJK UNIFIED IDEOGRAPH'
CJK_CMP_PREFIX = 'CJK COMPATIBILITY IDEOGRAPH'

sample_chars = [
    '$',  # DOLLAR SIGN
    'A',  # LATIN CAPITAL LETTER A
    'a',  # LATIN SMALL LETTER A
    '\u20a0',  # EURO-CURRENCY SIGN
    '\u20ac',  # EURO SIGN
]

CharDescription = namedtuple('CharDescription', 'code_str char name')

QueryResult = namedtuple('QueryResult', 'count items')


def tokenize(text):
    '''
    :param text: 
    :return: return iterable of uppercased words 
    '''
    for match in RE_WORD.finditer(text):
        yield match.group().upper()


def query_type(text):
    text_upper = text.upper()
    if 'U+' in text_upper:
        return 'CODEPOINT'
    elif RE_UNICODE_NAME.match(text_upper):
        return 'NAME'
    else:
        return 'CHARACTERS'


class UnicodeNameIndex:
    # Index of Unicode character names

    def __init__(self, chars=None):
        self.load(chars)

    def load(self, chars=None):
        # Load the index of Unicode names
        self.index = None
        if chars is None:
            try:
                with open(INDEX_NAME, 'rb') as fp:
                    self.index = pickle.load(fp)
            except OSError:
                pass
        if self.index is None:
            self.build_index(chars)
        if len(self.index) > MINIMUM_SAVE_LEN:
            try:
                self.save()
            except OSError as exc:
                warnings.warn('Could not save {!r}: {}'
                              .format(INDEX_NAME, exc))

    def save(self):
        with open(INDEX_NAME, 'wb') as fp:
            pickle.dump(self.index, fp)

    def build_index(self, chars=None):
        if chars is None:
            chars = (chr(i) for i in range(32, sys.maxunicode))
        index = {}
        for char in chars:
            try:
                name = unicodedata.name(char)
            except ValueError:
                continue
            if name.startswith(CJK_UNI_PREFIX):
                name = CJK_UNI_PREFIX
            elif name.startswith(CJK_CMP_PREFIX):
                name = CJK_CMP_PREFIX

            for word in tokenize(name):
                index.setdefault(word, set()).add(char)

        self.index = index

    def word_rank(self, top=None):
        # Build a list of (number of characters, word) tuples so it can be sorted below
        res = [(len(self.index[key]), key) for key in self.index]
        res.sort(key=lambda item: (-item[0], item[1]))
        if top is not None:
            res = res[:top]
        return res

    def word_report(self, top=None):
        for postings, key in self.word_rank(top):
            print('{:5} {}'.format(postings, key))

    def find_chars(self, query, start=0, stop=None):
        stop = sys.maxsize if stop is None else stop
        result_sets = []
        for word in tokenize(query):
            # tokenize yields the uppercased words of the query: 'a b' yields 'A', then 'B'
            chars = self.index.get(word)
            if chars is None:
                result_sets = []
                break
            result_sets.append(chars)

        if not result_sets:
            return QueryResult(0, ())

        result = functools.reduce(set.intersection, result_sets)
        result = sorted(result)  # must sort to support start, stop
        result_iter = itertools.islice(result, start, stop)
        return QueryResult(len(result),
                           (char for char in result_iter))

    def describe(self, char):
        code_str = 'U+{:04X}'.format(ord(char))
        name = unicodedata.name(char)
        return CharDescription(code_str, char, name)

    def find_descriptions(self, query, start=0, stop=None):
        for char in self.find_chars(query, start, stop).items:
            yield self.describe(char)

    def get_descriptions(self, chars):
        for char in chars:
            yield self.describe(char)

    def describe_str(self, char):
        return '{:7}\t{}\t{}'.format(*self.describe(char))

    def find_description_strs(self, query, start=0, stop=None):
        for char in self.find_chars(query, start, stop).items:
            yield self.describe_str(char)

    @staticmethod  # not an instance method due to concurrency
    def status(query, counter):
        if counter == 0:
            msg = 'No match'
        elif counter == 1:
            msg = '1 match'
        else:
            msg = '{} matches'.format(counter)
        return '{} for {!r}'.format(msg, query)

def main(*args):
    index = UnicodeNameIndex()
    query = ' '.join(args)
    n = 0
    for n, line in enumerate(index.find_description_strs(query), 1):
        print(line)
    print('({})'.format(index.status(query, n)))


if __name__ == '__main__':
    if len(sys.argv) > 1:
        main(*sys.argv[1:])
    else:
        print('Usage: {} word1 [word2]...'.format(sys.argv[0]))

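This module reads the Unicode database bundled with Python and indexes every word that appears in a character name, building an inverted index stored in a dict.
For example, in the inverted index the entry for the key 'SUN' is a set containing the 10 Unicode characters whose names include the word 'SUN'. The inverted index is saved locally in a file named charfinder_index.pickle. When a query has several words, the result is the intersection of the sets retrieved from the index.
A sample run: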

    >>> main('rook')  # doctest: +NORMALIZE_WHITESPACE
    U+2656  ♖  WHITE CHESS ROOK
    U+265C  ♜  BLACK CHESS ROOK
    (2 matches for 'rook')
    >>> main('rook', 'black')  # doctest: +NORMALIZE_WHITESPACE
    U+265C  ♜  BLACK CHESS ROOK
    (1 match for 'rook black')
    >>> main('white bishop')  # doctest: +NORMALIZE_WHITESPACE
    U+2657  ♗   WHITE CHESS BISHOP
    (1 match for 'white bishop')
    >>> main("jabberwocky's vest")
    (No match for "jabberwocky's vest")
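The inverted index and the set intersection described above can be sketched on a tiny character range. This is a simplified illustration, not the module's own code: it indexes only the twelve chess symbols U+2654 to U+265F and splits names on whitespace instead of using tokenize.

```python
import unicodedata

# Map each word of a character name to the set of characters containing it
index = {}
for code in range(0x2654, 0x2660):
    char = chr(code)
    for word in unicodedata.name(char).split():
        index.setdefault(word, set()).add(char)

rooks = index['ROOK']                          # single-word query
black_rooks = index['ROOK'] & index['BLACK']   # multi-word query: intersect the sets

print(sorted(rooks))        # ['♖', '♜']
print(sorted(black_rooks))  # ['♜']
```

The results match the main('rook') and main('rook', 'black') runs shown in the sample session.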

This module does not use concurrency. Its main job is to support the server written with the asyncio package.
Now let's look at the tcp_charfinder.py script:

# tcp_charfinder.py
import sys
import asyncio

# Builds the index and provides the query methods
from charfinder import UnicodeNameIndex

CRLF = b'\r\n'
PROMPT = b'?> '

# Instantiate UnicodeNameIndex; it loads charfinder_index.pickle if the file exists
index = UnicodeNameIndex()

async def handle_queries(reader, writer):
    # This coroutine is passed to asyncio.start_server; it receives an
    # asyncio.StreamReader object and an asyncio.StreamWriter object
    while True:  # handle the session until a control character arrives from the client
        writer.write(PROMPT)  # can't await! a plain function, not a coroutine; sends the ?> prompt
        await writer.drain()  # must await! a coroutine that flushes the writer buffer
        data = await reader.readline()  # also a coroutine; returns a bytes object
        try:
            query = data.decode().strip()
        except UnicodeDecodeError:
            # A Telnet client may send control characters that raise UnicodeDecodeError;
            # in that case we pretend a null character was sent
            query = '\x00'
        client = writer.get_extra_info('peername')  # remote address of the socket connection
        print('Received from {}: {!r}'.format(client, query))  # log the query on the console
        if query:
            if ord(query[:1]) < 32:  # control or null character received: leave the loop
                break
            # find_description_strs returns a generator yielding strings with the Unicode
            # code point, the actual character, and its name; list() collects them
            lines = list(index.find_description_strs(query))
            if lines:
                # Encode each line to bytes with the default UTF-8 encoding and append
                # a carriage return and a line feed; the argument is a generator expression
                writer.writelines(line.encode() + CRLF for line in lines)
            writer.write(index.status(query, len(lines)).encode() + CRLF)  # send the status line

            await writer.drain()  # flush the output buffer
            print('Sent {} results'.format(len(lines)))  # log the response on the server console

    print('Close the client socket')  # log the end of the session
    writer.close()  # close the StreamWriter stream



def main(address='127.0.0.1', port=2323):  # defaults let main be called without arguments
    port = int(port)
    loop = asyncio.get_event_loop()
    # asyncio.start_server returns a coroutine object; once driven to completion
    # it produces an asyncio.Server instance, i.e. a TCP socket server.
    # The loop argument of start_server was removed in Python 3.10, so it is not passed
    server_coro = asyncio.start_server(handle_queries, address, port)
    server = loop.run_until_complete(server_coro)  # drive server_coro to start the server

    host = server.sockets[0].getsockname()  # address and port of the server's first socket
    print('Serving on {}. Hit CTRL-C to stop.'.format(host))  # show them on the console
    try:
        loop.run_forever()  # run the event loop; main blocks here until CTRL-C is pressed
    except KeyboardInterrupt:  # CTRL+C pressed
        pass

    print('Server shutting down.')
    server.close()
    # server.wait_closed returns an awaitable;
    # loop.run_until_complete runs it to completion
    loop.run_until_complete(server.wait_closed())
    loop.close()  # terminate the event loop


if __name__ == '__main__':
    main(*sys.argv[1:])

Run tcp_charfinder.py:

python tcp_charfinder.py

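Open another terminal and query the service with the telnet command (telnet 127.0.0.1 2323 with the defaults above). The server sends the ?> prompt, and each query you type is answered with the matching characters followed by a status line.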

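The main function displays the Serving on... message almost immediately and then blocks in the call to loop.run_forever(). At that point control flows into the event loop and stays there, occasionally coming back to the handle_queries coroutine. Whenever that coroutine has to wait for the network to send or receive data, it hands control back to the event loop.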

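The handle_queries coroutine can serve many requests from many clients. Each time a new client connects to the server, a new instance of the handle_queries coroutine is started.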
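The one-coroutine-instance-per-connection behaviour can be observed with a small self-contained sketch. Everything here is hypothetical scaffolding rather than part of tcp_charfinder.py: the handler, the client, the counters, and the 0.1 s sleep standing in for slow I/O. Port 0 asks the operating system for any free port.

```python
import asyncio

active = 0   # handler instances currently running
peak = 0     # highest number of simultaneous handler instances seen


async def handler(reader, writer):
    global active, peak
    active += 1
    peak = max(peak, active)
    await reader.readline()
    await asyncio.sleep(0.1)          # stand-in for slow I/O; other handlers run meanwhile
    writer.write(b'done\r\n')
    await writer.drain()
    active -= 1
    writer.close()


async def client(port):
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(b'hi\r\n')
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    return reply


async def demo():
    server = await asyncio.start_server(handler, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    # Three clients connect at the same time; each gets its own handler instance
    replies = await asyncio.gather(*(client(port) for _ in range(3)))
    server.close()
    await server.wait_closed()
    return replies


replies = asyncio.run(demo())
print(len(replies), peak)   # peak is greater than 1 when the handlers overlap
```

All of this runs in a single thread: the handlers overlap only because each one yields to the event loop while it waits.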

All I/O in handle_queries is done in bytes. Data received from the network must be decoded, and data sent out must be encoded.
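The decode and encode steps can be isolated as follows. The helper names decode_query and encode_line are hypothetical; they only mirror what handle_queries does inline. The second input is a byte sequence that is not valid UTF-8, of the kind a Telnet client may send as control data.

```python
CRLF = b'\r\n'


def decode_query(data):
    """Decode one line read from the network; fall back to a null character."""
    try:
        return data.decode().strip()      # bytes -> str, UTF-8 by default
    except UnicodeDecodeError:
        return '\x00'


def encode_line(text):
    """Encode one output line and terminate it with CR LF."""
    return text.encode() + CRLF           # str -> bytes, UTF-8 by default


print(repr(decode_query(b'chess black\r\n')))        # 'chess black'
print(repr(decode_query(b'\xff\xf4\xff\xfd\x06')))   # '\x00'
print(encode_line('♜'))                              # b'\xe2\x99\x9c\r\n'
```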

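The asyncio package provides a high-level streams API with a ready-made server, so we only need to implement a handler. See the documentation for details: docs.python.org/3/library/a…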

The server that asyncio provides is fairly bare-bones, though. Next we use Sanic, a web framework built on asyncio, to implement a simple HTTP version of the server.

A quick introduction to Sanic appeared in an earlier post: Python Web Framework Sanic Quick Start

Writing a web server with the sanic package

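Sanic is a Flask-like web framework for Python 3.5+. It offers a fairly high-level API, including routing, request parameters, and responses, so we only need to implement the handling logic.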

Here is a simple HTTP character-lookup service implemented with sanic:

from sanic import Sanic
from sanic import response

from charfinder import UnicodeNameIndex

app = Sanic('charfinder')  # recent Sanic releases require an application name; this one is arbitrary

index = UnicodeNameIndex()

html_temp = '<p>{char}</p>'

@app.route('/charfinder')  # the first argument of app.route is the URL path, here /charfinder
async def charfinder(request):
    # request.args holds the URL query parameters:
    # ?key1=value1&key2=value2 gives {'key1': ['value1'], 'key2': ['value2']}
    # Several char parameters may be passed, so we use request.args.getlist('char');
    # request.args.get('char') would return only the first value
    query = request.args.getlist('char')
    query = ' '.join(query)
    lines = list(index.find_description_strs(query))
    # Build the HTML from the results
    html = '\n'.join([html_temp.format(char=line) for line in lines])
    return response.html(html)

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8000)  # set the address and port the server listens on

Comparing the two listings shows how little code the sanic version needs.
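The dict-of-lists shape described in the handler's comments can be reproduced with the standard library. Note the assumption: urllib.parse.parse_qs is used here only as a stand-in for Sanic's request.args, to show why a repeated parameter needs getlist. It is not Sanic's own API.

```python
from urllib.parse import parse_qs

args = parse_qs('key1=value1&key2=value2')
print(args)    # {'key1': ['value1'], 'key2': ['value2']}

# A repeated parameter, as in /charfinder?char=white&char=rook
repeated = parse_qs('char=white&char=rook')
query = ' '.join(repeated.get('char', []))   # all values, joined like the handler does
first_only = repeated['char'][0]             # what a "first value" lookup would see

print(query)       # white rook
print(first_only)  # white
```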

Run the service:

python http_charsfinder.py

Enter the address http://0.0.0.0:8000/charfinder?char=sun in a browser. A sample result is shown below:

https://user-gold-cdn.xitu.io/2017/7/1/28f220bae095ad24a1e70b1d8684eea2

Now let's compare the two programs.

In the TCP example, the server is created and scheduled to run by these two lines in the main function:

# The loop argument of start_server was removed in Python 3.10, so it is not passed
server_coro = asyncio.start_server(handle_queries, address, port)
server = loop.run_until_complete(server_coro)

In the Sanic HTTP example, the server is created with app.run:

app.run(host="0.0.0.0", port=8000)

The two look completely different, but if we open Sanic's source code we see that app.run() internally calls server_coroutine = loop.create_server() to create the server, and that server_coroutine is then driven by loop.run_until_complete().

So in both cases the server is started by loop.run_until_complete driving a coroutine to completion. Sanic simply wraps this in its run method to make it more convenient.

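This illustrates a basic fact: a coroutine only does work when it is driven. There are two ways to drive a coroutine decorated with asyncio.coroutine: use yield from, or pass it to an asyncio function that takes a coroutine or future as an argument, such as run_until_complete. The same holds for async def coroutines, with await in place of yield from.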
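Both ways of driving a coroutine can be seen in a few lines. This is a minimal sketch with hypothetical coroutines inner and outer; it uses the async/await form, and a freshly created event loop so that it runs on current Python versions.

```python
import asyncio

log = []


async def inner():
    log.append('inner ran')
    return 42


async def outer():
    result = await inner()     # way 1: drive a coroutine from another coroutine
    log.append('outer ran')
    return result


coro = outer()                 # creating the coroutine object runs none of its body
log_before = list(log)

loop = asyncio.new_event_loop()
try:
    value = loop.run_until_complete(coro)   # way 2: hand it to an asyncio function
finally:
    loop.close()

print(log_before)  # []
print(log)         # ['inner ran', 'outer ran']
print(value)       # 42
```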

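If you search for cjk now, you get more than 70,000 results in an HTML file of about 3 MB, which takes roughly 2 s. For a request to a production service, 2 s is unacceptable. Pagination solves this: return only 200 results at a time, and when the user wants to see more, send the next batch via ajax or websockets.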
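One possible shape for that pagination is sketched below, using itertools.islice in the same way find_chars applies its start and stop arguments. The get_page helper and the zero-based page numbering are assumptions for illustration; the post itself does not implement paging.

```python
import itertools

PAGE_SIZE = 200   # results per page, as suggested in the text


def get_page(sorted_results, page, page_size=PAGE_SIZE):
    """Return one page of results; page numbers start at 0 (hypothetical helper)."""
    start = page * page_size
    return list(itertools.islice(sorted_results, start, start + page_size))


results = list(range(450))        # stand-in for a sorted list of matching characters
first = get_page(results, 0)
last = get_page(results, 2)
beyond = get_page(results, 3)

print(len(first), first[0], first[-1])   # 200 0 199
print(len(last), last[0], last[-1])      # 50 400 449
print(beyond)                            # []
```

Because the results must be sorted before slicing, every page request repeats the same search; caching the sorted result per query would be a natural next step.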

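In this post we implemented a TCP server with the asyncio package and an HTTP server with Sanic (which is built on asyncio and by default uses uvloop in place of the standard asyncio event loop), both for searching Unicode characters by name. We did not cover the concurrency aspects of the servers; that can be discussed later.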

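This post is again a set of reading notes on the asyncio chapter of Fluent Python (《流暢的python》). The next post in this Python concurrency series is "Handling Concurrency with Threads".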

Finally, thanks to my girlfriend for her support.