WSGI, the Hardest-to-Grasp Protocol in Python Web Development: What Does It Actually Cover?

Published on 2017-09-30

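I suspect most Python developers start out in web development (there is always someone who wants to build their own blog right away, me included). Once you are doing web work, you inevitably run into web frameworks such as Django, Flask, and Tornado. Along the way, the documentation keeps bringing up server configuration for production versus development environments, and servers in turn split into web servers and application servers. In short, the one term we run into most is WSGI.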

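The rest of this article is organized as follows:

1. Introduction to WSGI
- 1.1 What is WSGI
- 1.2 How to implement WSGI
2. Analyzing WSGI through the Django framework
3. WSGI servers used in production
4. Comparing WSGI servers

Let's begin.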

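1 Introduction to WSGI

1.1 What is WSGI

First, a few concepts related to WSGI.

WSGI: short for Web Server Gateway Interface. WSGI is not a server, a Python module, a framework, an API, or any piece of software. It is only a specification that describes how a web server communicates with a web application. The server side and the application side are both described in detail in PEP 3333. A working WSGI setup needs both a web server and a web application that follow the specification. Web frameworks that currently run on top of WSGI include Tornado, Flask, and Django.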

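uwsgi: like WSGI, this is a communication protocol. It is the native protocol of the uWSGI server and defines the type of information being transmitted: the first 4 bytes of every uwsgi packet describe the information that follows. It is a different thing from the WSGI protocol. The protocol is reportedly about 10 times faster than fcgi.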
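To make the 4-byte header concrete, here is a minimal sketch based on the layout given in the uWSGI project's protocol documentation: one byte `modifier1`, a 16-bit little-endian `datasize`, and one byte `modifier2`. For WSGI requests (`modifier1` = 0) the body is a run of key/value strings, each prefixed by a 16-bit little-endian length. The helper names `pack_uwsgi_request` and `unpack_uwsgi_header` are made up for this illustration and do not belong to any library.

```python
import struct

def pack_uwsgi_request(variables, modifier1=0, modifier2=0):
    """Build a uwsgi packet: 4-byte header plus length-prefixed key/value pairs."""
    body = b""
    for key, value in variables.items():
        k = key.encode("latin-1")
        v = value.encode("latin-1")
        # Each string is preceded by its length as a 16-bit little-endian integer.
        body += struct.pack("<H", len(k)) + k + struct.pack("<H", len(v)) + v
    # Header: modifier1 (1 byte), datasize (2 bytes, little-endian), modifier2 (1 byte).
    header = struct.pack("<BHB", modifier1, len(body), modifier2)
    return header + body

def unpack_uwsgi_header(packet):
    """Return (modifier1, datasize, modifier2) from the first 4 bytes of a packet."""
    return struct.unpack("<BHB", packet[:4])

packet = pack_uwsgi_request({"REQUEST_METHOD": "GET", "PATH_INFO": "/"})
print(unpack_uwsgi_header(packet))
```

The header carries only the size and two modifier bytes, so a receiver can read 4 bytes, learn how much more to read, and then parse the variables without scanning for delimiters.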

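uWSGI: a web server that implements the WSGI, uwsgi, and HTTP protocols, among others.

The WSGI protocol has two main sides, the server and the application:

- The WSGI server receives requests from clients, forwards each request to the application, and sends the response returned by the application back to the client.
- The WSGI application receives the request forwarded by the server, processes it, and returns the result to the server. An application can include a stack of middlewares. A middleware has to implement both the server side and the application side, so it can mediate between the WSGI server and the WSGI application: to the server, the middleware acts as an application; to the application, it acts as a server.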
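The two roles a middleware plays can be sketched in a few lines. This is only an illustration: `HeaderMiddleware`, `hello_app`, and `call_app` are hypothetical names, and the fake `start_response` inside `call_app` stands in for a real server.

```python
def hello_app(environ, start_response):
    body = b"hello"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

class HeaderMiddleware:
    """Looks like a server to the wrapped app and like an app to the real server."""

    def __init__(self, app, name, value):
        self.app = app
        self.header = (name, value)

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Server role: receive the app's status and headers, then pass them on
            # with one extra header appended.
            return start_response(status, list(headers) + [self.header], exc_info)
        # Application role: we were called with (environ, start_response).
        return self.app(environ, patched_start_response)

# Middlewares stack: the outer one wraps the inner one, which wraps the app.
app = HeaderMiddleware(HeaderMiddleware(hello_app, "X-Inner", "1"), "X-Outer", "2")

def call_app(app):
    """Drive a WSGI callable by hand and collect what it produces."""
    captured = {}
    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
    body = b"".join(app({"REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], captured["headers"], body

print(call_app(app))
```

Because each layer only relies on the `(environ, start_response)` calling convention, the same middleware can sit in front of any WSGI application.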

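In essence, the WSGI protocol defines a specification that decouples the server from the application. There can be many servers that implement the WSGI server side and many frameworks that implement the WSGI application side, so you can combine any server with any application to build your web app. For example, uWSGI and Gunicorn are servers that implement the WSGI server side, while Django and Flask are web frameworks that implement the WSGI application side; you can pair them according to what your project needs.

That covers the background. Next, let's see how to implement the WSGI protocol in a simple way.

1.2 How to implement WSGI

As mentioned above, implementing WSGI requires both a WSGI server and an application, so let's implement those two things.

Let's start with the small demo that ships with Python's own wsgiref module.

For a quick introduction to wsgiref, see this blog post.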

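from io import StringIO
from wsgiref.simple_server import make_server

def demo_app(environ, start_response):
    stdout = StringIO()
    print("Hello world!", file=stdout)
    print(file=stdout)
    # Dump the request environment, sorted by key
    for k, v in sorted(environ.items()):
        print(k, '=', repr(v), file=stdout)
    start_response("200 OK", [('Content-Type', 'text/plain; charset=utf-8')])
    # On Python 3 the response body must be bytes
    return [stdout.getvalue().encode("utf-8")]

httpd = make_server('localhost', 8002, demo_app)
httpd.serve_forever()  # blocks and serves requests until interrupted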

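This implements an application that receives two arguments, the client's environment and a callback function, along with an httpd server. Let's look at the source of make_server: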

def make_server(
    host, port, app, server_class=WSGIServer, handler_class=WSGIRequestHandler
):
    """Create a new WSGI server listening on `host` and `port` for `app`"""
    server = server_class((host, port), handler_class)
    server.set_app(app)
    return server

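It takes a handful of arguments and returns a server object, so the implementation is quite simple. Section 2 looks at how Django implements its own WSGI server.

Now let's implement things ourselves.

WSGI requires every Python program (application) to be a callable object (a function, a method, a class, or an instance whose class implements __call__) that accepts two arguments, environ (the WSGI environment) and start_response (the function that begins the response), and returns an iterable. A few notes:

- environ and start_response are provided and implemented by the HTTP server.
- environ is a dictionary containing the environment information.
- The application calls start_response before it returns its result (or, for a generator, before it yields the first chunk of the body).
- start_response is itself a callable. It takes two required arguments: status (the HTTP status) and response_headers (the response headers).
- The callable must return a value, and that value must be iterable.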

# 1. The callable is a function
def application(environ, start_response):

    # On Python 3 the body must be bytes, so encode the text
    response_body = ('The request method was %s'
                     % environ['REQUEST_METHOD']).encode('utf-8')

    # HTTP response code and message
    status = '200 OK'

    # The response headers are a list; each key/value pair must be a tuple.
    response_headers = [('Content-Type', 'text/plain'),
                        ('Content-Length', str(len(response_body)))]

    # Call the start_response supplied by the server with both arguments
    start_response(status, response_headers)

    # The return value must be an iterable
    return [response_body]

# 2. The callable is a class
class AppClass:
    """Here the callable is the AppClass class itself; calling it produces
    an iterable result. Usage is roughly:
        for result in AppClass(env, start_response):
            do_something(result)
    """

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield b"Hello world!\n"

# 3. The callable is an instance
class AppClass:
    """Here the callable is an instance of AppClass. Usage is roughly:
        app = AppClass()
        for result in app(environ, start_response):
            do_something(result)
    """

    def __init__(self):
        pass

    def __call__(self, environ, start_response):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        # start_response arrives as an argument here; there is no self.start
        start_response(status, response_headers)
        yield b"Hello world!\n"
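To check that the three forms really are interchangeable, they can be driven by hand without any network server. The sketch below uses `wsgiref.util.setup_testing_defaults` from the standard library to fill in a plausible environ; the names `func_app`, `IterApp`, `CallableApp`, and `call_wsgi` are hypothetical and are shortened versions of the three forms above.

```python
from wsgiref.util import setup_testing_defaults

def func_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"from a function\n"]

class IterApp:
    # The class itself is the callable; the instance is the iterable result.
    def __init__(self, environ, start_response):
        self.start = start_response

    def __iter__(self):
        self.start("200 OK", [("Content-Type", "text/plain")])
        yield b"from a class\n"

class CallableApp:
    # An instance is the callable; calling it returns a generator.
    def __call__(self, environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        yield b"from an instance\n"

def call_wsgi(app):
    """Play the server: build environ, supply start_response, consume the body."""
    environ = {}
    setup_testing_defaults(environ)
    seen = []
    def start_response(status, headers, exc_info=None):
        seen.append(status)
    body = b"".join(app(environ, start_response))
    return seen[0], body

for candidate in (func_app, IterApp, CallableApp()):
    print(call_wsgi(candidate))
```

Note that the caller never needs to know which of the three forms it received: it calls the object with two arguments and iterates over whatever comes back.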

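The server side

As said earlier, a standard only works if both the application side and the server side follow it. We also noted that environ and start_response are both supplied by the server. Here is what the server side is obliged to do:

- Prepare the environ argument
- Define the start_response function
- Call the application-side callable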

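import os, sys

def run_with_cgi(application):    # application is the application-side callable
    # Prepare environ: a dict holding the environment variables of one HTTP request
    environ = dict(os.environ.items())
    environ['wsgi.input']        = sys.stdin.buffer
    environ['wsgi.errors']       = sys.stderr
    environ['wsgi.version']      = (1, 0)
    environ['wsgi.multithread']  = False
    environ['wsgi.multiprocess'] = True
    environ['wsgi.run_once']     = True
    environ['wsgi.url_scheme']   = 'http'

    headers_set = []
    headers_sent = []

    # Write the response to standard output; the status line and headers
    # go out once, just before the first chunk of the body
    def write(data):
        out = sys.stdout.buffer
        if not headers_sent:
            status, response_headers = headers_sent[:] = headers_set
            out.write(('Status: %s\r\n' % status).encode('iso-8859-1'))
            for header in response_headers:
                out.write(('%s: %s\r\n' % header).encode('iso-8859-1'))
            out.write(b'\r\n')
        out.write(data)
        out.flush()

    # Implement start_response: record the status and response_headers
    # passed in by the application
    def start_response(status, response_headers, exc_info=None):
        headers_set[:] = [status, response_headers]
        return write

    # Call the application-side callable with the prepared arguments
    result = application(environ, start_response)

    # Process the result; here it is simply written to standard output
    try:
        for data in result:
            if data:    # don't send headers until body appears
                write(data)
        if not headers_sent:
            write(b'')  # body was empty: send the headers now
    finally:
        if hasattr(result, 'close'):
            result.close()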
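The standard library already contains a fuller version of this server-side logic in `wsgiref.handlers`. As a rough sketch of the same three duties (prepare environ, provide start_response, call the application), the example below runs an application through `wsgiref.handlers.SimpleHandler` with in-memory streams instead of real stdin/stdout. `hello_app` and `run_in_memory` are hypothetical names chosen for the example.

```python
import io
from wsgiref.handlers import SimpleHandler
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    body = b"Hello world!\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def run_in_memory(app):
    """Run one request through SimpleHandler and return the raw bytes it wrote."""
    environ = {}
    setup_testing_defaults(environ)   # fills REQUEST_METHOD, SERVER_PROTOCOL, ...
    stdin = io.BytesIO()              # empty request body
    stdout = io.BytesIO()             # the response is written here
    stderr = io.StringIO()            # becomes environ['wsgi.errors']
    handler = SimpleHandler(stdin, stdout, stderr, environ)
    handler.run(app)                  # supplies start_response and calls app
    return stdout.getvalue()

raw = run_in_memory(hello_app)
print(raw.decode("iso-8859-1"))
```

The output starts with a status line, followed by the headers, a blank line, and the body, which is the same order the hand-written `write` function above has to respect.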

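2 Analyzing WSGI through the Django framework

Next, we use Django as an example to walk through the whole WSGI flow.

Django WSGI application

A WSGI application should be implemented as a callable object, for example a function, a method, or a class (with a __call__ method). It takes two arguments. The first is a dictionary that holds the information of the client's request along with other information; it can be thought of as the request context and is usually called environment (abbreviated to environ or env in code). The second is a callback used to send the HTTP status and HTTP headers, namely start_response(). The application hands the status and headers to the server through this callback and returns the response body, which is an iterable containing multiple strings.

Below is the relevant part of Django's implementation of the application: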

class WSGIHandler(base.BaseHandler):
    initLock = Lock()
    request_class = WSGIRequest

    def __call__(self, environ, start_response):
        # Load the middleware
        if self._request_middleware is None:
            with self.initLock:
                try:
                    # Check that middleware is still uninitialized.
                    if self._request_middleware is None:
                        self.load_middleware()
                except:
                    # Unload whatever middleware we got
                    self._request_middleware = None
                    raise

        set_script_prefix(get_script_name(environ))
        # Send a signal before the request is processed
        signals.request_started.send(sender=self.__class__, environ=environ)
        try:
            request = self.request_class(environ)
        except UnicodeDecodeError:
            logger.warning('Bad Request (UnicodeDecodeError)',
                           exc_info=sys.exc_info(),
                           extra={'status_code': 400})
            response = http.HttpResponseBadRequest()
        else:
            response = self.get_response(request)

        response._handler_class = self.__class__

        status = '%s %s' % (response.status_code, response.reason_phrase)
        response_headers = [(str(k), str(v)) for k, v in response.items()]
        for c in response.cookies.values():
            response_headers.append((str('Set-Cookie'), str(c.output(header=''))))
        # Callback supplied by the server: hands the status and headers back to it
        start_response(force_str(status), response_headers)
        if getattr(response, 'file_to_stream', None) is not None and environ.get('wsgi.file_wrapper'):
            response = environ['wsgi.file_wrapper'](response.file_to_stream)
        return response

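As the code shows, the application's flow consists of the following steps:

- Load all middleware and perform framework-related setup: set the script prefix for the current thread and send the request_started signal.
- Process the request by calling get_response(). Its main logic is to find the matching view (the callback) through the URLconf and run the middleware and the callback in order.
- Call the start_response() passed in by the server to hand the response headers and status back to the server.
- Return the response body.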
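The same flow can be modeled without Django in a few dozen lines. The sketch below is not Django's code: `MiniHandler`, `MiniResponse`, the `routes` dict standing in for the URLconf, and the single hard-coded middleware are all assumptions made for illustration. It keeps the shape of the steps above: lazy middleware loading under a lock, resolving a view, calling start_response, and returning an iterable response.

```python
import threading

class MiniResponse:
    """A tiny stand-in for a framework response object."""
    def __init__(self, text, status_code=200, reason="OK"):
        self.body = text.encode("utf-8")
        self.status_code = status_code
        self.reason = reason
        self.headers = {"Content-Type": "text/plain; charset=utf-8"}

    def __iter__(self):
        yield self.body

class MiniHandler:
    init_lock = threading.Lock()

    def __init__(self, routes):
        self.routes = routes        # stand-in for the URLconf: path -> view
        self.middleware = None      # loaded lazily on the first request

    def load_middleware(self):
        # Hypothetical middleware: takes a response and returns a response.
        def add_handler_header(response):
            response.headers["X-Handled-By"] = "MiniHandler"
            return response
        self.middleware = [add_handler_header]

    def get_response(self, environ):
        view = self.routes.get(environ.get("PATH_INFO", "/"))
        if view is None:
            response = MiniResponse("not found", 404, "Not Found")
        else:
            response = view(environ)
        for mw in self.middleware:
            response = mw(response)
        return response

    def __call__(self, environ, start_response):
        if self.middleware is None:
            with self.init_lock:
                if self.middleware is None:   # re-check once the lock is held
                    self.load_middleware()
        response = self.get_response(environ)
        status = "%d %s" % (response.status_code, response.reason)
        start_response(status, list(response.headers.items()))
        return response                       # iterable response body

handler = MiniHandler({"/": lambda environ: MiniResponse("home")})
```

Any WSGI server could call `handler` directly, since it only depends on the `(environ, start_response)` convention.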

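Django WSGI Server

The server accepts HTTP requests, passes each request to the WSGI application, and returns the response once the application has processed the request. Let's take Django's built-in server as the example. When a Django project is run through runserver, startup calls the run function below, which creates a WSGIServer instance and then calls its serve_forever() method to start serving.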

def run(addr, port, wsgi_handler, ipv6=False, threading=False):
    server_address = (addr, port)
    if threading:
        httpd_cls = type(str('WSGIServer'), (socketserver.ThreadingMixIn, WSGIServer), {})
    else:
        httpd_cls = WSGIServer
    httpd = httpd_cls(server_address, WSGIRequestHandler, ipv6=ipv6)
    if threading:
        httpd.daemon_threads = True
    # wsgi_handler here is the WSGI application
    httpd.set_app(wsgi_handler)
    httpd.serve_forever()
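The `type(...)` call in run builds a threaded server class on the fly by mixing `socketserver.ThreadingMixIn` into `WSGIServer`. The same trick works with the standard library's `wsgiref.simple_server.WSGIServer`, as the sketch below shows. `pick_server_class` is a hypothetical helper, not something Django or wsgiref provides.

```python
import socketserver
from wsgiref.simple_server import WSGIServer

def pick_server_class(threading=False):
    """Return a plain or a threaded WSGI server class, mirroring run() above."""
    if threading:
        # type(name, bases, namespace) creates a new class at runtime.
        # ThreadingMixIn comes first so its process_request wins in the MRO.
        return type("ThreadedWSGIServer",
                    (socketserver.ThreadingMixIn, WSGIServer), {})
    return WSGIServer

threaded_cls = pick_server_class(threading=True)
print([c.__name__ for c in threaded_cls.__mro__[:3]])
```

The resulting class can be passed as `server_class` to `wsgiref.simple_server.make_server`, so each request is handled in its own thread.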

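The key classes and methods in the WSGI server's request-handling flow are as follows.

WSGIServer
The run() function creates a WSGIServer instance. Its main job is to receive client requests, pass them to the application, and send the response returned by the application back to the client.

When the instance is created, the handler for HTTP requests is specified: the WSGIRequestHandler class. The set_app and get_app methods set and retrieve the WSGI application instance, wsgi_handler.

To handle an HTTP request, the server calls handle_request, which creates a WSGIRequestHandler instance to process the request. WSGIServer's get_request method accepts the request data through the socket.

WSGIRequestHandler
WSGIServer creates an instance of it during handle_request, passing three arguments: request, client_address, and the WSGIServer itself. While instantiating, __init__ also calls the handle method, which creates a ServerHandler instance and calls its run method to process the request.

ServerHandler
WSGIRequestHandler calls its run method from handle, passing self.server.get_app() to obtain the WSGI application. run then calls that instance (__call__) to get the response, passing in the start_response callback that handles the returned headers and status. Once the application has produced the response, finish_response sends it back.

WSGIHandler
The application in WSGI terms. It takes two arguments: environ, a dictionary that holds the client's request information and other information and can be thought of as the request context, and start_response, the callback used to send back the status and headers.

Although a WSGI server involves several classes that reference each other, the principle stays simple: call WSGIHandler with the request data and the start_response() callback, and return the response to the client.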
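The chain described above (set_app, handle_request, request handler, application) can be exercised end to end with the standard library's `wsgiref.simple_server`, whose `WSGIServer` and `WSGIRequestHandler` play the same roles. This is a sketch under assumptions: `hello_app`, `QuietHandler`, and `serve_one_request` are hypothetical names, and the server binds to a free local port chosen by the operating system.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIServer, WSGIRequestHandler

def hello_app(environ, start_response):
    body = b"hello from " + environ["PATH_INFO"].encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # silence the per-request log line

def serve_one_request():
    server = WSGIServer(("127.0.0.1", 0), QuietHandler)  # port 0: OS picks a port
    server.timeout = 5              # handle_request gives up after 5 seconds
    server.set_app(hello_app)       # the server later fetches it via get_app()
    port = server.server_address[1]
    # handle_request accepts one connection, builds the handler, calls the app
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/demo")
        resp = conn.getresponse()
        result = (resp.status, resp.read())
        conn.close()
        return result
    finally:
        worker.join()
        server.server_close()

print(serve_one_request())
```

Swapping `hello_app` for a Django WSGIHandler instance would follow the same path, which is the point made in the paragraph above.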

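3 WSGI servers used in production

Web frameworks do not focus on implementing servers, so a production deployment does not simply use the server bundled with the framework. So which servers are used in production?

1. gunicorn

Gunicorn (inspired by Unicorn from the Ruby world) arose to fill this need: it relies on Nginx as a proxy and keeps its own functionality separate from Nginx's. Because it does not handle requests from users directly (Nginx handles them first), Gunicorn does not need to implement the related features, and its internal logic is very simple: accept dynamic requests from Nginx, process them, and return the results to Nginx, which returns them to the user.

Because its role is so clearly defined, Gunicorn could be written in pure Python, which greatly shortened development time without hurting performance too badly. It can also work behind proxies other than Nginx, and its configuration is correspondingly simple.

That simple configuration is probably the biggest reason for its popularity.

2. uwsgi

Because it is written in C, it works closer to the system level, and it is also fairly easy to configure. Together with gunicorn, it is currently one of the two go-to choices for deployment.

Below is a typical configuration file: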

[uwsgi]
http = $(HOSTNAME):9033
http-keepalive = 1
pythonpath = ../
module = service
master = 1
processes = 8
daemonize = logs/uwsgi.log
disable-logging = 1
buffer-size = 16384
harakiri = 5
pidfile = uwsgi.pid
stats = $(HOSTNAME):1733


Run it with: uwsgi --ini conf.ini
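The `module = service` line tells uWSGI which Python module to import, and by default uWSGI looks for a callable named `application` inside it. A minimal sketch of such a module might look like this; the file name `service.py` follows from the config, while the response text is made up for the example.

```python
# service.py: hypothetical module matching "module = service" in the config above

def application(environ, start_response):
    """Entry point that uWSGI calls once per request."""
    path = environ.get("PATH_INFO", "/")
    body = ("uWSGI served %s\n" % path).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Since this is a plain WSGI callable, the same file could be served by gunicorn or wsgiref without changes.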

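3. fcgi

Not much to say here; probably few people use it, so it gets only a mention.

4. bjoern

One of the best-performing servers in the Python WSGI world is bjoern: pure C, under 1000 lines of code, written by an author who could not stand uWSGI's bloat.

4 Comparing WSGI servers

Judging from the practical experience of many Python developers, the most widely used servers are uWSGI and gunicorn. Here is how the two compare with other servers.
1. gunicorn is itself a multi-process manager, and you specify which type of worker it should run. With gevent workers, a single machine reaches roughly 3000 RPS on a Hello World benchmark, ahead of Tornado's built-in server at about 2000; uWSGI comes out a little higher.
2. With Tornado, existing code needs large-scale refactoring before it can use the advanced features, whereas gevent needs only a monkey patch, which makes it easy to adapt existing code quickly.
3. gunicorn supports pre and post hooks.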
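The worker type and the hooks mentioned in the list above are set in gunicorn's configuration file, which is itself a Python file. The sketch below uses setting and hook names from gunicorn's documentation (`bind`, `workers`, `worker_class`, `timeout`, `pre_request`, `post_request`); the concrete values and log messages are assumptions for illustration, and the gevent worker requires gevent to be installed.

```python
# gunicorn.conf.py: a hypothetical configuration, values chosen for illustration
import multiprocessing

bind = "127.0.0.1:8000"                        # address gunicorn listens on
workers = multiprocessing.cpu_count() * 2 + 1  # number of worker processes
worker_class = "gevent"                        # the default is "sync"
timeout = 30                                   # seconds before a silent worker is restarted

def pre_request(worker, req):
    # Called just before a worker processes a request.
    worker.log.debug("start %s %s", req.method, req.path)

def post_request(worker, req, environ, resp):
    # Called after a worker has processed a request.
    worker.log.debug("done %s %s", req.method, req.path)
```

A file like this would typically be passed with `gunicorn -c gunicorn.conf.py service:application`, where `service:application` names the module and callable to serve.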

Next, let's compare the speed of uWSGI and gunicorn.