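Translator's Preface
Building web applications in Python is convenient, and there are many mature frameworks to choose from, such as Flask and Django. This series instead builds one from scratch. Along the way you can learn the HTTP protocol and many of the principles underneath it, which helps a great deal in understanding web development in depth. The series currently has four parts; this is a translation of the first.
This is the first post in a series in which I build a web application (and its web server) from scratch using Python. The only dependency for the whole series is the Python standard library, and I am going to ignore the WSGI standard.
Without further ado, let's get started!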
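The web server
To begin with, we are going to write the HTTP server that will run our web application. First, though, we need to spend a little time looking at how the HTTP protocol works.
How HTTP works
Simply put, HTTP clients connect to HTTP servers over the network and send them requests made up of text data. The server parses each request and sends a response back to the client. The whole protocol, including the request and response formats, is described in detail in RFC 2616. I will explain it informally in this post so that you don't have to read the whole specification.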
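Request format
A request is represented as a series of lines separated by \r\n. The first line is called the "request line". It consists of an HTTP method, followed by a space, followed by the path of the requested file, followed by another space, then the HTTP protocol version the client speaks and, finally, a carriage return (\r) and a line feed (\n):

GET /some-path HTTP/1.1\r\n

After the request line there may be zero or more header lines. Each one consists of a header name, followed by a colon, followed by an optional value, followed by \r\n:

Host: example.com\r\n
Accept: text/html\r\n

An empty line marks the end of the headers:

\r\n

Finally, the request may contain a body: an arbitrary payload that is sent to the server along with the request.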
Putting all of that together, here is a simple GET request:

GET / HTTP/1.1\r\n
Host: example.com\r\n
Accept: text/html\r\n
\r\n

And here is a POST request with a body:

POST / HTTP/1.1\r\n
Host: example.com\r\n
Accept: application/json\r\n
Content-type: application/json\r\n
Content-length: 2\r\n
\r\n
{}
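To make the format concrete, here is a minimal sketch that takes a complete request as bytes and splits it into its parts. The function name split_request is made up for this illustration. It also assumes the whole request is already in memory, which is not how the server later in this post reads it.

```python
def split_request(raw: bytes):
    """Split a raw HTTP request into (request line parts, headers, body)."""
    # The first empty line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.split(b"\r\n")
    method, path, version = request_line.decode("ascii").split(" ")

    headers = {}
    for line in header_lines:
        # Only split on the first colon: values may contain colons too.
        name, _, value = line.decode("ascii").partition(":")
        headers[name.strip().lower()] = value.strip()

    return (method, path, version), headers, body


raw = (
    b"POST / HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Accept: application/json\r\n"
    b"Content-type: application/json\r\n"
    b"Content-length: 2\r\n"
    b"\r\n"
    b"{}"
)
request_line, headers, body = split_request(raw)
print(request_line)               # ('POST', '/', 'HTTP/1.1')
print(headers["content-length"])  # 2
print(body)                       # b'{}'
```

Header names are lowercased here because HTTP header names are case-insensitive, so lookups become predictable.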
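Response format
Responses, like requests, are made up of a series of lines separated by \r\n. The first line of a response is called the "status line". It contains the HTTP protocol version, followed by a space, followed by the response status code, followed by another space, then the reason phrase for that status code and, finally, \r\n:

HTTP/1.1 200 OK\r\n

After the status line come the response headers, then an empty line and then an optional response body:

HTTP/1.1 200 OK\r\n
Content-type: text/html\r\n
Content-length: 15\r\n
\r\n
<h1>Hello!</h1>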
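The Content-length header must match the number of bytes in the body, and counting by hand is easy to get wrong. As a sketch, a small helper can assemble a response and compute the length for us. build_response is a hypothetical name and is not part of the server built in this post.

```python
def build_response(status: str, content_type: str, body: bytes) -> bytes:
    """Assemble a complete HTTP/1.1 response for the given body."""
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-type: {content_type}\r\n"
        f"Content-length: {len(body)}\r\n"  # length in bytes, not characters
        "\r\n"                              # empty line ends the headers
    )
    return head.encode("ascii") + body


response = build_response("200 OK", "text/html", b"<h1>Hello!</h1>")
print(response)
```

The same helper works for any status line and body, for example build_response("404 Not Found", "text/plain", b"Not Found").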
A simple server
Based on what we know about the protocol so far, let's write a server that sends the same response regardless of the incoming request.
We need to create a socket, bind it to an address and then start listening for connections:

import socket

HOST = "127.0.0.1"
PORT = 9000

# By default, socket.socket creates TCP sockets.
with socket.socket() as server_sock:
    # This tells the kernel to reuse sockets that are in `TIME_WAIT` state.
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

    # This tells the socket what address to bind to.
    server_sock.bind((HOST, PORT))

    # 0 is the number of pending connections the socket may have before
    # new connections are refused. Since this server is going to process
    # one connection at a time, we want to refuse any additional connections.
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")
If you run the code now, it prints that it is listening on 127.0.0.1:9000 and then exits immediately. To handle incoming connections, we need to call the socket's accept method. Doing so blocks the process until a client connects to our server.

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    client_sock, client_addr = server_sock.accept()
    print(f"New connection from {client_addr}.")
Once we have a socket connected to the client, we can start communicating with it. Using the sendall method, let's send the client a response:

RESPONSE = b"""\
HTTP/1.1 200 OK
Content-type: text/html
Content-length: 15

<h1>Hello!</h1>""".replace(b"\n", b"\r\n")

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    client_sock, client_addr = server_sock.accept()
    print(f"New connection from {client_addr}.")
    with client_sock:
        client_sock.sendall(RESPONSE)
If you run the code now and visit http://127.0.0.1:9000 in your browser, you should see the string "Hello!". Unfortunately, the server exits as soon as it has sent the response, so refreshing the page will fail. Let's fix that:

RESPONSE = b"""\
HTTP/1.1 200 OK
Content-type: text/html
Content-length: 15

<h1>Hello!</h1>""".replace(b"\n", b"\r\n")

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"New connection from {client_addr}.")
        with client_sock:
            client_sock.sendall(RESPONSE)
At this point we have a web server that can serve a simple HTML page, all in about 25 lines of code. That's not too bad!
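If you want to poke at this exchange without a browser or an open port, one option is socket.socketpair(), which returns two sockets that are already connected to each other. The sketch below plays both roles in one process. The names handle, read_all and round_trip are made up for this example. Unlike the server above, handle reads the request before replying so that nothing is left unread when the socket is closed.

```python
import socket

RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 15\r\n"
    b"\r\n"
    b"<h1>Hello!</h1>"
)


def handle(client_sock: socket.socket) -> None:
    """Server side: read what the client sent, reply, then close."""
    with client_sock:
        client_sock.recv(4096)
        client_sock.sendall(RESPONSE)


def read_all(sock: socket.socket) -> bytes:
    """Client side: keep reading until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def round_trip(request: bytes) -> bytes:
    """Send a request over a connected socket pair and return the reply."""
    server_side, client_side = socket.socketpair()
    with client_side:
        client_side.sendall(request)
        handle(server_side)
        return read_all(client_side)


reply = round_trip(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
print(reply.decode("ascii"))
```

This only works in a single thread because the messages are tiny and fit in the socket buffers. A real client and server run in separate processes.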
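A file server
Let's extend our HTTP server so that it can serve files off of disk.
Request abstraction
Before we can do that, we have to be able to read and parse the request data coming from the client. Since we know that request data is represented as a series of lines, each separated by \r\n, let's write a generator function that reads data from a socket and yields each individual line: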

import typing


def iter_lines(sock: socket.socket, bufsize: int = 16_384) -> typing.Generator[bytes, None, bytes]:
    """Given a socket, read all the individual CRLF-separated lines
    and yield each one until an empty one is found.  Returns the
    remainder after the empty line.
    """
    buff = b""
    while True:
        data = sock.recv(bufsize)
        if not data:
            return b""

        buff += data
        while True:
            try:
                i = buff.index(b"\r\n")
                line, buff = buff[:i], buff[i + 2:]
                if not line:
                    return buff

                yield line
            except ValueError:
                # bytes.index raises ValueError when no CRLF is buffered
                # yet, so go back and read more data from the socket.
                break

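This may look a bit daunting, but all it does is read as much data as it can from the socket, collect it in a buffer and keep splitting the buffered data into individual lines, yielding one line at a time. Once it finds an empty line, it stops and returns whatever data is left in the buffer after that line.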
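One detail that is easy to miss: a for loop throws away a generator's return value, so the remainder is only reachable through StopIteration.value. The sketch below shows the same buffering idea with a list of byte chunks standing in for the socket. The names iter_lines_from_chunks and read_head are invented for this example.

```python
import typing


def iter_lines_from_chunks(
    chunks: typing.Iterable[bytes],
) -> typing.Generator[bytes, None, bytes]:
    """Yield CRLF-terminated lines from successive chunks until an empty
    line is seen, then return the bytes that follow it."""
    buff = b""
    for data in chunks:
        buff += data
        while True:
            i = buff.find(b"\r\n")
            if i == -1:
                break  # no complete line buffered yet; wait for more data
            line, buff = buff[:i], buff[i + 2:]
            if not line:
                return buff
            yield line
    return b""


def read_head(chunks: typing.Iterable[bytes]):
    """Collect every yielded line plus the generator's return value."""
    lines = []
    gen = iter_lines_from_chunks(chunks)
    while True:
        try:
            lines.append(next(gen))
        except StopIteration as stop:
            return lines, stop.value


# The chunk boundaries deliberately fall mid-line and mid-CRLF.
chunks = [b"GET / HT", b"TP/1.1\r\nHost: a", b"\r\n\r", b"\n{}"]
lines, rest = read_head(chunks)
print(lines)  # [b'GET / HTTP/1.1', b'Host: a']
print(rest)   # b'{}'
```

Because data is only split once a full \r\n is in the buffer, it does not matter where the network happens to cut the stream.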
Using iter_lines, we can start printing out the requests we get from our clients:

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"New connection from {client_addr}.")
        with client_sock:
            for request_line in iter_lines(client_sock):
                print(request_line)

            client_sock.sendall(RESPONSE)
If you run the code now and visit http://127.0.0.1:9000 in your browser, you should see something like this in your console:

New connection from ('127.0.0.1', 62086).
b'GET / HTTP/1.1'
b'Host: localhost:9000'
b'Connection: keep-alive'
b'Cache-Control: max-age=0'
b'Upgrade-Insecure-Requests: 1'
b'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36'
b'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'
b'Accept-Encoding: gzip, deflate, br'
b'Accept-Language: en-US,en;q=0.9,ro;q=0.8'

Pretty neat! Let's abstract over that data by defining a Request class:

import typing


class Request(typing.NamedTuple):
    method: str
    path: str
    headers: typing.Mapping[str, str]

For now, the request class only knows about methods, paths and headers. Later on we will add support for query string parameters and for reading request bodies.
To encapsulate the logic needed to build up a request, we add a class method called from_socket to Request:

class Request(typing.NamedTuple):
    method: str
    path: str
    headers: typing.Mapping[str, str]

    @classmethod
    def from_socket(cls, sock: socket.socket) -> "Request":
        """Read and parse the request from a socket object.

        Raises:
          ValueError: When the request cannot be parsed.
        """
        lines = iter_lines(sock)

        try:
            request_line = next(lines).decode("ascii")
        except StopIteration:
            raise ValueError("Request line missing.")

        try:
            method, path, _ = request_line.split(" ")
        except ValueError:
            raise ValueError(f"Malformed request line {request_line!r}.")

        headers = {}
        for line in lines:
            try:
                name, _, value = line.decode("ascii").partition(":")
                headers[name.lower()] = value.lstrip()
            except ValueError:
                raise ValueError(f"Malformed header line {line!r}.")

        return cls(method=method.upper(), path=path, headers=headers)

It uses the iter_lines function we defined earlier to read the request line, which is where it gets the method and the path. It then reads each header line and parses it. Finally, it builds a Request object and returns it. If we plug that into our server loop, it looks like this:

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            request = Request.from_socket(client_sock)
            print(request)
            client_sock.sendall(RESPONSE)

If you connect to the server now, you should see something like this:

Request(method='GET', path='/', headers={'host': 'localhost:9000', 'user-agent': 'curl/7.54.0', 'accept': '*/*'})

Because from_socket can raise an exception under certain circumstances, the server may crash if you send it an invalid request right now. To simulate such a request, you can use telnet to connect to the server and send it some bogus data:

> telnet 127.0.0.1 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello
Connection closed by foreign host.

Sure enough, the server crashed:

Received connection from ('127.0.0.1', 62404)...
Traceback (most recent call last):
  File "server.py", line 53, in parse
    request_line = next(lines).decode("ascii")
ValueError: not enough values to unpack (expected 3, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "server.py", line 82, in <module>
    with client_sock:
  File "server.py", line 55, in parse
    raise ValueError("Request line missing.")
ValueError: Malformed request line 'hello'.

To handle this kind of problem more gracefully, let's wrap the call to from_socket in a try-except block and send the client a "400 Bad Request" response when we get a malformed request:

BAD_REQUEST_RESPONSE = b"""\
HTTP/1.1 400 Bad Request
Content-type: text/plain
Content-length: 11

Bad Request""".replace(b"\n", b"\r\n")

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            try:
                request = Request.from_socket(client_sock)
                print(request)
                client_sock.sendall(RESPONSE)
            except Exception as e:
                print(f"Failed to parse request: {e}")
                client_sock.sendall(BAD_REQUEST_RESPONSE)

If we try to crash the server again, the client now gets a response back and the server keeps running:

~> telnet 127.0.0.1 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello
HTTP/1.1 400 Bad Request
Content-type: text/plain
Content-length: 11

Bad RequestConnection closed by foreign host.

We are now ready to implement the file serving part. First, let's define a default "404 Not Found" response:

NOT_FOUND_RESPONSE = b"""\
HTTP/1.1 404 Not Found
Content-type: text/plain
Content-length: 9

Not Found""".replace(b"\n", b"\r\n")

# ...

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            try:
                request = Request.from_socket(client_sock)
                print(request)
                client_sock.sendall(NOT_FOUND_RESPONSE)
            except Exception as e:
                print(f"Failed to parse request: {e}")
                client_sock.sendall(BAD_REQUEST_RESPONSE)

Additionally, let's add a "405 Method Not Allowed" response, since we are only going to handle GET requests:

METHOD_NOT_ALLOWED_RESPONSE = b"""\
HTTP/1.1 405 Method Not Allowed
Content-type: text/plain
Content-length: 18

Method Not Allowed""".replace(b"\n", b"\r\n")

Let's define a SERVER_ROOT constant, which represents the directory the server serves files from, and a serve_file function:

import mimetypes
import os
import socket
import typing

SERVER_ROOT = os.path.abspath("www")

FILE_RESPONSE_TEMPLATE = """\
HTTP/1.1 200 OK
Content-type: {content_type}
Content-length: {content_length}

""".replace("\n", "\r\n")


def serve_file(sock: socket.socket, path: str) -> None:
    """Given a socket and the relative path to a file (relative to
    SERVER_ROOT), send that file to the socket if it exists.  If the
    file doesn't exist, send a "404 Not Found" response.
    """
    if path == "/":
        path = "/index.html"

    abspath = os.path.normpath(os.path.join(SERVER_ROOT, path.lstrip("/")))
    # Compare against SERVER_ROOT plus a separator so that a sibling
    # directory such as "www2" does not pass the prefix check.
    if not abspath.startswith(SERVER_ROOT + os.sep):
        sock.sendall(NOT_FOUND_RESPONSE)
        return

    try:
        with open(abspath, "rb") as f:
            stat = os.fstat(f.fileno())
            # guess_type also returns a content encoding (e.g. "gzip").
            # That is not a character set, so it is not appended to
            # Content-type as a charset.
            content_type, _ = mimetypes.guess_type(abspath)
            if content_type is None:
                content_type = "application/octet-stream"

            response_headers = FILE_RESPONSE_TEMPLATE.format(
                content_type=content_type,
                content_length=stat.st_size,
            ).encode("ascii")

            sock.sendall(response_headers)
            sock.sendfile(f)
    except FileNotFoundError:
        sock.sendall(NOT_FOUND_RESPONSE)
        return

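serve_file takes the client socket and the path to a file. It then tries to resolve that path to a real file inside SERVER_ROOT, returning a "not found" response for anything that falls outside SERVER_ROOT. Next, it tries to open the file and figure out its MIME type and its size (using os.fstat), builds the response headers and uses the sendfile system call to write the file to the socket. If the file cannot be found on disk, it sends a "not found" response.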
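The "is this path inside the root?" check deserves some care. A plain string-prefix test would also accept a sibling directory whose name merely starts with the root's name, such as www2 next to www. One way to compare whole path components is os.path.commonpath, as in the sketch below. resolve_path is a hypothetical helper and not part of the original server.

```python
import os
import typing


def resolve_path(root: str, request_path: str) -> typing.Optional[str]:
    """Map a request path onto a file under root, or return None when
    the normalized result would fall outside of root."""
    root = os.path.abspath(root)
    candidate = os.path.normpath(os.path.join(root, request_path.lstrip("/")))
    # commonpath compares whole components, so "www2" never matches "www".
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate


root = os.path.abspath("www")
print(resolve_path(root, "/index.html"))          # a path inside www
print(resolve_path(root, "/../etc/passwd"))       # None
print(resolve_path(root, "/../www2/secret.txt"))  # None
```

Note that this only normalizes the path as text. It does not resolve symbolic links, so a link inside the root could still point elsewhere.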
With serve_file added in, our server loop now looks like this:

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            try:
                request = Request.from_socket(client_sock)
                if request.method != "GET":
                    client_sock.sendall(METHOD_NOT_ALLOWED_RESPONSE)
                    continue

                serve_file(client_sock, request.path)
            except Exception as e:
                print(f"Failed to parse request: {e}")
                client_sock.sendall(BAD_REQUEST_RESPONSE)

If you add a file called www/index.html next to your server.py file and visit http://localhost:9000, you should see the contents of that file.
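One part of serve_file that is worth trying on its own is mimetypes.guess_type. It returns a pair: the guessed MIME type and a content encoding such as gzip, where either may be None. The sketch below wraps it in a small helper with the same fallback serve_file uses. content_type_for is a made-up name for this example.

```python
import mimetypes


def content_type_for(path: str) -> str:
    """Guess a Content-type from the file name, falling back to a
    generic binary type when the extension is unknown."""
    content_type, _encoding = mimetypes.guess_type(path)
    if content_type is None:
        return "application/octet-stream"
    return content_type


for name in ("index.html", "archive.tar.gz", "README"):
    print(name, mimetypes.guess_type(name), content_type_for(name))
```

For archive.tar.gz the second element of the pair is "gzip", which describes how the file is compressed rather than its character set.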
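Winding down
That's it for Part 1. In Part 2 we will extract Server and Response abstractions and look at how to handle multiple concurrent requests. The original post links to the full source code.
Original: WEB APPLICATION FROM SCRATCH, PART I
- Author: Bogdan Popa
- Translator: noONE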