Building a Web Application from Scratch, Part 1

Posted by noONE on 2019-03-01

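Translator's Preface

Building web applications in Python is very convenient, and there are many mature frameworks to choose from, such as Flask and Django. This series instead builds everything from scratch, which is a good way to learn the HTTP protocol and much of the underlying theory, and that helps a great deal in understanding web development in depth. The series currently has four parts; this is a translation of the first.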

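This is the first post in a series in which I'm going to build a web application (and its web server) from scratch in Python. The only dependency for the whole series is the Python standard library, and I will ignore the WSGI standard.

Without further ado, let's get to it!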

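The web server

To begin with, we're going to write the HTTP server that will run our web app. But first, we need to spend a little time looking at how the HTTP protocol works.

How HTTP works

Simply put, HTTP clients connect to HTTP servers over the network and send them requests containing string data. The server parses each request and sends a response back to the client. The whole protocol, along with the request and response formats, is described in detail in RFC2616, but I'll cover it informally here so you don't have to read the entire specification.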

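Request format

A request is represented as a series of lines separated by \r\n. The first line is called the "request line". It consists of an HTTP method, followed by a space, followed by the path of the file being requested, followed by another space, followed by the HTTP protocol version the client speaks, and finally a carriage return (\r) and a line feed (\n):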

GET /some-path HTTP/1.1\r\n

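After the request line come zero or more header lines. Each header line consists of a header name, followed by a colon, followed by an optional value, followed by \r\n: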

Host: example.com\r\n
Accept: text/html\r\n

An empty line signals the end of the headers:

\r\n

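Finally, the request may contain a body: an arbitrary payload that is sent to the server along with the request.

Putting it all together, here's a simple GET request: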

GET / HTTP/1.1\r\n
Host: example.com\r\n
Accept: text/html\r\n
\r\n

And here's a POST request with a body:

POST / HTTP/1.1\r\n
Host: example.com\r\n
Accept: application/json\r\n
Content-type: application/json\r\n
Content-length: 2\r\n
\r\n
{}
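As a rough illustration of the format above, the following sketch assembles that POST request by hand and then splits it back into its parts at the empty line. The helper names build_request and split_request are made up for this example and are not part of the server we build below.

```python
def build_request(method: str, path: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble a raw HTTP/1.1 request (hypothetical helper for illustration)."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line marks the end of the headers; the body follows it verbatim.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def split_request(raw: bytes):
    """Split a raw request into its request line, header lines and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.split(b"\r\n")
    return request_line, header_lines, body


body = b"{}"
raw = build_request(
    "POST",
    "/",
    {
        "Host": "example.com",
        "Accept": "application/json",
        "Content-type": "application/json",
        # Content-length is the size of the body in bytes.
        "Content-length": str(len(body)),
    },
    body,
)
request_line, header_lines, parsed_body = split_request(raw)
print(request_line)  # b'POST / HTTP/1.1'
print(parsed_body)   # b'{}'
```

Computing Content-length from the body, rather than typing it in, keeps the header and the payload from drifting apart.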

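Response format

Responses, like requests, are made up of a series of lines separated by \r\n. The first line of a response is called the "status line". It consists of the HTTP protocol version, followed by a space, followed by the response status code, followed by another space, then the reason phrase for that status code, and finally \r\n: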

HTTP/1.1 200 OK\r\n

After the status line come the response headers, then an empty line, and then an optional response body:

HTTP/1.1 200 OK\r\n
Content-type: text/html\r\n
Content-length: 15\r\n
\r\n
<h1>Hello!</h1>
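The response above can be produced programmatically as well. The sketch below is one possible way to do it; build_response is a hypothetical helper introduced only for this example, and the rest of the article keeps writing its responses out by hand.

```python
def build_response(status: str, body: bytes, content_type: str = "text/html") -> bytes:
    """Build a raw HTTP/1.1 response (hypothetical helper for illustration)."""
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-type: {content_type}\r\n"
        # Computing the length avoids miscounting it by hand.
        f"Content-length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


response = build_response("200 OK", b"<h1>Hello!</h1>")
print(response)
```

The Content-length value must match the number of bytes in the body exactly, otherwise a client may wait for data that never arrives or cut the body short.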

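A simple server

Based on what we know about the protocol so far, let's write a server that sends the same response regardless of the incoming request.

We need to create a socket, bind it to an address and then start listening for connections: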

import socket

HOST = "127.0.0.1"
PORT = 9000

# By default, socket.socket creates TCP sockets.
with socket.socket() as server_sock:
    # This tells the kernel to reuse sockets that are in `TIME_WAIT` state.
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

    # This tells the socket what address to bind to.
    server_sock.bind((HOST, PORT))

    # 0 is the number of pending connections the socket may have before
    # new connections are refused.  Since this server is going to process
    # one connection at a time, we want to refuse any additional connections.
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

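If you run the code now, it prints that it is listening on 127.0.0.1:9000 and then exits immediately. To process incoming connections we need to call the accept method on our socket. Doing so blocks the process until a client connects to our server: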

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    client_sock, client_addr = server_sock.accept()
    print(f"New connection from {client_addr}.")

Once we have a socket connected to the client, we can start communicating with it. Using the sendall method, let's send the client a response:

RESPONSE = b"""\
HTTP/1.1 200 OK
Content-type: text/html
Content-length: 15

<h1>Hello!</h1>""".replace(b"\n", b"\r\n")

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    client_sock, client_addr = server_sock.accept()
    print(f"New connection from {client_addr}.")
    with client_sock:
        client_sock.sendall(RESPONSE)

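If you run the code now and visit http://127.0.0.1:9000 in your browser, you should see the string "Hello!". Unfortunately, the server exits right after it sends the response, so refreshing the page will fail. Let's fix that: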

RESPONSE = b"""\
HTTP/1.1 200 OK
Content-type: text/html
Content-length: 15

<h1>Hello!</h1>""".replace(b"\n", b"\r\n")

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"New connection from {client_addr}.")
        with client_sock:
            client_sock.sendall(RESPONSE)

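At this point we have a web server that can serve a simple HTML page, all in just 25 lines of code. That's not too bad!

A file server

Let's extend the HTTP server so that it can serve files off of disk.

Request abstraction

Before we can do that, we need to be able to read and parse the request data sent by the client. Since we already know that request data is a series of lines, each separated by \r\n, let's write a generator function that reads data from a socket and yields each individual line: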

import typing


def iter_lines(sock: socket.socket, bufsize: int = 16_384) -> typing.Generator[bytes, None, bytes]:
    """Given a socket, read all the individual CRLF-separated lines
    and yield each one until an empty one is found.  Returns the
    remainder after the empty line.
    """
    buff = b""
    while True:
        data = sock.recv(bufsize)
        if not data:
            return b""

        buff += data
        while True:
            try:
                i = buff.index(b"\r\n")
                line, buff = buff[:i], buff[i + 2:]
                if not line:
                    return buff

                yield line
            except ValueError:
                break

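This may look a bit daunting, but essentially it reads as much data as it can from the socket, collects it in a buffer, and keeps splitting the buffered data into individual lines, yielding one at a time. Once it finds an empty line, it returns whatever data it read past that line.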
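To see the buffering idea in isolation, here is a small sketch that works on plain bytes instead of a socket. The function split_lines is a hypothetical stand-in for the inner loop of iter_lines, written only for this illustration.

```python
def split_lines(buff: bytes):
    """Split complete CRLF-terminated lines off the front of a buffer.

    Returns the complete lines and whatever incomplete data is left over.
    Hypothetical helper that mirrors the inner loop of iter_lines.
    """
    lines = []
    while True:
        try:
            # bytes.index raises ValueError when the separator is not found.
            i = buff.index(b"\r\n")
        except ValueError:
            break
        line, buff = buff[:i], buff[i + 2:]
        lines.append(line)
    return lines, buff


# A request may arrive in arbitrary chunks, so a line can be cut in half.
lines, rest = split_lines(b"GET / HTTP/1.1\r\nHost: exam")
print(lines)  # [b'GET / HTTP/1.1']
print(rest)   # b'Host: exam'

# Once more data arrives, the partial line can be completed.
lines, rest = split_lines(rest + b"ple.com\r\n\r\n")
print(lines)  # [b'Host: example.com', b'']
print(rest)   # b''
```

The trailing empty line in the second result is the marker that ends the headers, which is the point at which iter_lines stops yielding.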

Using iter_lines, we can start printing out the requests we get from our clients:

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"New connection from {client_addr}.")
        with client_sock:
            for request_line in iter_lines(client_sock):
                print(request_line)

            client_sock.sendall(RESPONSE)

If you run the code now and visit http://127.0.0.1:9000 in your browser, you should see something like this in your console:

New connection from ('127.0.0.1', 62086).
b'GET / HTTP/1.1'
b'Host: localhost:9000'
b'Connection: keep-alive'
b'Cache-Control: max-age=0'
b'Upgrade-Insecure-Requests: 1'
b'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36'
b'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'
b'Accept-Encoding: gzip, deflate, br'
b'Accept-Language: en-US,en;q=0.9,ro;q=0.8'

Pretty neat! Let's abstract over that data by defining a Request class:

import typing


class Request(typing.NamedTuple):
    method: str
    path: str
    headers: typing.Mapping[str, str]

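For now, the request class only knows about the method, the path and the headers. Later on we'll add support for query string parameters and for reading the request body.

To encapsulate the logic needed to build up a request, we'll add a class method called from_socket to Request: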

class Request(typing.NamedTuple):
    method: str
    path: str
    headers: typing.Mapping[str, str]

    @classmethod
    def from_socket(cls, sock: socket.socket) -> "Request":
        """Read and parse the request from a socket object.

        Raises:
          ValueError: When the request cannot be parsed.
        """
        lines = iter_lines(sock)

        try:
            request_line = next(lines).decode("ascii")
        except StopIteration:
            raise ValueError("Request line missing.")

        try:
            method, path, _ = request_line.split(" ")
        except ValueError:
            raise ValueError(f"Malformed request line {request_line!r}.")

        headers = {}
        for line in lines:
            try:
                name, _, value = line.decode("ascii").partition(":")
                headers[name.lower()] = value.lstrip()
            except ValueError:
                raise ValueError(f"Malformed header line {line!r}.")

        return cls(method=method.upper(), path=path, headers=headers)

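It uses the iter_lines function we defined earlier to read the request line, from which it gets the method and the path, and then it reads each header line and parses it. Finally, it builds the Request object and returns it. If we plug that into our server loop, it looks like this: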

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            request = Request.from_socket(client_sock)
            print(request)
            client_sock.sendall(RESPONSE)

If you connect to the server now, you should see output like this:

Request(method='GET', path='/', headers={'host': 'localhost:9000', 'user-agent': 'curl/7.54.0', 'accept': '*/*'})
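Note how the host value keeps its port even though it contains a colon. The sketch below isolates the header-parsing step to show why; parse_header is a hypothetical helper that mirrors the loop in from_socket and is not part of the server code.

```python
def parse_header(line: bytes):
    """Parse one raw header line into a (name, value) pair.

    Hypothetical helper mirroring the header loop in Request.from_socket.
    """
    # partition splits on the first colon only, so colons in the value survive.
    name, _, value = line.decode("ascii").partition(":")
    # Header names are case-insensitive, so normalise them to lowercase.
    return name.lower(), value.lstrip()


print(parse_header(b"Host: localhost:9000"))  # ('host', 'localhost:9000')
print(parse_header(b"Accept:*/*"))            # ('accept', '*/*')
print(parse_header(b"no-colon-here"))         # ('no-colon-here', '')
```

As the last call shows, a line without a colon does not raise an error with this approach; it simply yields an empty value.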

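Because from_socket can raise an exception under certain circumstances, the server will crash if you send it an invalid request. To simulate this, you can use telnet to connect to the server and send it some bogus data: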

> telnet 127.0.0.1 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello
Connection closed by foreign host.

Sure enough, the server crashed:

Received connection from ('127.0.0.1', 62404)...
Traceback (most recent call last):
  File "server.py", line 53, in from_socket
    method, path, _ = request_line.split(" ")
ValueError: not enough values to unpack (expected 3, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "server.py", line 82, in <module>
    request = Request.from_socket(client_sock)
  File "server.py", line 55, in from_socket
    raise ValueError(f"Malformed request line {request_line!r}.")
ValueError: Malformed request line 'hello'.

To handle this more gracefully, let's wrap the call to from_socket in a try-except block and send the client a "400 Bad Request" response whenever we get a malformed request:

BAD_REQUEST_RESPONSE = b"""\
HTTP/1.1 400 Bad Request
Content-type: text/plain
Content-length: 11

Bad Request""".replace(b"\n", b"\r\n")

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            try:
                request = Request.from_socket(client_sock)
                print(request)
                client_sock.sendall(RESPONSE)
            except Exception as e:
                print(f"Failed to parse request: {e}")
                client_sock.sendall(BAD_REQUEST_RESPONSE)

If we try to crash the server again, our client now gets a response and the server keeps running:

~> telnet 127.0.0.1 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello
HTTP/1.1 400 Bad Request
Content-type: text/plain
Content-length: 11

Bad RequestConnection closed by foreign host.
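The telnet session above can also be scripted. The sketch below is only an illustration: send_raw is a name made up for this example, and it assumes the server closes the connection once it has responded, as the loop above does.

```python
import socket


def send_raw(host: str, port: int, payload: bytes) -> bytes:
    """Send raw bytes to a server and return everything it sends back.

    Hypothetical helper; it reads until the server closes the connection.
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
        chunks = []
        while True:
            # recv returns b"" once the peer has closed its side.
            data = sock.recv(16_384)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

With the server running, calling send_raw("127.0.0.1", 9000, b"hello\r\n") should return the same "400 Bad Request" response that telnet printed.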

We're now ready to start implementing the file serving part. First, let's make our default response a "404 Not Found":

NOT_FOUND_RESPONSE = b"""\
HTTP/1.1 404 Not Found
Content-type: text/plain
Content-length: 9

Not Found""".replace(b"\n", b"\r\n")

#...

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            try:
                request = Request.from_socket(client_sock)
                print(request)
                client_sock.sendall(NOT_FOUND_RESPONSE)
            except Exception as e:
                print(f"Failed to parse request: {e}")
                client_sock.sendall(BAD_REQUEST_RESPONSE)

In addition, let's add a "405 Method Not Allowed" response. We are only going to handle GET requests:

METHOD_NOT_ALLOWED_RESPONSE = b"""\
HTTP/1.1 405 Method Not Allowed
Content-type: text/plain
Content-length: 18

Method Not Allowed""".replace(b"\n", b"\r\n")

Let's define a SERVER_ROOT constant, which represents the directory the server serves files from, and a serve_file function:

import mimetypes
import os
import socket
import typing

SERVER_ROOT = os.path.abspath("www")

FILE_RESPONSE_TEMPLATE = """\
HTTP/1.1 200 OK
Content-type: {content_type}
Content-length: {content_length}

""".replace("\n", "\r\n")


def serve_file(sock: socket.socket, path: str) -> None:
    """Given a socket and the relative path to a file (relative to
    SERVER_ROOT), send that file to the socket if it exists.  If the
    file doesn't exist, send a "404 Not Found" response.
    """
    if path == "/":
        path = "/index.html"

    abspath = os.path.normpath(os.path.join(SERVER_ROOT, path.lstrip("/")))
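    # Compare whole path components so that a sibling directory whose name
    # merely starts with SERVER_ROOT (e.g. "www-private") is rejected too.
    if os.path.commonpath([SERVER_ROOT, abspath]) != SERVER_ROOT: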
        sock.sendall(NOT_FOUND_RESPONSE)
        return

    try:
        with open(abspath, "rb") as f:
            stat = os.fstat(f.fileno())
            content_type, encoding = mimetypes.guess_type(abspath)
            # A non-None encoding means the file is compressed (e.g. gzip); it
            # is not a charset, so serve such files as opaque binary data.
            if content_type is None or encoding is not None:
                content_type = "application/octet-stream"

            response_headers = FILE_RESPONSE_TEMPLATE.format(
                content_type=content_type,
                content_length=stat.st_size,
            ).encode("ascii")

            sock.sendall(response_headers)
            sock.sendfile(f)
    except FileNotFoundError:
        sock.sendall(NOT_FOUND_RESPONSE)
        return

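serve_file takes the client socket and the path to a file. It then tries to resolve that path to a real file inside SERVER_ROOT, returning a "not found" response for anything outside SERVER_ROOT. It then tries to open the file, figures out its MIME type and size (using os.fstat), builds the response headers and then uses the sendfile system call to write the file to the socket. If it can't find the file on disk, it sends a "not found" response.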
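The path resolution step deserves some care, because a plain string-prefix comparison would also accept a sibling directory whose name merely begins with the root's name. The sketch below shows one way to do the check with os.path.commonpath; resolve_path is a hypothetical helper written for this illustration, and it assumes the root is an absolute, normalised path.

```python
import os
from typing import Optional


def resolve_path(root: str, request_path: str) -> Optional[str]:
    """Map a request path to a file path under root, or None if it escapes.

    Hypothetical helper; root is assumed to be an absolute, normalised path.
    """
    candidate = os.path.normpath(os.path.join(root, request_path.lstrip("/")))
    # commonpath compares whole path components, so a sibling directory such
    # as "<root>-private" is not mistaken for something inside root.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate


root = os.path.abspath("www")
print(resolve_path(root, "/index.html"))        # absolute path of www/index.html
print(resolve_path(root, "/../server.py"))      # None
print(resolve_path(root, "/../www-private/x"))  # None
```

Because normpath collapses ".." segments before the comparison, a request cannot climb out of the root by way of a subdirectory either.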

If we add serve_file into the mix, our server loop looks like this:

with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")

    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            try:
                request = Request.from_socket(client_sock)
                if request.method != "GET":
                    client_sock.sendall(METHOD_NOT_ALLOWED_RESPONSE)
                    continue

                serve_file(client_sock, request.path)
            except Exception as e:
                print(f"Failed to parse request: {e}")
                client_sock.sendall(BAD_REQUEST_RESPONSE)

If you add a file called www/index.html next to your server.py file and visit http://localhost:9000, you should see the contents of that file.

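Winding down

That's it for Part 1. In Part 2 we'll look at extracting Server and Response abstractions and at how to handle multiple concurrent requests. If you'd like the full source code, you can find it here.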

Original article: WEB APPLICATION FROM SCRATCH, PART I

  • Author: Bogdan Popa
  • Translator: noONE
