What happens when the client disables Keep-Alive but the server enables it?

Posted by 部落格猿馬甲哥 on 2022-02-08

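A web program I deployed recently left a large number of connections in the TIME_WAIT state on the server. Each of them ties up a TCP port, and it took me several days to track down the cause.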

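I had reached a conclusion before: HTTP keep-alive is application-layer reuse of a TCP connection on a sliding lease, where each request renews the lease. If the client and server keep renewing it steadily, the connection becomes a genuine long-lived connection.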

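Today, mainstream HTTP libraries (client side and server side alike) enable HTTP Keep-Alive by default and negotiate connection reuse through the Connection header of the request and response.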

Short-lived connections caused by an unconventional setup

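I have a project where, for historical reasons, the client disables Keep-Alive while the server leaves it enabled by default. The negotiation to reuse the connection therefore fails, and the client opens a new TCP connection for every request. In other words, it falls back to short-lived connections.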


The client forcibly disables Keep-Alive

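package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "time"
)

func main() {
    // DisableKeepAlives makes the transport send "Connection: close"
    // and use each TCP connection for a single request only.
    tr := http.Transport{
        DisableKeepAlives: true,
    }
    client := &http.Client{
        Timeout:   10 * time.Second,
        Transport: &tr,
    }
    // Send one request per second, forever.
    for {
        requestWithClose(client)
        time.Sleep(time.Second * 1)
    }
}

func requestWithClose(client *http.Client) {
    resp, err := client.Get("http://10.100.219.9:8081")
    if err != nil {
        log.Printf("error occurred while fetching page: %v", err)
        return
    }
    defer resp.Body.Close()
    // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
    c, err := io.ReadAll(resp.Body)
    if err != nil {
        // Log and return instead of calling log.Fatalf, which would exit
        // the process and skip the deferred Body.Close.
        log.Printf("couldn't read response body: %v", err)
        return
    }

    fmt.Println(string(c))
}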

The web server enables Keep-Alive by default

package main

import (
    "fmt"
    "log"
    "net/http"
)

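// RemoteAddr shows whether the client is reusing a persistent connection
// (the same ip:port across requests means the connection is reused).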
func IndexHandler(w http.ResponseWriter, r *http.Request) {
    fmt.Println("receive a request from:", r.RemoteAddr, r.Header)
    w.Write([]byte("ok"))
}

func main() {
    fmt.Printf("Starting server at port 8081\n")
    // net/http enables persistent connections by default.
    if err := http.ListenAndServe(":8081", http.HandlerFunc(IndexHandler)); err != nil {
        log.Fatal(err)
    }
}

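The server log confirms that these are short-lived connections: every request arrives from a new client port and carries Connection: close.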

receive a request from: 10.22.38.48:54722 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54724 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54726 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54728 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54731 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54733 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54734 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54738 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54740 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54741 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54743 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54744 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
receive a request from: 10.22.38.48:54746 map[Accept-Encoding:[gzip] Connection:[close] User-Agent:[Go-http-client/1.1]]
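The same behaviour can be reproduced locally without a second machine. The following is a minimal sketch, assuming a throwaway `httptest` server stands in for the real one; `countConns` is a hypothetical helper that counts accepted TCP connections through the server's `ConnState` hook instead of reading ports from a log.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// countConns starts a local test server, sends n sequential GET requests and
// returns how many TCP connections the server accepted.
// Hypothetical helper for this sketch.
func countConns(disableKeepAlives bool, n int) (int64, error) {
	var accepted int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	}))
	// The hook sees StateNew exactly once per accepted connection.
	srv.Config.ConnState = func(_ net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&accepted, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{DisableKeepAlives: disableKeepAlives}}
	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		// Drain and close the body so the connection can be reused when allowed.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return atomic.LoadInt64(&accepted), nil
}

func main() {
	for _, disabled := range []bool{true, false} {
		got, err := countConns(disabled, 3)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Printf("DisableKeepAlives=%v: 3 requests used %d connection(s)\n", disabled, got)
	}
}
```

With keep-alive disabled, each of the three requests should use its own connection; with the default transport settings, all three should share one.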

Who closes the connection first?

I took it for granted that the client was the side that actively closes the connection, and reality slapped me in the face.

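One day an alert for more than 300 TIME_WAIT connections on the server told me that it is, in fact, the server that actively terminates the connection.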

In the regular TCP four-way close, the side that closes first enters TIME_WAIT and waits 2MSL before releasing the socket.
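To keep an eye on how many sockets are parked in TIME_WAIT, the kernel's socket table can be read directly. This is a minimal sketch, assuming a Linux host where `/proc/net/tcp` lists IPv4 sockets with the state in the fourth column and `06` meaning TIME_WAIT; `countTimeWait` is a hypothetical helper, and IPv6 sockets (`/proc/net/tcp6`) are not covered.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// countTimeWait counts sockets in TIME_WAIT in text formatted like Linux's
// /proc/net/tcp. Columns: sl, local_address, rem_address, st, ...
// Hypothetical helper for this sketch.
func countTimeWait(procNetTCP string) int {
	const timeWait = "06" // hex code of TCP_TIME_WAIT in the st column
	n := 0
	sc := bufio.NewScanner(strings.NewReader(procNetTCP))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 3 && fields[3] == timeWait {
			n++
		}
	}
	return n
}

func main() {
	data, err := os.ReadFile("/proc/net/tcp")
	if err != nil {
		fmt.Println("cannot read /proc/net/tcp:", err)
		return
	}
	fmt.Println("TIME_WAIT sockets (IPv4):", countTimeWait(string(data)))
}
```

Running it on both machines shows which side accumulates TIME_WAIT, and therefore which side is closing first.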


Below is the TCP connection information captured with tcpdump on the server.

[Figure: tcpdump capture taken on the server; red boxes 1, 2 and 3 mark the connection teardown sequences discussed below]

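Red boxes 2 and 3 show clearly that the server sends the TCP FIN first, and the client replies with an ACK to confirm it has received the server's close notification. The client then sends its own FIN to say it is ready to close, and the server finally sends an ACK and enters TIME_WAIT, waiting 2MSL before closing the socket.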

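Note that red box 1 shows both ends closing at the same time (a simultaneous close). This leaves a TIME_WAIT entry on both the client and the server, but it happens rarely.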
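The three cases (active close, passive close, simultaneous close) can be traced with a small state table. This is a simplified sketch of the close-related part of the TCP state diagram, assuming no resets or retransmissions; the type names, the event names and `walk` are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

type state string
type event string

const (
	established state = "ESTABLISHED"
	finWait1    state = "FIN_WAIT_1"
	finWait2    state = "FIN_WAIT_2"
	closing     state = "CLOSING"
	timeWait    state = "TIME_WAIT"
	closeWait   state = "CLOSE_WAIT"
	lastAck     state = "LAST_ACK"
	closed      state = "CLOSED"

	sendFIN  event = "send FIN"
	recvFIN  event = "recv FIN"
	recvACK  event = "recv ACK of our FIN"
	wait2MSL event = "2MSL timer expires"
)

// transitions holds only the close-related edges of the TCP state diagram.
var transitions = map[state]map[event]state{
	established: {sendFIN: finWait1, recvFIN: closeWait},
	finWait1:    {recvACK: finWait2, recvFIN: closing},
	finWait2:    {recvFIN: timeWait},
	closing:     {recvACK: timeWait},
	closeWait:   {sendFIN: lastAck},
	lastAck:     {recvACK: closed},
	timeWait:    {wait2MSL: closed},
}

// walk applies events starting from ESTABLISHED and returns the final state.
// It stops at the first event that is not valid in the current state.
func walk(events ...event) state {
	s := established
	for _, e := range events {
		next, ok := transitions[s][e]
		if !ok {
			return s
		}
		s = next
	}
	return s
}

func main() {
	fmt.Println("server, closes first:", walk(sendFIN, recvACK, recvFIN)) // TIME_WAIT
	fmt.Println("client, closes second:", walk(recvFIN, sendFIN, recvACK)) // CLOSED
	fmt.Println("simultaneous close:", walk(sendFIN, recvFIN, recvACK))    // TIME_WAIT
}
```

Only the side that sends its FIN first passes through TIME_WAIT, which matches the server-side accumulation seen in the capture; in a simultaneous close both sides send a FIN before seeing the peer's, so both end up there.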

No source code, no proof

Since the server is the one closing the connection here, let's trace back through the Go HTTP server source.

  • http.ListenAndServe(":8081")
  • server.ListenAndServe()
  • srv.Serve(ln)
  • go c.serve(connCtx): each accepted connection is served in its own goroutine

A simplified version of the code that serves requests on a connection:

func (c *conn) serve(ctx context.Context) {
    c.remoteAddr = c.rwc.RemoteAddr().String()
    ctx = context.WithValue(ctx, LocalAddrContextKey, c.rwc.LocalAddr())
    defer func() {
        if !c.hijacked() {
            c.close()
            c.setState(c.rwc, StateClosed, runHooks)
        }
    }()

  ......
    // HTTP/1.x from here on.

    ctx, cancelCtx := context.WithCancel(ctx)
    c.cancelCtx = cancelCtx
    defer cancelCtx()

    c.r = &connReader{conn: c}
    c.bufr = newBufioReader(c.r)
    c.bufw = newBufioWriterSize(checkConnErrorWriter{c}, 4<<10)

    for {
        w, err := c.readRequest(ctx)
        if err != nil {
            switch {
            case err == errTooLarge:
                const publicErr = "431 Request Header Fields Too Large"
                fmt.Fprintf(c.rwc, "HTTP/1.1 "+publicErr+errorHeaders+publicErr)
                c.closeWriteAndWait()
                return

            case isUnsupportedTEError(err):
                code := StatusNotImplemented
                fmt.Fprintf(c.rwc, "HTTP/1.1 %d %s%sUnsupported transfer encoding", code, StatusText(code), errorHeaders)
                return

            case isCommonNetReadError(err):
                return // don't reply

            default:
                if v, ok := err.(statusError); ok {
                    fmt.Fprintf(c.rwc, "HTTP/1.1 %d %s: %s%s%d %s: %s", v.code, StatusText(v.code), v.text, errorHeaders, v.code, StatusText(v.code), v.text)
                    return
                }
                publicErr := "400 Bad Request"
                fmt.Fprintf(c.rwc, "HTTP/1.1 "+publicErr+errorHeaders+publicErr)
                return
            }
        }
    
        serverHandler{c.server}.ServeHTTP(w, w.req)
        w.cancelCtx()
        if c.hijacked() {
            return
        }
        w.finishRequest()
        if !w.shouldReuseConnection() {
            if w.requestBodyLimitHit || w.closedRequestBodyEarly() {
                c.closeWriteAndWait()
            }
            return
        }
        c.setState(c.rwc, StateIdle, runHooks)
        c.curReq.Store((*response)(nil))

        if !w.conn.server.doKeepAlives() {
            // We're in shutdown mode. We might've replied
            // to the user without "Connection: close" and
            // they might think they can send another
            // request, but such is life with HTTP/1.1.
            return
        }

        if d := c.server.idleTimeout(); d != 0 {
            c.rwc.SetReadDeadline(time.Now().Add(d))
            if _, err := c.bufr.Peek(4); err != nil {
                return
            }
        }
        c.rwc.SetReadDeadline(time.Time{})
    }
}

The points to focus on:

① The for loop: the server tries to reuse this conn to handle the requests that keep arriving on it.

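② w.shouldReuseConnection() = false: the server read the client's Connection: close header and set closeAfterReply=true. It exits the for loop and the goroutine is about to finish; before it does, the deferred function runs and closes the connection.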

c.close()
......
// Close the connection.
func (c *conn) close() {
    c.finalFlush()
    c.rwc.Close()
}

③ If w.shouldReuseConnection() = true, the connection state is set to idle and the for loop continues, handling subsequent requests.
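The close decision can also be observed from the outside through exported fields, without reading unexported server state. This is a minimal sketch, assuming a local `httptest` server: `r.Close` on the server side reports that the request asked for the connection to be closed after the reply, and `resp.Close` on the client side reports that the response announced the close. `probe` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// probe sends one GET to a local test server and returns
// (r.Close as seen by the handler, resp.Close as seen by the client).
// Hypothetical helper for this sketch.
func probe(disableKeepAlives bool) (bool, bool, error) {
	seen := make(chan bool, 1) // carries r.Close out of the handler goroutine
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Close
		w.Write([]byte("ok"))
	}))
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{DisableKeepAlives: disableKeepAlives}}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return false, false, err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	return <-seen, resp.Close, nil
}

func main() {
	for _, disabled := range []bool{true, false} {
		reqClose, respClose, err := probe(disabled)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Printf("DisableKeepAlives=%v: r.Close=%v resp.Close=%v\n", disabled, reqClose, respClose)
	}
}
```

When the client disables keep-alive, both values should be true: the server has seen the close request, answers with its own Connection: close, and then closes the connection from its side once the handler returns.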

What I learned

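  1. The textbook TCP four-way close
  2. The effect of short-lived connections on the server: TIME_WAIT entries tie up available sockets, so decide from the actual workload whether to switch to long-lived connections
  3. A source-level look at how Go's HTTP keep-alive reuses TCP connections
  4. How to capture packets with tcpdump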
