Golang TimeoutHandler explained, and its Kubernetes variant

Posted by gaorong404 on 2019-08-13

Original article: https://www.cnblogs.com/gaorong/p/11336834.html

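HTTP request timeouts in Golang look simple, but they are easy to get wrong if you are not careful. A problem we recently hit in a Kubernetes production environment gave me a chance to sort through everything related to timeouts in Golang.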

Basic

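There is already plenty of material on timeout settings in Golang. The must-read is The complete guide to Go net/http timeouts, which explains each timeout field in net/http and its effect in detail, so this article will not reinvent the wheel. The takeaway is that production code must never naively call http.Get("www.baidu.com"): the default HTTP client has a Timeout of 0, meaning no timeout at all, so the client can easily hang forever. For a hard-earned lesson on this, see Don’t use Go’s default HTTP client (in production). It is best to review every default in the http package carefully before relying on it.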

Advanced

golang http.TimeoutHandler

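With the basics covered, let us dig into http.TimeoutHandler. As the name suggests, TimeoutHandler is a handler wrapper that limits how long ServeHTTP may run, that is, the time spent executing server logic, excluding reading the request and writing the response. (If you look closely, none of the timeout fields on the various structs can limit just this portion.) If execution exceeds the configured limit, the client receives a "503 Service Unavailable" and a specified message.
Let us look at how it is implemented, starting with the function definition: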

// TimeoutHandler returns a Handler that runs h with the given time limit.
//
// The new Handler calls h.ServeHTTP to handle each request, but if a
// call runs for longer than its time limit, the handler responds with
// a 503 Service Unavailable error and the given message in its body.
// (If msg is empty, a suitable default message will be sent.)
// After such a timeout, writes by h to its ResponseWriter will return
// ErrHandlerTimeout.
//
// TimeoutHandler buffers all Handler writes to memory and does not
// support the Hijacker or Flusher interfaces.
func TimeoutHandler(h Handler, dt time.Duration, msg string) Handler {
    return &timeoutHandler{
        handler: h,
        body:    msg,
        dt:      dt,
    }
}

This is the typical signature of a handler wrapper: it takes a handler and returns a handler. The ServeHTTP method of the returned timeout handler is as follows:

func (h *timeoutHandler) ServeHTTP(w ResponseWriter, r *Request) {
    ctx := h.testContext
    if ctx == nil {
        var cancelCtx context.CancelFunc
        ctx, cancelCtx = context.WithTimeout(r.Context(), h.dt)
        defer cancelCtx()
    }
    r = r.WithContext(ctx)
    done := make(chan struct{})
    tw := &timeoutWriter{
        w: w,
        h: make(Header),
    }
    panicChan := make(chan interface{}, 1)
    go func() {
        defer func() {
            if p := recover(); p != nil {
                panicChan <- p
            }
        }()
        h.handler.ServeHTTP(tw, r)
        close(done)
    }()
    select {
    case p := <-panicChan:
        panic(p)
    case <-done:
        tw.mu.Lock()
        defer tw.mu.Unlock()
        dst := w.Header()
        for k, vv := range tw.h {
            dst[k] = vv
        }
        if !tw.wroteHeader {
            tw.code = StatusOK
        }
        w.WriteHeader(tw.code)
        w.Write(tw.wbuf.Bytes())
    case <-ctx.Done():
        tw.mu.Lock()
        defer tw.mu.Unlock()
        w.WriteHeader(StatusServiceUnavailable)
        io.WriteString(w, h.errorBody())
        tw.timedOut = true
    }
}

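The overall flow is:

First, set up a context with the timeout.
Create a timeoutWriter, which implements the http.ResponseWriter interface. The struct holds a bytes.Buffer, and every Write call writes into that buffer.
Call the wrapped ServeHTTP in a separate goroutine, passing the timeoutWriter as its argument. Data written at this point is not sent to the client; it is held in the timeoutWriter's buffer.
Finally, select on the channels:
1. If the child goroutine panics, capture the panic and re-panic in the main goroutine to propagate it.
2. If the request completes normally, write the headers and copy the buffered content to the real http writer.
3. If the request times out, return a 503 to the client.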

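Why write to a buffer first and only then to the real writer? Because we cannot cancel a request in the strict sense. Suppose we have already written part of the response to an http writer (for example, the header), some logic then runs slowly, and we discover the timeout threshold has passed and want to cancel the request. At that point there is no real way to cancel it, because the peer may already have read part of the data. A typical case is chunked transfer in HTTP/1.1: we write the header, then each chunk in turn, and if the timeout fires before the remaining chunks are written, we are stuck.
This is where Golang's built-in TimeoutHandler helps. It offers two advantages: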

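First, it provides a buffer: the data is sent to the peer in one go only after everything has been written and the timeout has not fired. In addition, timeoutWriter checks on every Write whether the timeout has already fired and returns an error immediately if so.
Second, it returns a friendly 503 response to the client.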

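The price of these two benefits is a buffer that holds all of the response data. In some situations that buffer becomes a problem. For a high-throughput server, keeping a buffer per request may be unacceptable. Take Kubernetes: a single list pods call can return several megabytes, and buffering every response would consume too much memory. So how does Kubernetes implement its timeout?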

kubernetes timeout Handler

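To keep a hung request from holding a connection forever, Kubernetes applies a timeout to every request. This logic lives in the WithTimeoutForNonLongRunningRequests handler in the handler chain. The WithTimeout it returns is implemented as follows: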

// WithTimeout returns an http.Handler that runs h with a timeout
// determined by timeoutFunc. The new http.Handler calls h.ServeHTTP to handle
// each request, but if a call runs for longer than its time limit, the
// handler responds with a 504 Gateway Timeout error and the message
// provided. (If msg is empty, a suitable default message will be sent.) After
// the handler times out, writes by h to its http.ResponseWriter will return
// http.ErrHandlerTimeout. If timeoutFunc returns a nil timeout channel, no
// timeout will be enforced. recordFn is a function that will be invoked whenever
// a timeout happens.
func WithTimeout(h http.Handler, timeoutFunc func(*http.Request) (timeout <-chan time.Time, recordFn func(), err *apierrors.StatusError)) http.Handler {
    return &timeoutHandler{h, timeoutFunc}
}

The core is timeoutHandler, implemented as follows:

type timeoutHandler struct {
    handler http.Handler
    timeout func(*http.Request) (<-chan time.Time, func(), *apierrors.StatusError)
}

func (t *timeoutHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    after, recordFn, err := t.timeout(r)
    if after == nil {
        t.handler.ServeHTTP(w, r)
        return
    }

    result := make(chan interface{})
    tw := newTimeoutWriter(w)
    go func() {
        defer func() {
            result <- recover()
        }()
        t.handler.ServeHTTP(tw, r)
    }()
    select {
    case err := <-result:
        if err != nil {
            panic(err)
        }
        return
    case <-after:
        recordFn()
        tw.timeout(err)
    }
}

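As shown above, ServeHTTP does several things:

Call timeoutHandler.timeout to set up a timer. When the timeout elapses, the signal arrives on the after channel, which is monitored later.
Create a timeoutWriter object. It has a timeout method that is called once the timeout fires.
Call ServeHTTP in a separate goroutine, passing in the timeoutWriter. If that goroutine panics, the panic is recovered and passed over a channel to the calling goroutine: a panic in one goroutine must not take down the whole program, and the calling goroutine cares about the panic information, so it has to be handed over.
Wait on the timer channel.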

When the timer channel fires, the timeoutWriter.timeout method is called. It is shown below:

func (tw *baseTimeoutWriter) timeout(err *apierrors.StatusError) {
    tw.mu.Lock()
    defer tw.mu.Unlock()

    tw.timedOut = true

    // The timeout writer has not been used by the inner handler.
    // We can safely timeout the HTTP request by sending by a timeout
    // handler
    if !tw.wroteHeader && !tw.hijacked {
        tw.w.WriteHeader(http.StatusGatewayTimeout)
        enc := json.NewEncoder(tw.w)
        enc.Encode(&err.ErrStatus)
    } else {
        // The timeout writer has been used by the inner handler. There is
        // no way to timeout the HTTP request at the point. We have to shutdown
        // the connection for HTTP1 or reset stream for HTTP2.
        //
        // Note from: Brad Fitzpatrick
        // if the ServeHTTP goroutine panics, that will do the best possible thing for both
        // HTTP/1 and HTTP/2. In HTTP/1, assuming you're replying with at least HTTP/1.1 and
        // you've already flushed the headers so it's using HTTP chunking, it'll kill the TCP
        // connection immediately without a proper 0-byte EOF chunk, so the peer will recognize
        // the response as bogus. In HTTP/2 the server will just RST_STREAM the stream, leaving
        // the TCP connection open, but resetting the stream to the peer so it'll have an error,
        // like the HTTP/1 case.
        panic(errConnKilled)
    }
}

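As you can see, if nothing has been written yet, it returns a 504 status code directly; otherwise it panics. The long comment above explains why it panics. The comment comes from the Kubernetes issue
API server panics when writing response #29001, and quotes Brad Fitzpatrick, an author of Golang's http package. The gist: once part of a response has been written to a writer, there is no way to time it out, and panicking the goroutine is probably the best option for both HTTP/1.1 and HTTP/2. With HTTP/1.1, the server sends nothing further and closes the TCP connection immediately, without the terminating 0-byte chunk, so the peer can tell that the response is broken. With HTTP/2, the server sends RST_STREAM for that stream without affecting the connection, and the peer can handle that cleanly as well. This code is quite interesting; it is hard to imagine Kubernetes handling a request timeout by panicking a goroutine.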

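If a goroutine panics and nothing above it recovers, the whole program exits, which is clearly unacceptable for the Kubernetes apiserver. Kubernetes therefore places genericfilters.WithPanicRecovery in every request's handler chain to catch such panics and keep the process from crashing.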
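To make the two branches concrete, here is a heavily simplified, single-goroutine sketch of a write-through (non-buffering) timeout writer. It is not the Kubernetes implementation: passThroughWriter, simulate, and the error text are hypothetical, hijacking and the JSON error body are left out, and the recover() inside simulate merely stands in for a recovery filter higher up the chain.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// errConnKilled is a hypothetical sentinel used as the panic value.
var errConnKilled = errors.New("killing connection/stream because serving request timed out")

// passThroughWriter forwards writes straight to the real ResponseWriter and
// only remembers whether anything has been sent yet; it keeps no buffer.
type passThroughWriter struct {
	mu          sync.Mutex
	w           http.ResponseWriter
	wroteHeader bool
	timedOut    bool
}

var _ http.ResponseWriter = (*passThroughWriter)(nil)

func (tw *passThroughWriter) Header() http.Header { return tw.w.Header() }

func (tw *passThroughWriter) WriteHeader(code int) {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if tw.timedOut || tw.wroteHeader {
		return
	}
	tw.wroteHeader = true
	tw.w.WriteHeader(code)
}

func (tw *passThroughWriter) Write(p []byte) (int, error) {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if tw.timedOut {
		return 0, http.ErrHandlerTimeout // late writes are rejected
	}
	tw.wroteHeader = true
	return tw.w.Write(p)
}

// timeout is called when the deadline passes: a clean 504 if nothing has been
// sent yet, otherwise a panic so the server aborts the response.
func (tw *passThroughWriter) timeout() {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	tw.timedOut = true
	if !tw.wroteHeader {
		tw.w.WriteHeader(http.StatusGatewayTimeout)
		return
	}
	panic(errConnKilled)
}

// simulate optionally writes part of a response, then fires the timeout.
// It reports the recorded status code and whether timeout() panicked.
func simulate(writeFirst bool) (code int, panicked bool) {
	rec := httptest.NewRecorder()
	tw := &passThroughWriter{w: rec}
	defer func() {
		if p := recover(); p != nil {
			panicked = true // stand-in for a panic-recovery filter
		}
		code = rec.Code
	}()
	if writeFirst {
		tw.Write([]byte("partial"))
	}
	tw.timeout()
	return
}

func main() {
	code, panicked := simulate(false)
	fmt.Println("nothing written:", code, "panicked:", panicked)
	code, panicked = simulate(true)
	fmt.Println("partial response:", code, "panicked:", panicked)
}
```

The trade-off against http.TimeoutHandler is visible in the struct: there is no per-request buffer, but a response that has already started can only be aborted, not replaced with an error page.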

Other

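Having covered TimeoutHandler, back to Golang timeouts in general. Even when a timeout returns as intended, that does not mean the goroutine doing the work has returned. Only the upper layer has returned; the underlying logic running asynchronously cannot be recalled. We cannot cancel another goroutine from the outside; a goroutine can only exit on its own. The usual way to arrange that is to pass a context or a channel that gets closed into the goroutine, which stops when it sees the signal. However, many calls today do not accept a context or a close channel as a parameter.
Take the code below: the 4s sleep in the main logic cannot be interrupted, so even after the request has returned, the server-side goroutine has not been released. Golang timeouts therefore make it very easy to leak goroutines, so use them with care.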

package main

import (
    "fmt"
    "net/http"
    "runtime"
    "time"
)

func main() {
    go func() {
        for {
            time.Sleep(time.Second)
            fmt.Printf("goroutine num: %d\n", runtime.NumGoroutine())
        }
    }()

    handleFunc := func(w http.ResponseWriter, r *http.Request) {
        fmt.Printf("request %v\n", r.URL)
        time.Sleep(4 * time.Second)
        _, err := fmt.Fprintln(w, "ok")
        if err != nil {
            fmt.Printf("write err: %v\n", err)
        }
    }
    err := http.ListenAndServe("localhost:9999", http.TimeoutHandler(http.HandlerFunc(handleFunc), 2*time.Second, "err: timeout"))
    if err != nil {
        fmt.Printf("%v", err)
    }
}

Closing thoughts

Golang timeouts are simple but fiddly; only by understanding how they work can you truly head off problems before they happen.
