Go runtime scheduler in depth (8): preempting long-running goroutines

Posted by 胡云Troy on 2024-09-16

Original URL: https://www.cnblogs.com/xingzheanan/p/18415899

Original article. Reposting is welcome; please credit the source. Thanks.

0. Preface

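The previous article, Go runtime scheduler in depth (7): a case study, walked through a preemption example and analyzed how preemption behaves without going into the source code. This article reads the source to see how the Go runtime scheduler implements preemption.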

1. The sysmon thread

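Recall that Go runtime scheduler in depth (4): running the main goroutine mentioned the sysmon thread in passing. It is a monitoring thread that runs on the system stack, watches the state of goroutines, and acts on what it finds. That includes preemption, which makes sysmon the focus of this article.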

sysmon is started from the main function in src/runtime/proc.go:

// The main goroutine.
func main() {
	...
	if GOARCH != "wasm" { // no threads on wasm yet, so no sysmon
		systemstack(func() {
			newm(sysmon, nil, -1)
		})
	}
	...
}

sysmon does not need to be bound to a P; it runs on the system stack as a monitoring thread. Enter sysmon:

func sysmon() {
	...
	idle := 0 // how many cycles in succession we had not wokeup somebody
	delay := uint32(0)

	for {
		if idle == 0 { // start with 20us sleep...
			delay = 20
		} else if idle > 50 { // start doubling the sleep after 1ms...
			delay *= 2
		}
		if delay > 10*1000 { // up to 10ms
			delay = 10 * 1000
		}
		usleep(delay) // sleep for delay microseconds

		...
		now := nanotime()
		...

		// retake P's blocked in syscalls
		// and preempt long running G's
		if retake(now) != 0 {
			idle = 0
		} else {
			idle++
		}
		...
	}
}
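
The sleep schedule in the loop above is easier to see in isolation. The following is a minimal sketch that copies only the delay rule from the excerpt; nextDelay is a hypothetical helper name, not a runtime function, and the real loop does more work between sleeps.

```go
package main

import "fmt"

// nextDelay is a hypothetical helper that mirrors the sleep rule in the
// sysmon excerpt. idle is the number of consecutive rounds in which retake
// did nothing; delay is the previous sleep in microseconds.
func nextDelay(idle int, delay uint32) uint32 {
	if idle == 0 {
		delay = 20 // a productive round resets the sleep to 20us
	} else if idle > 50 {
		delay *= 2 // after more than 50 idle rounds, double every round
	}
	if delay > 10*1000 {
		delay = 10 * 1000 // never sleep longer than 10ms
	}
	return delay
}

func main() {
	delay := uint32(0)
	for idle := 0; idle <= 60; idle++ {
		delay = nextDelay(idle, delay)
		if idle == 0 || idle >= 50 {
			fmt.Printf("idle=%d delay=%dus\n", idle, delay)
		}
	}
}
```

Running it shows the delay staying at 20us for the first 50 idle rounds, then doubling until it reaches the 10ms cap.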

Much that is unrelated to preemption has been omitted. The part that matters is the retake function. Enter retake:

func retake(now int64) uint32 {
	n := 0
	lock(&allpLock)

	// ...
	for i := 0; i < len(allp); i++ {
		pp := allp[i]
		if pp == nil {
			// This can happen if procresize has grown
			// allp but not yet created new Ps.
			continue
		}

		pd := &pp.sysmontick // sysmon's per-P record of the ticks it last observed and when
		s := pp.status
		sysretake := false
		if s == _Prunning || s == _Psyscall { // only Ps in _Prunning or _Psyscall are examined
			// Preempt G if it's running for too long.
			t := int64(pp.schedtick) // pp.schedtick counts how many times this P has scheduled
			if int64(pd.schedtick) != t {
				// The P has scheduled since sysmon last looked:
				// record the new count and the observation time.
				pd.schedtick = uint32(t)
				pd.schedwhen = now
			} else if pd.schedwhen+forcePreemptNS <= now { // forcePreemptNS is 10ms; the same goroutine has run at least that long
				preemptone(pp) // preempt the P
				// In case of syscall, preemptone() doesn't
				// work, because there is no M wired to P.
				sysretake = true // set the sysretake flag
			}
		}
		...
	}
	unlock(&allpLock)
	return uint32(n)
}

The key point: if the goroutine on a P has been running for too long, retake calls preemptone(pp) to preempt that P, that is, to preempt the long-running goroutine.
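
The "running too long" test in retake can be modelled outside the runtime. This is a sketch under assumptions: the tick type and its overdue method are hypothetical names standing in for the sysmontick bookkeeping, and the times are plain nanosecond integers rather than values from nanotime.

```go
package main

import "fmt"

const forcePreemptNS = 10 * 1000 * 1000 // 10ms, as described in the text

// tick is a hypothetical stand-in for the two sysmontick fields used by
// the long-running check.
type tick struct {
	schedtick uint32 // scheduling count last seen by the monitor
	schedwhen int64  // time in ns at which that count was first seen
}

// overdue reports whether a P has kept the same scheduling count for at
// least forcePreemptNS, which is the condition that leads to preemptone.
func (pd *tick) overdue(schedtick uint32, now int64) bool {
	if pd.schedtick != schedtick {
		// The P scheduled something new: restart the clock.
		pd.schedtick = schedtick
		pd.schedwhen = now
		return false
	}
	return pd.schedwhen+forcePreemptNS <= now
}

func main() {
	var pd tick
	fmt.Println(pd.overdue(1, 0))          // first observation of tick 1
	fmt.Println(pd.overdue(1, 5_000_000))  // same tick, 5ms later
	fmt.Println(pd.overdue(1, 10_000_000)) // same tick, 10ms later
	fmt.Println(pd.overdue(2, 11_000_000)) // the P has rescheduled
}
```

Note that the clock starts when sysmon first observes a scheduling count, not when the goroutine actually started, so the real running time before preemption can be somewhat longer than 10ms.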

1.1 Preempting a long-running goroutine

Enter preemptone:

func preemptone(pp *p) bool {
	mp := pp.m.ptr() // the M (thread) bound to this P
	if mp == nil || mp == getg().m {
		return false
	}
	gp := mp.curg // the goroutine running on that M, i.e. the one that has run too long
	if gp == nil || gp == mp.g0 {
		return false
	}

	gp.preempt = true // set the preemption flag

	// Every call in a goroutine checks for stack overflow by
	// comparing the current stack pointer to gp->stackguard0.
	// Setting gp->stackguard0 to StackPreempt folds
	// preemption into the normal stack overflow check.
	gp.stackguard0 = stackPreempt // stackPreempt is a value larger than any stack address

	// Request an async preemption of this P.
	if preemptMSupported && debug.asyncpreemptoff == 0 { // async preemption, if enabled; ignored for now
		pp.preempt = true
		preemptM(mp)
	}

	return true
}

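As shown, the main thing preemptone does is overwrite the goroutine's gp.stackguard0. Why that field?

Because the next time the goroutine calls a function, the stack check in that function's prologue compares the stack pointer against this value, and the outcome of that check decides whether the goroutine is routed into the preemption path.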

Consider the following program, which starts a goroutine:

package main

import "time"

func gpm() {
	print("hello runtime")
}

func main() {
	go gpm()
	time.Sleep(1 * time.Minute)
	print("hello main")
}

Set a breakpoint on the goroutine's function and let dlv run to it:

(dlv) b main.gpm
Breakpoint 1 set at 0x46232a for main.gpm() ./main.go:5
(dlv) c
> main.gpm() ./main.go:5 (hits goroutine(5):1 total:1) (PC: 0x46232a)
     1: package main
     2:
     3: import "time"
     4:
=>   5: func gpm() {
     6:         print("hello runtime")
     7: }
     8:
     9: func main() {
    10:         go gpm()
(dlv) disass
TEXT main.gpm(SB) /root/go/src/foundation/gpm/main.go
        main.go:5       0x462320        493b6610        cmp rsp, qword ptr [r14+0x10]
        main.go:5       0x462324        762a            jbe 0x462350
        main.go:5       0x462326        55              push rbp
        main.go:5       0x462327        4889e5          mov rbp, rsp
=>      main.go:5       0x46232a*       4883ec10        sub rsp, 0x10
        main.go:6       0x46232e        e82d28fdff      call $runtime.printlock
        ...
        main.go:5       0x462350        e8abb1ffff      call $runtime.morestack_noctxt
        main.go:5       0x462355        ebc9            jmp $main.gpm

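main.gpm begins with cmp rsp, qword ptr [r14+0x10], which compares the current stack pointer with [r14+0x10], the goroutine's stackguard0. If rsp is above g.stackguard0, there is enough stack space. If rsp is at or below g.stackguard0, the stack is considered too small: jbe 0x462350 is taken and call $runtime.morestack_noctxt runs to grow the stack.

When the goroutine is marked for preemption, sysmon has set g.stackguard0 to a very large value. On the goroutine's next function call, the same cmp rsp, qword ptr [r14+0x10] check compares the stack pointer against it. Since rsp is certainly below that value, the jump is taken and runtime.morestack_noctxt, the stack-growth entry point, is called.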
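
The comparison can be written out as a small model. This is a sketch, not runtime code: needsMorestack is a hypothetical name for the cmp/jbe pair, the stack addresses are made up, and the stackPreempt constant is my own stand-in for "a value near the top of the address space" (I believe it matches the bit pattern the runtime uses, but treat that as an assumption).

```go
package main

import "fmt"

// stackPreempt is modelled as a sentinel near the top of the address
// space. Assumption: the exact value is illustrative only.
const stackPreempt = ^uintptr(1313)

// needsMorestack models the prologue check `cmp rsp, stackguard0; jbe ...`:
// the jump into morestack is taken when sp is at or below the guard.
func needsMorestack(sp, stackguard0 uintptr) bool {
	return sp <= stackguard0
}

func main() {
	sp := uintptr(0x40f00)    // hypothetical stack pointer
	guard := uintptr(0x40380) // hypothetical normal stackguard0

	fmt.Println("normal guard:", needsMorestack(sp, guard))
	fmt.Println("preempt requested:", needsMorestack(sp, stackPreempt))
}
```

With the normal guard the check passes and the function body runs; once the guard is replaced by the sentinel, any stack pointer fails the check, which is what lets preemption reuse the existing stack-overflow test at no extra cost per call.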

Enter runtime.morestack_noctxt:

// morestack but not preserving ctxt.
TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0
	MOVL	$0, DX
	JMP	runtime·morestack(SB)

TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
	...
	// runtime.morestack does a lot; only the call into runtime.newstack matters for preemption
	CALL	runtime·newstack(SB)
	...

Enter runtime.newstack:

func newstack() {
	thisg := getg()
	...
	gp := thisg.m.curg
	...
	stackguard0 := atomic.Loaduintptr(&gp.stackguard0)
	preempt := stackguard0 == stackPreempt // true when preemption was requested through stackguard0
	if preempt {
		if !canPreemptM(thisg.m) { // is it safe to preempt right now?
			// Let the goroutine keep running for now.
			// gp->preempt is set, so it will be preempted next time.
			gp.stackguard0 = gp.stack.lo + stackGuard // not preemptible: restore the normal stackguard0
			gogo(&gp.sched) // never return; resumes the goroutine
		}
	}
	...
	if preempt { // reaching this point means the goroutine can be preempted; check the flag again
		if gp == thisg.m.g0 {
			throw("runtime: preempt g0")
		}
		if thisg.m.p == 0 && thisg.m.locks == 0 {
			throw("runtime: g is running but p is not")
		}

		...

		if gp.preemptStop { // preemptStop is the GC-related kind of preemption, not discussed here
			preemptPark(gp) // never returns
		}

		// Act like goroutine called runtime.Gosched.
		gopreempt_m(gp) // never return; this is the preemption path of interest
	}
	...
}
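
The branches of newstack shown above reduce to a small decision table. The sketch below condenses them; decide and the action names are hypothetical, and the three booleans stand in for the stackguard0 comparison, canPreemptM, and gp.preemptStop.

```go
package main

import "fmt"

// action names the outcome of a trip through newstack in this toy model.
type action string

const (
	growStack action = "grow the stack"
	resume    action = "restore stackguard0 and resume"
	park      action = "park for preemptStop"
	yield     action = "yield like runtime.Gosched"
)

// decide condenses the branches of the newstack excerpt.
// preempt: stackguard0 == stackPreempt
// canPreempt: result of canPreemptM
// preemptStop: gp.preemptStop
func decide(preempt, canPreempt, preemptStop bool) action {
	if !preempt {
		return growStack // an ordinary stack-overflow check failed
	}
	if !canPreempt {
		return resume // gp.preempt stays set, so it is retried later
	}
	if preemptStop {
		return park
	}
	return yield
}

func main() {
	fmt.Println(decide(false, true, false))
	fmt.Println(decide(true, false, false))
	fmt.Println(decide(true, true, true))
	fmt.Println(decide(true, true, false))
}
```

Only the last case, a preemption request that is allowed and is not a preemptStop, reaches gopreempt_m.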

newstack carries out the preemption logic described in the comments and, after those checks, calls gopreempt_m to preempt the long-running goroutine:

func gopreempt_m(gp *g) {
	goschedImpl(gp)
}

func goschedImpl(gp *g) {
	status := readgstatus(gp) // read the goroutine's status
	if status&^_Gscan != _Grunning {
		dumpgstatus(gp)
		throw("bad g status")
	}
	casgstatus(gp, _Grunning, _Grunnable) // the goroutine is still _Grunning here; mark it _Grunnable
	dropg()                               // detach the goroutine from the current M
	lock(&sched.lock)
	globrunqput(gp) // put it on the global run queue, not the P's local queue; arguably a penalty for having run so long
	unlock(&sched.lock)

	schedule() // the M re-enters the scheduler and runs the next _Grunnable goroutine
}
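
To see why landing on the global queue works as a penalty, here is a toy model of the two queues. It is a sketch under strong assumptions: runQueues, next, and preempt are hypothetical names, goroutines are just strings, and the pick order (local queue first, then global) is a simplification of what the real scheduler does.

```go
package main

import "fmt"

// runQueues is a hypothetical, heavily simplified stand-in for one P's
// local run queue plus the scheduler's global run queue.
type runQueues struct {
	local  []string
	global []string
}

// next pops the goroutine that runs next in this toy model: the local
// queue is drained before the global queue is consulted.
func (q *runQueues) next() string {
	if len(q.local) > 0 {
		g := q.local[0]
		q.local = q.local[1:]
		return g
	}
	if len(q.global) > 0 {
		g := q.global[0]
		q.global = q.global[1:]
		return g
	}
	return ""
}

// preempt models goschedImpl: the running goroutine is appended to the
// tail of the global queue, then the scheduler picks what runs next.
func (q *runQueues) preempt(running string) string {
	q.global = append(q.global, running)
	return q.next()
}

func main() {
	q := &runQueues{local: []string{"g2", "g3"}}
	fmt.Println(q.preempt("g1")) // g1 is preempted
	fmt.Println(q.next())
	fmt.Println(q.next())
}
```

In this model the preempted g1 only runs again after the goroutines already waiting on the local queue have had their turn.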

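That is how a goroutine that has run for too long gets preempted.

To recap the flow:

The sysmon monitoring thread finds a goroutine that has been running for too long and sets its stackguard0 to stackPreempt, a value larger than any stack address.
When the thread next makes a function call, the prologue compares the stack pointer rsp with g.stackguard0 to check the goroutine's stack space.
Because stackguard0 was overwritten, the check fails and the thread enters the stack-growth path, where newstack sees the preemption request and performs the corresponding preemption and rescheduling.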
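
The effect of this flow can be observed from ordinary user code. The sketch below pins the program to one P, starts a goroutine that never blocks, and checks that the calling goroutine still gets scheduled again. The names mainGetsCPU, step, leaf, and defaultTimeout are hypothetical, it assumes Go 1.19 or later (for atomic.Bool), and it only shows that preemption happened: on current Go versions either the stack-guard check described here or the signal-based async preemption may be what actually stops the loop.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

const defaultTimeout = 2 * time.Second

//go:noinline
func leaf(x uint64) uint64 { return x*6364136223846793005 + 1442695040888963407 }

// step is kept out of line and makes a call of its own, so every loop
// iteration below goes through real function calls.
//
//go:noinline
func step(x uint64) uint64 { return leaf(x) + 1 }

// mainGetsCPU starts a goroutine that never blocks on a single P and
// reports whether the caller wakes from a short sleep within timeout.
func mainGetsCPU(timeout time.Duration) bool {
	prev := runtime.GOMAXPROCS(1) // force both goroutines onto one P
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	defer stop.Store(true) // let the busy goroutine exit afterwards

	go func() {
		var x uint64
		for !stop.Load() {
			x = step(x) // busy work, no blocking operations
		}
	}()

	start := time.Now()
	time.Sleep(20 * time.Millisecond) // hands the only P to the busy goroutine
	return time.Since(start) < timeout
}

func main() {
	fmt.Println("main scheduled again:", mainGetsCPU(defaultTimeout))
}
```

If long-running goroutines were never preempted, the busy loop would keep the only P and the caller would not return from the sleep.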

2. Summary

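This article introduced the sysmon thread and followed it to see how long-running goroutines are preempted. The next one continues with sysmon and the preemption of goroutines that spend too long in system calls.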
