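Go runtime scheduler deep dive (5): scheduling strategy

By 胡云Troy, published 2024-09-14

Original URL: https://www.cnblogs.com/xingzheanan/p/18413743

Original article. Reposting is welcome; please credit the source. Thanks.

0. Preface

In part 4 we covered how the main goroutine gets run. That walkthrough explained how the scheduling function schedule works for the main goroutine, but it did not cover the scheduler's overall scheduling strategy, which left the picture incomplete. This part fills in the scheduling strategy.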

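1. Scheduling points

runtime.schedule implements the scheduler's scheduling strategy. To find the scheduling points, just look at which functions call runtime.schedule and follow the trail from there, as shown in the figure below:

Scheduling points are not the focus of this part. Interested readers can follow the call paths that trigger scheduling on their own; we skip them here.

2. Scheduling strategy

The scheduling strategy is what we care about. Let's step into runtime.schedule: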

// One round of scheduler: find a runnable goroutine and execute it.
// Never returns.
func schedule() {
	mp := getg().m                  // the current M (OS thread)

top:
	pp := mp.p.ptr()                // the P bound to this M
	pp.preempt = false

	// Safety check: if we are spinning, the run queue should be empty.
	// Check this before calling checkTimers, as that might call
	// goready to put a ready goroutine on the local run queue.
	if mp.spinning && (pp.runnext != 0 || pp.runqhead != pp.runqtail) {
		throw("schedule: spinning with local work")
	}

	gp, inheritTime, tryWakeP := findRunnable() // blocks until work is available

	...
	execute(gp, inheritTime)        // run the goroutine that was found
}

The heart of runtime.schedule is findRunnable(). findRunnable() is long, so to keep things readable I have annotated most of its flow inline and will go over the key parts afterward. Here is findRunnable():

// Finds a runnable goroutine to execute.
// Tries to steal from other P's, get g from local or global queue, poll network.
// tryWakeP indicates that the returned goroutine is not normal (GC worker, trace
// reader) so the caller should try to wake a P.
func findRunnable() (gp *g, inheritTime, tryWakeP bool) {
	mp := getg().m                                      // the current M

top:
	pp := mp.p.ptr()                                    // the P bound to this M
	...

	// Check the global runnable queue once in a while to ensure fairness.
	// Otherwise two goroutines can completely occupy the local runqueue
	// by constantly respawning each other.
	if pp.schedtick%61 == 0 && sched.runqsize > 0 {
		lock(&sched.lock)
		gp := globrunqget(pp, 1)
		unlock(&sched.lock)
		if gp != nil {
			return gp, false, false
		}
	}

	// local runq
	if gp, inheritTime := runqget(pp); gp != nil {      // take a goroutine from P's local run queue
		return gp, inheritTime, false
	}

	// global runq
	if sched.runqsize != 0 {                            // local queue is empty: does the global queue have goroutines?
		lock(&sched.lock)                               // if so, take the global scheduler lock
		gp := globrunqget(pp, 0)                        // take goroutines from the global queue
		unlock(&sched.lock)                             // release the lock
		if gp != nil {
			return gp, false, false
		}
	}

	// Nothing in the global queue either: try the network poller
	if netpollinited() && netpollWaiters.Load() > 0 && sched.lastpoll.Load() != 0 {
		...
	}

	// Nothing from the network poller: try to steal goroutines from other Ps
	// Spinning Ms: steal work from other Ps.
	//
	// Limit the number of spinning Ms to half the number of busy Ps.
	// This is necessary to prevent excessive CPU consumption when
	// GOMAXPROCS>>1 but the program parallelism is low.
	// Enter the stealing logic if at least one of these conditions holds:
	// Condition 1: this M is already spinning
	// Condition 2: spinning Ms number fewer than half of the busy Ps, so more Ms
	//              are needed to share the load instead of going to sleep
	if mp.spinning || 2*sched.nmspinning.Load() < gomaxprocs-sched.npidle.Load() {
		if !mp.spinning {                               // only one condition needs to hold, so first check whether this M is spinning
			mp.becomeSpinning()                         // if not, mark it as spinning
		}
		}

		gp, inheritTime, tnow, w, newWork := stealWork(now)     // steal goroutines
		if gp != nil {
			// Successfully stole.
			return gp, inheritTime, false                       // gp != nil: the steal succeeded, return the stolen goroutine
		}
		if newWork {
			// There may be new timer or GC work; restart to
			// discover.
			goto top                                            // nothing stolen but newWork is true: jump back to top and search again
		}

		now = tnow
		if w != 0 && (pollUntil == 0 || w < pollUntil) {
			// Earlier timer to wait for.
			pollUntil = w
		}
	}

	...
	if sched.runqsize != 0 {                                    // nothing stolen: check the global queue once more, in case goroutines arrived while we were stealing
		gp := globrunqget(pp, 0)
		unlock(&sched.lock)
		return gp, false, false
	}

	if !mp.spinning && sched.needspinning.Load() == 1 {         // check again: if mp is not spinning and sched.needspinning == 1, make mp spinning and goto top to search again
		// See "Delicate dance" comment below.
		mp.becomeSpinning()
		unlock(&sched.lock)
		goto top
	}

	// Still nothing: there are more Ms than goroutines, so get ready to park this M.
	// First, call releasep to unbind the M from its P.
	if releasep() != pp {
		throw("findrunnable: wrong p")
	}

	...
	now = pidleput(pp, now)                                     // put the released P on the global idle P list
	unlock(&sched.lock)

	wasSpinning := mp.spinning                                  // remember whether this M was spinning
	if mp.spinning {
		mp.spinning = false                                     // clear the spinning flag: the M is about to sleep
		if sched.nmspinning.Add(-1) < 0 {                       // one fewer spinning M, since this M stops looking for work
			throw("findrunnable: negative nmspinning")
		}
		...
	}
	stopm()                                                     // park the M until it is woken
	goto top                                                    // reaching here means the M was woken: search for a goroutine again
}
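To make the search order concrete, here is a toy, single-threaded model of the sources findRunnable above tries, in order: the every-61st-tick fairness check, the local queue, the global queue, and then stealing, gated by the spinning cap `2*nmspinning < gomaxprocs-npidle`. The types `P` and `Sched`, the helper `pop` and the string labels are hypothetical. The model takes one goroutine at a time (no batching, no steal-half), scans victims in order instead of randomly, and ignores netpoll, timers, GC work and locking, so it sketches the control flow only and is not the runtime's implementation.

```go
package main

import "fmt"

// P is a hypothetical stand-in for the runtime's P: an ID, a tick counter
// and a FIFO local run queue of goroutine IDs.
type P struct {
	id        int
	schedtick uint32
	runq      []int
}

// Sched is a hypothetical stand-in for the global scheduler state.
type Sched struct {
	runq       []int // global run queue
	allp       []*P
	gomaxprocs int32
	npidle     int32 // number of idle Ps
	nmspinning int32 // number of spinning Ms
}

// pop removes and returns the head of a queue.
func pop(q *[]int) (int, bool) {
	if len(*q) == 0 {
		return 0, false
	}
	g := (*q)[0]
	*q = (*q)[1:]
	return g, true
}

// canSteal mirrors the spinning cap: an M may steal if it is already
// spinning, or if spinning Ms are fewer than half of the busy Ps.
func (s *Sched) canSteal(spinning bool) bool {
	return spinning || 2*s.nmspinning < s.gomaxprocs-s.npidle
}

// findRunnable returns a goroutine ID and where it came from, or -1 and "park".
func (s *Sched) findRunnable(pp *P, spinning bool) (int, string) {
	// Every 61st tick, look at the global queue first for fairness.
	if pp.schedtick%61 == 0 {
		if g, ok := pop(&s.runq); ok {
			return g, "global (fairness check)"
		}
	}
	if g, ok := pop(&pp.runq); ok {
		return g, "local"
	}
	if g, ok := pop(&s.runq); ok {
		return g, "global"
	}
	if s.canSteal(spinning) {
		for _, p2 := range s.allp { // the runtime uses a random order here
			if p2 == pp {
				continue // never steal from ourselves
			}
			if g, ok := pop(&p2.runq); ok {
				return g, "stolen"
			}
		}
	}
	return -1, "park"
}

func main() {
	p0 := &P{id: 0, schedtick: 1, runq: []int{1}}
	p1 := &P{id: 1, runq: []int{2, 3}}
	s := &Sched{runq: []int{9}, allp: []*P{p0, p1}, gomaxprocs: 2}
	for i := 0; i < 5; i++ {
		g, src := s.findRunnable(p0, false)
		fmt.Println(g, src)
		p0.schedtick++
	}
}
```

Running it prints the goroutines in the order local (1), global (9), stolen (2), stolen (3), and finally `-1 park`, which is the same fallback chain as the annotated code.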

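Reading through the M's scheduling strategy almost moved me. How dedicated it is: it tries every possible way to find work, and even when it finds none, it checks once more before going to sleep. A true model worker.

The overall flow is fairly clear. Let's single out the parts worth digging into.

First, the M looks in the local queue, and if that finds nothing it goes to the global queue. Looking at gp := globrunqget(pp, 0), you might wonder why a P is passed in just to take a goroutine from the global queue. Let's see what the function does: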

// Try get a batch of G's from the global runnable queue.
// sched.lock must be held.                                     // as the comment says: move goroutines from the global queue into P's local queue
func globrunqget(pp *p, max int32) *g {
	assertLockHeld(&sched.lock)

	if sched.runqsize == 0 {
		return nil
	}

	n := sched.runqsize/gomaxprocs + 1                          // the global queue is shared by all Ms: divide by gomaxprocs to give each P a fair share
	if n > sched.runqsize {
		n = sched.runqsize                                      // only reachable when gomaxprocs == 1
	}
	if max > 0 && n > max {
		n = max
	}
	if n > int32(len(pp.runq))/2 {
		n = int32(len(pp.runq)) / 2                             // if n exceeds half the local queue's length, cap it at len(pp.runq)/2
	}

	sched.runqsize -= n                                         // shrink the global count by n: n goroutines move over to P

	gp := sched.runq.pop()                                      // pop the head of the global queue; this is the goroutine we return
	n--                                                         // one goroutine already taken, so decrement n
	for ; n > 0; n-- {
		gp1 := sched.runq.pop()                                 // pop the remaining goroutines one by one
		runqput(pp, gp1, false)                                 // and put each one into P's local run queue
	}
	return gp
}

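Calling globrunqget means the local queue is empty and we have to go to the global queue, so it makes sense to move a batch of goroutines from the global queue into this P's local queue. This raises the priority of the goroutines waiting in the global queue.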
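The batch-size arithmetic in globrunqget is easy to check by hand. The function below copies only that arithmetic into a standalone program. It assumes the local run queue has 256 slots, which is the size of `runq` in the runtime versions I have looked at; it is passed as the parameter `localCap` so you can change it. `batchSize` is a hypothetical name, and the queue manipulation and locking are left out.

```go
package main

import "fmt"

// batchSize reproduces the arithmetic globrunqget uses to decide how many
// goroutines to take from the global queue (the returned one included).
// localCap stands in for len(pp.runq).
func batchSize(runqsize, gomaxprocs, max, localCap int32) int32 {
	if runqsize == 0 {
		return 0
	}
	n := runqsize/gomaxprocs + 1 // fair share per P, at least 1
	if n > runqsize {
		n = runqsize // only possible when gomaxprocs == 1
	}
	if max > 0 && n > max {
		n = max // caller-imposed limit (the fairness path passes 1)
	}
	if n > localCap/2 {
		n = localCap / 2 // never fill more than half of the local queue
	}
	return n
}

func main() {
	fmt.Println(batchSize(100, 4, 0, 256))  // 26: 100/4 + 1
	fmt.Println(batchSize(1000, 1, 0, 256)) // 128: capped at half the local queue
	fmt.Println(batchSize(100, 4, 1, 256))  // 1: max = 1
}
```

For example, with 100 goroutines in the global queue and GOMAXPROCS=4, a P takes 26 of them: one to run now and 25 placed in its local queue. The fairness check in findRunnable calls globrunqget(pp, 1), so it takes exactly one.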

If the global queue has nothing either, the M checks the network poller. If that also comes up empty, the M gets ready to spin and steal work from the Ps of other Ms. Let's look at how it steals:

// stealWork attempts to steal a runnable goroutine or timer from any P.
//
// If newWork is true, new work may have been readied.
//
// If now is not 0 it is the current time. stealWork returns the passed time or
// the current time if now was passed as 0.
func stealWork(now int64) (gp *g, inheritTime bool, rnow, pollUntil int64, newWork bool) {
	pp := getg().m.p.ptr()                                          // pp is the P bound to the current M

	ranTimer := false

	const stealTries = 4                                            // up to four rounds; each round visits every P in random order
	for i := 0; i < stealTries; i++ {
		stealTimersOrRunNextG := i == stealTries-1

		for enum := stealOrder.start(fastrand()); !enum.done(); enum.next() {  // start at a random P to spread stealing out; every P still gets its turn
			...
			p2 := allp[enum.position()]                             // the candidate victim P from allp
			if pp == p2 {
				continue                                            // skip the P bound to the current M
			}
			...
			// Don't bother to attempt to steal if p2 is idle.
			if !idlepMask.read(enum.position()) {                   // an idle P has no goroutines, so skip it and try the next one
				if gp := runqsteal(pp, p2, stealTimersOrRunNextG); gp != nil {  // p2 is busy: steal from it with runqsteal
					return gp, false, now, pollUntil, ranTimer
				}
			}
		}
	}

	// No goroutines found to steal. Regardless, running a timer may have
	// made some goroutine ready that we missed. Indicate the next timer to
	// wait for.
	return nil, false, now, pollUntil, ranTimer
}
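stealOrder.start(fastrand()) produces a randomized but complete walk over all Ps. As far as I understand the runtime, it does this by picking a random start position and a random stride that is coprime with the number of Ps, which guarantees every P is visited exactly once per pass. Below is a small re-implementation of that idea; `randomOrder`, `newRandomOrder` and `visit` are my own names, not the runtime's API.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomOrder visits 0..count-1 exactly once per pass, starting at a random
// position and advancing by a stride that is coprime with count.
// Simplified re-implementation; names are illustrative.
type randomOrder struct {
	count    uint32
	coprimes []uint32
}

func gcd(a, b uint32) uint32 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// newRandomOrder precomputes every stride in [1, count] coprime with count.
func newRandomOrder(count uint32) *randomOrder {
	ord := &randomOrder{count: count}
	for i := uint32(1); i <= count; i++ {
		if gcd(i, count) == 1 {
			ord.coprimes = append(ord.coprimes, i)
		}
	}
	return ord
}

// visit returns the order in which Ps would be tried for the random seed r.
func (ord *randomOrder) visit(r uint32) []uint32 {
	pos := r % ord.count                                         // random start
	inc := ord.coprimes[r/ord.count%uint32(len(ord.coprimes))] // random coprime stride
	out := make([]uint32, 0, ord.count)
	for i := uint32(0); i < ord.count; i++ {
		out = append(out, pos)
		pos = (pos + inc) % ord.count
	}
	return out
}

func main() {
	ord := newRandomOrder(8)
	fmt.Println(ord.visit(rand.Uint32()))
}
```

Because the stride shares no factor with the count, pos + k*inc (mod count) cannot repeat for k < count. With 8 Ps the possible strides are 1, 3, 5 and 7. For example, seed 13 starts at 5 with stride 3 and gives the order 5 0 3 6 1 4 7 2.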

The M tries stealable Ps in random order, and the stealing itself is implemented in runqsteal. Let's look at how runqsteal does it:

// Steal half of elements from local runnable queue of p2
// and put onto local runnable queue of p.
// Returns one of the stolen elements (or nil if failed).      // the poor M is starving, so it grabs half of the goroutines at once. Ruthless!
func runqsteal(pp, p2 *p, stealRunNextG bool) *g {
	t := pp.runqtail                                            // t is the tail of this P's local queue
	n := runqgrab(p2, &pp.runq, t, stealRunNextG)               // runqgrab copies half of p2's local queue into pp.runq, starting at t
	if n == 0 {
		return nil
	}
	n--
	gp := pp.runq[(t+n)%uint32(len(pp.runq))].ptr()             // take the last stolen goroutine back out; it will be returned and run
	if n == 0 {
		return gp                                               // only one was stolen: return it directly. Better than nothing
	}
	h := atomic.LoadAcq(&pp.runqhead) // load-acquire, synchronize with consumers
	if t-h+n >= uint32(len(pp.runq)) {
		throw("runqsteal: runq overflow")                       // t-h+n >= len(pp.runq) means more was stolen than fits
	}
	atomic.StoreRel(&pp.runqtail, t+n)                          // publish the new tail of this P's local queue
	return gp
}

This kind of stealing grabs half of the "landlord's" (P2's) surplus grain (its goroutines). What else can the M do? It has to eat too.
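The "take half" rule lives in runqgrab, which (in the runtime sources I know) computes n := t - h and then n = n - n/2, so it takes half rounded up. The single-threaded sketch below models a P's local queue as a ring buffer with ever-increasing head and tail counters, and it replays runqsteal's bookkeeping: copy the batch to the thief's tail, hand back the last stolen goroutine, and publish only the rest. `localQueue`, `stealHalf` and the capacity of 8 are illustrative assumptions. There are no atomics and no runnext handling, and it assumes the thief's queue has room; in findRunnable the thief only steals after finding its own queue empty.

```go
package main

import "fmt"

const qlen = 8 // hypothetical small capacity for the demo

// localQueue is a single-threaded model of a P's ring-buffer run queue.
// head and tail only grow; slots are indexed modulo qlen.
type localQueue struct {
	head, tail uint32
	buf        [qlen]int
}

func (q *localQueue) put(g int) bool {
	if q.tail-q.head == qlen {
		return false // full
	}
	q.buf[q.tail%qlen] = g
	q.tail++
	return true
}

func (q *localQueue) size() uint32 { return q.tail - q.head }

// stealHalf moves ceil(n/2) goroutines from victim to thief and returns the
// last one moved for immediate execution, mirroring runqgrab + runqsteal.
// Not safe for concurrent use.
func stealHalf(thief, victim *localQueue) (int, bool) {
	n := victim.size()
	n -= n / 2 // half, rounded up
	if n == 0 {
		return 0, false
	}
	if thief.size()+n > qlen {
		panic("thief queue overflow")
	}
	t := thief.tail
	for i := uint32(0); i < n; i++ { // copy the batch to the thief's tail
		thief.buf[(t+i)%qlen] = victim.buf[(victim.head+i)%qlen]
	}
	victim.head += n // consume the batch from the victim
	n--
	gp := thief.buf[(t+n)%qlen] // last stolen G is returned, not queued
	thief.tail = t + n          // publish only the other n goroutines
	return gp, true
}

func main() {
	var victim, thief localQueue
	for g := 1; g <= 5; g++ {
		victim.put(g)
	}
	gp, ok := stealHalf(&thief, &victim)
	fmt.Println(gp, ok, thief.size(), victim.size())
}
```

With goroutines 1 to 5 in the victim's queue, the thief takes 1, 2 and 3, runs 3 immediately and keeps 1 and 2 queued, leaving 4 and 5 to the victim, so the program prints `3 true 2 2`.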

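If even stealing turns up nothing (well, that's rough...), it's time to go to sleep. Surely it is allowed to stop working now. But before stopping, it checks the global queue once more for goroutines (such a two-faced M). Still no work? Fine, time to sleep.

Before sleeping, the M first unbinds from its P: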

func releasep() *p {
	gp := getg()

	if gp.m.p == 0 {
		throw("releasep: invalid arg")
	}
	pp := gp.m.p.ptr()
	if pp.m.ptr() != gp.m || pp.status != _Prunning {
		print("releasep: m=", gp.m, " m->p=", gp.m.p.ptr(), " p->m=", hex(pp.m), " p->status=", pp.status, "\n")
		throw("releasep: invalid p state")
	}
	...
	gp.m.p = 0
	pp.m = 0
	pp.status = _Pidle
	return pp
}

It just clears the pointers between the M and the P. The code is clear enough to need no comments, so we won't dwell on it.

After unbinding, pidleput puts the now-idle P on the global idle P list.
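In the runtime sources I have read, the global idle P list that pidleput fills is an intrusive LIFO stack threaded through each P's `link` field, with sched.npidle counting its length, and pidleget pops from the same end. The sketch below models only that data structure. `P`, `sched` and their fields are simplified, hypothetical stand-ins; the real functions hold sched.lock and also do timer and GC bookkeeping, which is omitted here.

```go
package main

import "fmt"

// P is a hypothetical, stripped-down P: an ID, its local-queue length and
// the link field that threads it onto the idle list.
type P struct {
	id      int
	runqLen int
	link    *P
}

// sched holds the idle-list head and its length (like sched.pidle and
// sched.npidle). Single-threaded model: no locking.
type sched struct {
	pidle  *P
	npidle int
}

// pidleput pushes pp onto the idle list; only a P with an empty local
// run queue may go idle.
func (s *sched) pidleput(pp *P) {
	if pp.runqLen != 0 {
		panic("pidleput: P has non-empty run queue")
	}
	pp.link = s.pidle
	s.pidle = pp
	s.npidle++
}

// pidleget pops the most recently idled P, or returns nil if none is idle.
func (s *sched) pidleget() *P {
	pp := s.pidle
	if pp != nil {
		s.pidle = pp.link
		pp.link = nil
		s.npidle--
	}
	return pp
}

func main() {
	s := &sched{}
	for i := 0; i < 3; i++ {
		s.pidleput(&P{id: i})
	}
	n := s.npidle
	first := s.pidleget()
	fmt.Println(n, first.id) // 3 2
}
```

It prints `3 2`: three Ps are idle, and the one put on the list last is the first one handed back out.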

Next, the M's state is changed from spinning to non-spinning, and stopm is called to put it to sleep:

// Stops execution of the current m until new work is available.
// Returns with acquired P.
func stopm() {
	gp := getg()							// the goroutine running on this M (g0, since the scheduler runs on g0)

	...

	lock(&sched.lock)
	mput(gp.m)								// put the M on the global idle M list
	unlock(&sched.lock)
	mPark()
	acquirep(gp.m.nextp.ptr())
	gp.m.nextp = 0
}

stopm puts the M on the global idle M list and then calls mPark to put the thread to sleep:

// mPark causes a thread to park itself, returning once woken.
//
//go:nosplit
func mPark() {
	gp := getg()
	notesleep(&gp.m.park)					// notesleep puts the thread to sleep
	noteclear(&gp.m.park)
}

func notesleep(n *note) {
	gp := getg()
	if gp != gp.m.g0 {
		throw("notesleep not on g0")
	}
	ns := int64(-1)
	if *cgo_yield != nil {
		// Sleep for an arbitrary-but-moderate interval to poll libc interceptors.
		ns = 10e6
	}
	for atomic.Load(key32(&n.key)) == 0 {					// n.key tells whether the thread has been woken: 0 means not yet, so keep sleeping
		gp.m.blocked = true
		futexsleep(key32(&n.key), 0, ns)					// sleep via futex; the thread "blocks" here until it is woken
		if *cgo_yield != nil {
			asmcgocall(*cgo_yield, nil)
		}
		gp.m.blocked = false								// "woken": clear the M's blocked flag
	}
}

// One-time notifications.
func noteclear(n *note) {
	n.key = 0												// reaching noteclear means the thread has been woken; reset the n.key flag to 0
}

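The thread sleeps by calling futex, which enters the operating system kernel to block it. For more on futex, see here.

n.key is the thread's sleep flag. When n.key becomes non-zero, some thread is waking the sleeping thread, and it returns from sleep to normal operation. Waking a sleeping thread is done by calling notewakeup(&nmp.park):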

func notewakeup(n *note) {
	old := atomic.Xchg(key32(&n.key), 1)
	if old != 0 {
		print("notewakeup - double wakeup (", old, ")\n")
		throw("notewakeup - double wakeup")
	}
	futexwakeup(key32(&n.key), 1)					// call futexwakeup to wake the sleeping thread
}

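First, how does a thread find a sleeping thread? It looks up an idle M on the global idle M list, passes that M's sleep flag m.park to notewakeup, and notewakeup finally calls futexwakeup to wake the sleeping thread.

It is worth noting that a woken M goes back to looking for a runnable goroutine until it finds one: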
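The note protocol used by mPark, notesleep, notewakeup and noteclear is small enough to model in ordinary Go. The sketch below keeps the same rules (key 0 means not signalled, a wakeup sets it to 1 exactly once, a double wakeup is fatal, clear re-arms the note) but substitutes a sync.Mutex and sync.Cond for the futex, because raw futex calls are Linux-specific and not exposed by the standard library. The `note` type and its methods here are illustrative, not the runtime's code.

```go
package main

import (
	"fmt"
	"sync"
)

// note is a user-space analogue of the runtime's one-time notification.
// A sync.Cond stands in for futexsleep/futexwakeup.
type note struct {
	mu   sync.Mutex
	cond *sync.Cond
	key  uint32 // 0: not signalled, 1: signalled
}

func newNote() *note {
	n := &note{}
	n.cond = sync.NewCond(&n.mu)
	return n
}

// sleep blocks until key becomes non-zero, like notesleep's loop.
func (n *note) sleep() {
	n.mu.Lock()
	for n.key == 0 {
		n.cond.Wait() // releases mu while waiting
	}
	n.mu.Unlock()
}

// wakeup sets key to 1 and wakes the sleeper. A second wakeup without an
// intervening clear is a bug, as in notewakeup.
func (n *note) wakeup() {
	n.mu.Lock()
	old := n.key
	n.key = 1
	n.mu.Unlock()
	if old != 0 {
		panic("notewakeup - double wakeup")
	}
	n.cond.Signal()
}

// clear re-arms the note for the next sleep, like noteclear.
func (n *note) clear() {
	n.mu.Lock()
	n.key = 0
	n.mu.Unlock()
}

func main() {
	park := newNote()
	done := make(chan struct{})
	go func() { // plays the parked M
		park.sleep() // mPark: block until woken
		park.clear() // then re-arm the note
		fmt.Println("M woken")
		close(done)
	}()
	park.wakeup() // plays the waker
	<-done
}
```

The goroutine plays the parked M and main plays the waker. Whichever runs first, sleep returns once key is 1, because the key is checked under the lock before waiting, so the program always prints `M woken`.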

func stopm() {
	...
	mPark()								// when mPark returns, the M has been woken and resumes work
	acquirep(gp.m.nextp.ptr())			// the M released its P before sleeping; now bind it to a P again
	gp.m.nextp = 0						// the M is bound to a P now, so reset nextp
}

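That is essentially one of the most important parts of the scheduling strategy: how an M finds a goroutine. Once it has found one, it calls gogo (via execute) to run that goroutine.

3. Summary

This part filled in the scheduler's scheduling strategy. In the next part we turn to non-main goroutines.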
