Go Source Code Analysis: The Scheduler

nevermoress, published 2020-11-17

Golang source code

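Single-threaded scheduler · 0.x

Consisted of only about 40 lines of code;

Only one active thread could exist in a program; built on the G-M model;

Multi-threaded scheduler · 1.0

Allowed programs to run on multiple threads;

A global lock caused severe contention;

Work-stealing scheduler · 1.1

Introduced the processor P, forming the current G-M-P model;

Implemented a work-stealing scheduler on top of the processor P;

In some cases a Goroutine would not yield its thread, which caused starvation;

Overly long garbage-collection pauses (stop-the-world, STW) could leave the program unable to do any work for a long time;

Preemptive scheduler · 1.2 to present

Cooperative preemptive scheduler - 1.2 to 1.13

The compiler inserts a preemption check at function calls; on each call the check tests whether a preemption request has been issued for the current Goroutine, which gives cooperative preemption;

A Goroutine can still occupy resources for a long time because of garbage collection or loops, stalling the program;

Signal-based preemptive scheduler - 1.14 to present

Implements true preemption based on signals;

Garbage collection triggers preemption when it scans stacks;

There are not yet enough preemption points, so not every edge case is covered;

Non-uniform memory access (NUMA) scheduler · proposal

Partitions the various runtime resources;

Very complex to implement, and still not on the roadmap today;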

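Cooperative preemptive scheduling

1 The compiler inserts a stack check at the start of a function that can call runtime.morestack;

2 The Go runtime issues a preemption request, setting stackguard0 to StackPreempt, when garbage collection pauses the program or when the system monitor finds a Goroutine that has been running for more than 10ms;

3 When a function call occurs, the compiler-inserted runtime.morestack may run; the runtime.newstack it calls checks whether the Goroutine's stackguard0 field is StackPreempt;

4 If stackguard0 is StackPreempt, preemption is triggered and the Goroutine yields the current thread;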
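
The four steps above can be sketched with a toy model. This is not runtime code: toyG, guardSize, call, and requestPreempt are hypothetical names, and the value of stackPreempt is only illustrative. The point it shows is that one comparison at function entry serves both stack growth and preemption, because the sentinel is larger than any real stack pointer.

```go
package main

import "fmt"

// stackPreempt is a sentinel larger than any real stack pointer.
// The name mirrors the runtime's constant; the value here is illustrative.
const stackPreempt = ^uintptr(0) - 1313

// guardSize is a hypothetical gap between the low end of the stack and the guard.
const guardSize = 128

// toyG is a hypothetical, stripped-down stand-in for runtime.g.
type toyG struct {
	stackLo     uintptr
	stackguard0 uintptr
	preempted   int // times the goroutine yielded because of a request
	grown       int // times a real stack growth would have happened
}

func newToyG(stackLo uintptr) *toyG {
	return &toyG{stackLo: stackLo, stackguard0: stackLo + guardSize}
}

// requestPreempt models what GC or the system monitor does (step 2).
func (g *toyG) requestPreempt() { g.stackguard0 = stackPreempt }

// call models the check at function entry (steps 1 and 3):
// a stack pointer at or below the guard diverts into newstack.
func (g *toyG) call(sp uintptr) {
	if sp <= g.stackguard0 {
		g.newstack()
	}
}

// newstack models step 4: a guard equal to the sentinel means "yield",
// anything else means the stack really needs to grow.
func (g *toyG) newstack() {
	if g.stackguard0 == stackPreempt {
		g.preempted++
		g.stackguard0 = g.stackLo + guardSize // restore the normal guard
		return
	}
	g.grown++
}

func main() {
	g := newToyG(1000)
	g.call(5000) // plenty of stack, no request: nothing happens
	g.requestPreempt()
	g.call(5000) // the next call notices the request and yields
	g.call(5000) // guard was restored, so this call runs normally
	fmt.Println("preempted:", g.preempted, "grown:", g.grown)
}
```

A function that never makes a call never reaches this check, which is why the cooperative scheme cannot interrupt a tight loop.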

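Non-cooperative preemptive scheduling

1 At program startup, runtime.doSigPreempt is set up inside runtime.sighandler as the handling function for the SIGURG signal;

2 When the stack scan of garbage collection is triggered, runtime.suspendG is called to suspend the Goroutine; it performs the following logic:

1. Marks a Goroutine in the _Grunning state as preemptible, i.e. sets preemptStop to true;

2. Calls runtime.preemptM to trigger preemption;

3 runtime.preemptM calls runtime.signalM to send the SIGURG signal to the thread;

4 The operating system interrupts the running thread and runs the pre-registered signal handling function runtime.doSigPreempt;

5 runtime.doSigPreempt handles the preemption signal, reads the current SP and PC registers, and calls runtime.sigctxt.pushCall;

6 runtime.sigctxt.pushCall modifies the registers so that runtime.asyncPreempt runs when the program returns to user mode;

7 The assembly routine runtime.asyncPreempt calls the runtime function runtime.asyncPreempt2;

8 runtime.asyncPreempt2 calls runtime.preemptPark;

9 runtime.preemptPark changes the current Goroutine's state to _Gpreempted and calls runtime.schedule, which puts the current Goroutine to sleep and yields the thread; the scheduler then picks another Goroutine to run;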

// src/runtime/runtime2.go
type m struct {
	g0         *g          // goroutine used to run scheduling code
	gsignal    *g          // g that handles signals
	tls        [6]uintptr  // thread-local storage
	curg       *g          // user goroutine currently running
	p          puintptr    // p held while executing Go code (nil if not executing Go code)
	spinning   bool        // m has no work and is actively looking for work
	cgoCallers *cgoCallers // cgo traceback if a cgo call crashes
	alllink    *m          // on allm
	mcache     *mcache

	...
}

type p struct {
	id           int32
	status       uint32 // status of the p: pidle/prunning/...
	link         puintptr
	m            muintptr // back-link to the associated m (nil if idle)
	mcache       *mcache
	pcache       pageCache
	deferpool    [5][]*_defer // pool of available defer structs of different sizes
	deferpoolbuf [5][32]*_defer
	runqhead     uint32 // queue of runnable goroutines, accessed without a lock
	runqtail     uint32
	runq         [256]guintptr
	runnext      guintptr
	timersLock   mutex
	timers       []*timer
	preempt      bool
	...
}

type g struct {
	stack struct {
		lo uintptr
		hi uintptr
	} // stack memory: [stack.lo, stack.hi)
	stackguard0 uintptr
	stackguard1 uintptr

	_panic       *_panic
	_defer       *_defer
	m            *m             // current m
	sched        gobuf
	stktopsp     uintptr        // expected sp at top of stack, checked in traceback
	param        unsafe.Pointer // parameter passed on wakeup
	atomicstatus uint32
	goid         int64
	preempt      bool   // preemption signal, duplicates stackguard0 = stackpreempt
	timer        *timer // cached timer for time.Sleep

	...
}

type schedt struct {
	lock mutex

	pidle      puintptr // list of idle p's
	npidle     uint32   // number of idle p's
	nmspinning uint32   // number of spinning m's
	runq       gQueue   // global queue of runnable G's
	runqsize   int32
	gFree      struct { // global cache of dead G's
		lock    mutex
		stack   gList // Gs with stacks
		noStack gList // Gs without stacks
		n       int32
	}
	sudoglock  mutex // central cache of sudog structs
	sudogcache *sudog
	deferlock  mutex // central pool of available defer structs of different sizes
	deferpool  [5]*_defer

	...
}

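Three ways runtime.newproc1 obtains a Goroutine struct

1 When the processor's list of free Goroutines is empty, idle Goroutines held by the scheduler are moved to the current processor until the processor's gFree list holds 32 of them;

2 When the processor has enough free Goroutines, one is returned from the head of its list;

3 When neither the scheduler's gFree list nor the processor's gFree list holds any struct, the runtime calls runtime.malg to initialize a new runtime.g struct; if the requested stack size is greater than 0, runtime.stackalloc allocates the stack, which is 2KB for a new Goroutine.

Summary: runtime.newproc1 takes a struct from the processor's or the scheduler's cache, and can also create a new one by calling the runtime.malg function.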
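
The three paths can be sketched as a two-level free list. This is a simplified, single-threaded model: toyG, toyP, toySched, and getG are hypothetical names, locking and stack handling are left out, and a plain allocation stands in for runtime.malg.

```go
package main

import "fmt"

// refillTarget is how many free Gs a P tries to hold after a refill.
const refillTarget = 32

// toyG, toyP and toySched are hypothetical stand-ins for the runtime structs.
type toyG struct{ id int }

type toyP struct{ gFree []*toyG } // per-processor cache

type toySched struct {
	gFree  []*toyG // global cache
	nextID int
}

// getG returns a G and reports whether it was reused from a cache.
func getG(p *toyP, s *toySched) (g *toyG, reused bool) {
	// Path 1: local cache empty, so pull from the global cache until the
	// local cache holds refillTarget Gs or the global cache runs dry.
	if len(p.gFree) == 0 {
		for len(p.gFree) < refillTarget && len(s.gFree) > 0 {
			last := len(s.gFree) - 1
			p.gFree = append(p.gFree, s.gFree[last])
			s.gFree = s.gFree[:last]
		}
	}
	// Path 2: hand out a G from the local cache.
	if n := len(p.gFree); n > 0 {
		g = p.gFree[n-1]
		p.gFree = p.gFree[:n-1]
		return g, true
	}
	// Path 3: both caches are empty, so allocate a new G
	// (this is where the runtime would call runtime.malg).
	s.nextID++
	return &toyG{id: s.nextID}, false
}

func main() {
	s := &toySched{}
	for i := 0; i < 40; i++ {
		s.nextID++
		s.gFree = append(s.gFree, &toyG{id: s.nextID})
	}
	p := &toyP{}

	_, reused := getG(p, s)
	fmt.Println("reused:", reused, "local:", len(p.gFree), "global:", len(s.gFree))
}
```

Batching the refill means the global cache, and in the real runtime its lock, is touched once per 32 requests instead of once per request.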

Run queues

The runtime.runqput function places the newly created Goroutine on a run queue, which may be either the global run queue or the processor's local run queue.

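// runqput tries to put g on the local runnable queue.
// If next is false, runqput adds g to the tail of the runnable queue.
// If next is true, runqput puts g in the _p_.runnext slot.
// If the run queue is full, runqput puts g on the global queue.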
// Executed only by the owner P.
func runqput(_p_ *p, gp *g, next bool) {
	if randomizeScheduler && next && fastrand()%2 == 0 {
		next = false
	}

	if next {
	retryNext:
		oldnext := _p_.runnext
		if !_p_.runnext.cas(oldnext, guintptr(unsafe.Pointer(gp))) {
			goto retryNext
		}
		if oldnext == 0 {
			return
		}
		// Kick the old runnext out to the regular run queue.
		gp = oldnext.ptr()
	}

retry:
	h := atomic.LoadAcq(&_p_.runqhead) // load-acquire, synchronize with consumers
	t := _p_.runqtail
	if t-h < uint32(len(_p_.runq)) {
		_p_.runq[t%uint32(len(_p_.runq))].set(gp)
		atomic.StoreRel(&_p_.runqtail, t+1) // store-release, makes the item available for consumption
		return
	}
	if runqputslow(_p_, gp, h, t) {
		return
	}
	// the queue is not full, now the put above must succeed
	goto retry
}

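// Looks for a Goroutine in the local run queue and the global run queue.
// Checks the network poller for Goroutines that are waiting to run.
// Tries to steal runnable Goroutines from other, randomly chosen processors via runtime.runqsteal; timers on those processors may be stolen in the process.
// In short, the lookup done from this function (runtime.findrunnable) always yields a runnable Goroutine; if none exists at the moment, it blocks and waits.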
func schedule() {
 ...
}
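
The lookup order in those comments can be sketched as follows. This is a simplified model with hypothetical names (toyP, toySched, findRunnable): the network poller and timers are skipped, victims are visited in slice order instead of at random, and the function reports failure where the runtime would block. Stealing takes half of the victim's queue, rounded up.

```go
package main

import "fmt"

// toyP and toySched are hypothetical stand-ins; a Goroutine is an int id.
type toyP struct{ runq []int }

type toySched struct{ global []int }

// findRunnable returns the next Goroutine for self and where it came from.
func findRunnable(self *toyP, s *toySched, others []*toyP) (g int, from string, ok bool) {
	// 1. Local run queue.
	if len(self.runq) > 0 {
		g, self.runq = self.runq[0], self.runq[1:]
		return g, "local", true
	}
	// 2. Global run queue.
	if len(s.global) > 0 {
		g, s.global = s.global[0], s.global[1:]
		return g, "global", true
	}
	// 3. Steal half of another P's queue: run one, keep the rest locally.
	for _, victim := range others {
		n := len(victim.runq)
		if n == 0 {
			continue
		}
		half := n - n/2
		stolen := victim.runq[:half]
		victim.runq = victim.runq[half:]
		self.runq = append(self.runq, stolen[1:]...)
		return stolen[0], "stolen", true
	}
	// The real scheduler would block here until work shows up.
	return 0, "none", false
}

func main() {
	self := &toyP{}
	s := &toySched{global: []int{10}}
	others := []*toyP{{runq: []int{1, 2, 3}}}
	for {
		g, from, ok := findRunnable(self, s, others)
		if !ok {
			break
		}
		fmt.Println("run", g, "from", from)
	}
}
```

Stealing a batch instead of a single Goroutine lets the thief serve its next few lookups from its own queue without touching other processors again.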

Scheduling points

1 Voluntary parking: runtime.gopark -> runtime.park_m
2 System calls: runtime.exitsyscall -> runtime.exitsyscall0
3 Cooperative scheduling: runtime.Gosched -> runtime.gosched_m -> runtime.goschedImpl
4 System monitor: runtime.sysmon -> runtime.retake -> runtime.preemptone
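
Two of these points are easy to reach from user code. The sketch below is only an illustration: the helper runWorkers is made up for this example. Each worker calls runtime.Gosched after every step (point 3), and the caller blocks in sync.WaitGroup.Wait, which parks the waiting goroutine until the workers finish (point 1). The order in which the workers interleave is not guaranteed, so the example only counts the steps.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runWorkers starts n goroutines that each record `rounds` steps and yield
// with runtime.Gosched after every step. It returns the recorded worker ids
// in the order the steps happened.
func runWorkers(n, rounds int) []int {
	var (
		mu    sync.Mutex
		steps []int
		wg    sync.WaitGroup
	)
	for id := 0; id < n; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < rounds; i++ {
				mu.Lock()
				steps = append(steps, id)
				mu.Unlock()
				runtime.Gosched() // cooperative yield: back to the scheduler
			}
		}(id)
	}
	wg.Wait() // the caller is parked here until every worker is done
	return steps
}

func main() {
	steps := runWorkers(2, 3)
	fmt.Println(len(steps), "steps recorded:", steps)
}
```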
