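Go: The Implementation and Evolution of sync.Mutex

Posted by dongzerun on 2019-04-30

Jianshu link. A few days ago someone in a chat group asked whether sync.Mutex has any spinning logic, so I took some time to read the source. To my surprise, the humble Mutex has gone through three versions, which also shows how actively the Go community keeps optimizing and evolving it: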

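  1. The most naive mutex: return once the lock is acquired, otherwise put the current goroutine to sleep.
  2. Spinning (spinlock) logic was added: when the Mutex is held only briefly, as is mostly the case, spinning avoids needless runtime scheduling. The official spin commit is recommended reading.
  3. It evolved into a fair lock. In older versions a goroutine actively competing for the lock would very likely beat a sleeping one, which produces long-tail latency. In the new version, once a waiter has failed to get the lock for longer than a threshold, the priority is reversed to keep the tail as short as possible. I recommend reading issue #13086, which describes the problem, and the commit, which includes very detailed benchmark data and is worth studying.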

So how is each one implemented? The walkthrough below uses the source of Go 1.3, 1.7 and 1.12 as examples.

The Mutex struct and common constants

type Mutex struct {
    state int32
    sema  uint32
}
// Constants shared by the old implementations in 1.3 and 1.7
const (
    mutexLocked = 1 << iota // mutex is locked
    mutexWoken
    mutexWaiterShift = iota
)
// Constants used by the fair lock in 1.12
const (
    mutexLocked = 1 << iota // mutex is locked
    mutexWoken
    mutexStarving
    mutexWaiterShift = iota
    starvationThresholdNs = 1e6
)

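As shown above, Mutex has two fields:

  1. state is a 4-byte int32. The low bits are flags; the remaining high bits are a counter holding the number of goroutines that are asleep waiting for the lock.
  2. sema is a semaphore whose initial (zero) value is 0. It is used to park goroutines and to wake them up. On acquire, if sema is greater than 0 it is decremented and the call returns; otherwise the goroutine sleeps. Release increments sema and then wakes the first goroutine in the wait queue.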
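To make the layout of state concrete, here is a small sketch that decodes a state word using the 1.12 constants. decodeState is a hypothetical helper written for this article, not something in package sync.

```go
package main

import "fmt"

// Constants as in the Go 1.12 excerpt above.
const (
	mutexLocked      = 1 << iota // 1: the lock is held
	mutexWoken                   // 2: a waiter has been woken (or is spinning)
	mutexStarving                // 4: the mutex is in starvation mode
	mutexWaiterShift = iota      // 3: the waiter count starts at this bit
)

// decodeState is a hypothetical helper (not part of package sync) that
// splits a state word into its three flags and the waiter count.
func decodeState(state int32) (locked, woken, starving bool, waiters int32) {
	locked = state&mutexLocked != 0
	woken = state&mutexWoken != 0
	starving = state&mutexStarving != 0
	waiters = state >> mutexWaiterShift
	return
}

func main() {
	// A locked mutex with two goroutines parked on the semaphore.
	state := int32(mutexLocked | 2<<mutexWaiterShift)
	locked, woken, starving, waiters := decodeState(state)
	fmt.Println(state, locked, woken, starving, waiters) // prints: 17 true false false 2
}
```

With the 1.3 and 1.7 constants there is no mutexStarving bit, so mutexWaiterShift is 2 rather than 3.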

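A sync.Mutex is normally used directly or embedded in a struct. The zero value of state means unlocked, and the zero value of sema is meaningful too; the Lock and Unlock source below makes this clear after a little thought. See also the article on zero values by Dave (the one with the big beard).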
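As a quick illustration of the zero value being ready to use, the sketch below embeds a sync.Mutex in a struct and never initializes it. Counter and countTo are hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical type used only for illustration.
// The embedded Mutex needs no initialization: its zero value is unlocked.
type Counter struct {
	sync.Mutex
	n int
}

// Inc increments the counter under the embedded lock.
func (c *Counter) Inc() {
	c.Lock()
	defer c.Unlock()
	c.n++
}

// countTo starts `workers` goroutines that each increment the counter once
// and returns the final value.
func countTo(workers int) int {
	var c Counter // zero value, ready to use
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	return c.n
}

func main() {
	fmt.Println(countTo(100)) // prints: 100
}
```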

The naive mutex

What does naive mean? It works, but it is crude...

Lock

// Lock locks m.
// If the lock is already in use, the calling goroutine
// blocks until the mutex is available.
func (m *Mutex) Lock() {
    // Fast path: grab unlocked mutex. If state is 0 nobody holds the lock: CAS it to locked and return.
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        if raceenabled {
            raceAcquire(unsafe.Pointer(m))
        }
        return
    }

    awoke := false // set to true once this goroutine has been woken up by another goroutine
    for {
        old := m.state           // previous value of m.state
        new := old | mutexLocked // the new value has the mutexLocked bit set
        if old&mutexLocked != 0 {
            // The mutexLocked bit is already set, so someone holds the lock:
            // add 1 to the waiter count stored in state instead.
            new = old + 1<<mutexWaiterShift
        }
        if awoke {
            // The goroutine has been woken from sleep,
            // so we need to reset the flag in either case.
            // awoke == true means we were woken up, so clear the mutexWoken bit.
            new &^= mutexWoken
        }
        // CAS the update. If m.state no longer equals old, someone else is
        // competing too, so go around the loop and try again.
        if atomic.CompareAndSwapInt32(&m.state, old, new) {
            if old&mutexLocked == 0 {
                // The lock was free in old, so this CAS acquired it: break out.
                // Otherwise the CAS only registered us as a waiter.
                break
            }
            runtime_Semacquire(&m.sema) // block here while sema is 0, i.e. park until woken
            awoke = true                // someone released the lock and the runtime woke us up
        }
    }
        }
    }

    if raceenabled {
        raceAcquire(unsafe.Pointer(m))
    }
}

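The locking logic is not hard. Every update to the count goes through CAS:

  1. Fast path: if state == 0, nobody holds the lock and nobody is waiting, so CAS it to locked and return.
  2. If someone holds the lock, update the waiter count in m.state, then call runtime_Semacquire to go to sleep and wait to be woken.
  3. When runtime_Semacquire returns, the lock has been released and someone woke us up, so set awoke and start another round of competition in the loop.
  4. When a later round finally wins, the CAS installs the new value while the mutexLocked bit of old is 0: the lock is acquired and we break out.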
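The read, compute, CAS, retry shape used in the steps above can be sketched in isolation. The example only registers waiters on a state word that is pretended to be locked; addWaiter and registerWaiters are hypothetical helpers, and atomic.LoadInt32 is used instead of a plain read so the sketch stays free of data races.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const (
	mutexLocked      = 1 // lock bit, as in the Go 1.3 excerpt
	mutexWaiterShift = 2 // two flag bits sit below the waiter count
)

// addWaiter is a hypothetical helper: read the old state, compute the new
// one with the waiter count bumped, and retry if another goroutine changed
// the state in between.
func addWaiter(state *int32) {
	for {
		old := atomic.LoadInt32(state)
		next := old + 1<<mutexWaiterShift
		if atomic.CompareAndSwapInt32(state, old, next) {
			return
		}
		// CAS lost the race: loop and start a new round.
	}
}

// registerWaiters lets n goroutines register concurrently on a state word
// that starts out locked, and returns the final state.
func registerWaiters(n int) int32 {
	var state int32 = mutexLocked
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			addWaiter(&state)
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&state)
}

func main() {
	s := registerWaiters(50)
	fmt.Println(s&mutexLocked != 0, s>>mutexWaiterShift) // prints: true 50
}
```

No update is lost however the goroutines interleave, which is the property Lock relies on when several goroutines contend at once.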

Unlock

// Unlock unlocks m.
// It is a run-time error if m is not locked on entry to Unlock.
//
// A locked Mutex is not associated with a particular goroutine.
// It is allowed for one goroutine to lock a Mutex and then
// arrange for another goroutine to unlock it.
func (m *Mutex) Unlock() {
    if raceenabled {
        _ = m.state
        raceRelease(unsafe.Pointer(m))
    }

    // Fast path: drop lock bit. Atomically clear the mutexLocked bit; new holds the updated value.
    // Note that as soon as this add completes, another goroutine may well grab the lock.
    new := atomic.AddInt32(&m.state, -mutexLocked)
    if (new+mutexLocked)&mutexLocked == 0 { // unlocking a mutex that is not locked: panic
        panic("sync: unlock of unlocked mutex")
    }

    old := new
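    for {
        // If the waiter count in state is 0 nobody is waiting, so just return.
        // If either mutexLocked or mutexWoken is set in old, someone has already
        // grabbed the lock or a woken goroutine is already competing for it,
        // so there is no need to notify anyone.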
        // If there are no waiters or a goroutine has already
        // been woken or grabbed the lock, no need to wake anyone.
        if old>>mutexWaiterShift == 0 || old&(mutexLocked|mutexWoken) != 0 {
            return
        }
        // Grab the right to wake someone. Decrement the waiter count and set the mutexWoken bit.
        new = (old - 1<<mutexWaiterShift) | mutexWoken
        if atomic.CompareAndSwapInt32(&m.state, old, new) {
            runtime_Semrelease(&m.sema) // after the CAS succeeds, release sema to wake a sleeping goroutine
            return
        }
        old = m.state
    }
}

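The unlock logic is not hard either. Note that one goroutine may release a lock taken by another goroutine.

  1. Atomically compute m.state - mutexLocked. If (new+mutexLocked)&mutexLocked == 0 afterwards, an unlocked Mutex was released, so panic.
  2. Why is the next part a for loop? Because right after the atomic operation in step 1 a third party may acquire the lock, in which case the CAS inside the loop fails and has to be retried.
  3. Quick check: if the waiter count is 0 there are no sleeping goroutines, so there is nothing to wake. If old&(mutexLocked|mutexWoken) != 0, either someone has acquired the lock or there is already a woken goroutine, so again there is nothing to wake. Note that mutexLocked can only show up here on a later iteration of the loop, after old has been re-read at the bottom of the loop.
  4. Then CAS in the new value, which sets the woken flag and decrements the waiter count. Finally runtime_Semrelease actually wakes a waiting goroutine.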

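Problems with the naive lock

Goroutines that sleep on sema are kept in a FIFO linked list, and ideally a woken goroutine would get the lock first. According to the code, however, sleeping goroutines have lower priority than currently active ones: at the moment of Unlock, a newly active goroutine tends to grab the lock. In addition, the lock is sometimes held only very briefly; without spin logic every contending goroutine has to park, which adds runtime scheduling overhead for nothing.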

The spin optimization

A later optimization added spin logic. Spinning happens only in Lock; the code below is from Go 1.7.

// Lock locks m.
// If the lock is already in use, the calling goroutine
// blocks until the mutex is available.
func (m *Mutex) Lock() {
    // Fast path: grab unlocked mutex.
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        if race.Enabled {
            race.Acquire(unsafe.Pointer(m))
        }
        return
    }

    awoke := false
    iter := 0
    for {
        old := m.state
        new := old | mutexLocked
        if old&mutexLocked != 0 { // already locked, so check whether we may spin
            if runtime_canSpin(iter) {
                // Active spinning makes sense.
                // Try to set mutexWoken flag to inform Unlock
                // to not wake other blocked goroutines.
                if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
                    atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
                    awoke = true
                }
                runtime_doSpin()
                iter++
                continue
            }
            new = old + 1<<mutexWaiterShift
        }
        if awoke {
            // The goroutine has been woken from sleep,
            // so we need to reset the flag in either case.
            if new&mutexWoken == 0 {
                panic("sync: inconsistent mutex state")
            }
            new &^= mutexWoken
        }
        if atomic.CompareAndSwapInt32(&m.state, old, new) {
            if old&mutexLocked == 0 {
                break
            }
            runtime_Semacquire(&m.sema)
            awoke = true
            iter = 0
        }
    }

    if race.Enabled {
        race.Acquire(unsafe.Pointer(m))
    }
}

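As you can see, spin-check logic was added at the start of the for loop.

  1. If the runtime allows spinning, take the if branch; otherwise fall through to the original Lock logic.
  2. If this goroutine has not already been woken, m.state does not have the woken flag set, and the waiter count is greater than 0 (someone is waiting), CAS m.state to set mutexWoken so that Unlock does not wake another goroutine.
  3. Run runtime_doSpin and do iter++ to count the spin rounds.

const (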
    mutex_unlocked = 0
    mutex_locked   = 1
    mutex_sleeping = 2

    active_spin     = 4
    active_spin_cnt = 30
    passive_spin    = 1
)
// Active spinning for sync.Mutex.
//go:linkname sync_runtime_canSpin sync.runtime_canSpin
//go:nosplit
func sync_runtime_canSpin(i int) bool {
    // sync.Mutex is cooperative, so we are conservative with spinning.
    // Spin only few times and only if running on a multicore machine and
    // GOMAXPROCS>1 and there is at least one other running P and local runq is empty.
    // As opposed to runtime mutex we don't do passive spinning here,
    // because there can be work on global runq on on other Ps.
    if i >= active_spin || ncpu <= 1 || gomaxprocs <= int32(sched.npidle+sched.nmspinning)+1 {
        return false
    }
    if p := getg().m.p.ptr(); !runqempty(p) {
        return false
    }
    return true
}

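The check in runtime_canSpin is simple, and fairly strict:

  1. iter is less than active_spin, the maximum number of spin rounds, which is 4.
  2. The machine is multicore, GOMAXPROCS > 1, and at least one other P is running. This is easy to understand: with a concurrency of 1, spinning is pointless.
  3. Finally, the current P's local runq must be empty; if there is a G waiting to run, spinning is not allowed either.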
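The same conditions can be written as a pure function for experimenting. This is only a sketch: schedSnapshot is a hypothetical stand-in for scheduler state that really lives inside the runtime (ncpu, gomaxprocs, sched.npidle, sched.nmspinning and the local run queue), and user code cannot read those values.

```go
package main

import "fmt"

const activeSpin = 4 // active_spin in the runtime excerpt

// schedSnapshot is a hypothetical stand-in for the scheduler state that
// sync_runtime_canSpin consults.
type schedSnapshot struct {
	ncpu           int  // number of CPUs
	gomaxprocs     int  // GOMAXPROCS
	idlePs         int  // sched.npidle
	spinningMs     int  // sched.nmspinning
	localRunqEmpty bool // whether the current P's run queue is empty
}

// canSpin mirrors the checks in sync_runtime_canSpin on a snapshot.
func canSpin(iter int, s schedSnapshot) bool {
	if iter >= activeSpin || s.ncpu <= 1 || s.gomaxprocs <= s.idlePs+s.spinningMs+1 {
		return false
	}
	return s.localRunqEmpty
}

func main() {
	busy := schedSnapshot{ncpu: 4, gomaxprocs: 4, localRunqEmpty: true}
	fmt.Println(canSpin(0, busy)) // true: other Ps are running, nothing queued locally
	fmt.Println(canSpin(4, busy)) // false: spin budget used up

	mostlyIdle := busy
	mostlyIdle.idlePs = 3
	fmt.Println(canSpin(0, mostlyIdle)) // false: no other running P
}
```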
func sync_runtime_doSpin() {
    procyield(active_spin_cnt)
}
TEXT runtime·procyield(SB),NOSPLIT,$0-0
    MOVL    cycles+0(FP), AX
again:
    PAUSE
    SUBL    $1, AX
    JNZ again
    RET

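The spin itself drops down to assembly: on amd64 it executes PAUSE in a loop, active_spin_cnt (30) times per call. With at most active_spin (4) rounds, that is at most 4 × 30 = 120 PAUSE instructions before the goroutine stops spinning and parks.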
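Bounded active spinning can be imitated in plain Go, with caveats. The sketch below is my own simplification: spinAcquire is a hypothetical helper, an empty loop stands in for procyield because Go exposes no portable PAUSE, and it gives up where a real mutex would park.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const (
	activeSpin    = 4  // maximum number of spin rounds, as in the runtime
	activeSpinCnt = 30 // busy iterations per round
)

// spinAcquire is a hypothetical helper: it tries to flip flag from 0 to 1
// and spins for at most activeSpin rounds between attempts. It reports
// whether it got the flag and how many rounds it spun.
func spinAcquire(flag *int32) (ok bool, rounds int) {
	for {
		if atomic.CompareAndSwapInt32(flag, 0, 1) {
			return true, rounds
		}
		if rounds >= activeSpin {
			return false, rounds // a real mutex would park here
		}
		// Stand-in for procyield(active_spin_cnt).
		for i := 0; i < activeSpinCnt; i++ {
		}
		rounds++
	}
}

func main() {
	var free int32
	fmt.Println(spinAcquire(&free)) // prints: true 0

	var held int32 = 1 // never released in this example
	fmt.Println(spinAcquire(&held)) // prints: false 4
}
```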

How the fair lock works

    // Mutex fairness.
    //
    // Mutex can be in 2 modes of operations: normal and starvation.
    // In normal mode waiters are queued in FIFO order, but a woken up waiter
    // does not own the mutex and competes with new arriving goroutines over
    // the ownership. New arriving goroutines have an advantage -- they are
    // already running on CPU and there can be lots of them, so a woken up
    // waiter has good chances of losing. In such case it is queued at front
    // of the wait queue. If a waiter fails to acquire the mutex for more than 1ms,
    // it switches mutex to the starvation mode.
    //
    // In starvation mode ownership of the mutex is directly handed off from
    // the unlocking goroutine to the waiter at the front of the queue.
    // New arriving goroutines don't try to acquire the mutex even if it appears
    // to be unlocked, and don't try to spin. Instead they queue themselves at
    // the tail of the wait queue.
    //
    // If a waiter receives ownership of the mutex and sees that either
    // (1) it is the last waiter in the queue, or (2) it waited for less than 1 ms,
    // it switches mutex back to normal operation mode.
    //
    // Normal mode has considerably better performance as a goroutine can acquire
    // a mutex several times in a row even if there are blocked waiters.
    // Starvation mode is important to prevent pathological cases of tail latency.

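The code is from Go 1.12. The comment above explains the motivation and logic of the fair lock. The more fundamental a component is, the stricter the bar for changing it, so there is bound to be test data behind this change.

  1. Mutex has two modes of operation: normal and starvation. In normal mode the logic is similar to the old version: sleeping goroutines are queued in FIFO order (as sudog entries), and a woken goroutine competes with newly arriving, already running goroutines, a competition it is likely to lose. If a goroutine has waited for more than 1ms, it switches the Mutex to starvation mode.
  2. In starvation mode, Unlock hands the lock directly to the first waiter in the FIFO queue. Newly arriving goroutines do not compete; they are queued at the tail.
  3. If the goroutine that receives the lock is the last waiter in the queue, or waited for less than 1ms, it switches the Mutex back to normal mode.
  4. Normal mode performs considerably better, but starvation mode reduces long-tail latency.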
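The two mode switches boil down to two small predicates. The sketch below pulls them out of Lock as pure functions; shouldStarve and shouldExitStarvation are hypothetical names, and the real code evaluates these expressions inline.

```go
package main

import "fmt"

const starvationThresholdNs = 1e6 // 1ms, as in the Go 1.12 constants

// shouldStarve mirrors
//   starving = starving || runtime_nanotime()-waitStartTime > starvationThresholdNs
// with the elapsed wait passed in as waitedNs.
func shouldStarve(starving bool, waitedNs int64) bool {
	return starving || waitedNs > starvationThresholdNs
}

// shouldExitStarvation mirrors the check a waiter makes after receiving the
// lock in starvation mode: leave the mode if this goroutine is not starving
// (it waited less than 1ms) or if it is the last waiter.
func shouldExitStarvation(starving bool, waiters int32) bool {
	return !starving || waiters == 1
}

func main() {
	fmt.Println(shouldStarve(false, 500000))  // false: waited 0.5ms
	fmt.Println(shouldStarve(false, 2000000)) // true: waited 2ms
	fmt.Println(shouldExitStarvation(true, 3)) // false: starving and others still wait
	fmt.Println(shouldExitStarvation(true, 1)) // true: last waiter
}
```

Note that the comparison is strict, so a wait of exactly 1ms does not yet count as starving.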

Locking with the fair lock

// Lock locks m.
// If the lock is already in use, the calling goroutine
// blocks until the mutex is available.
func (m *Mutex) Lock() {
    // Fast path: grab unlocked mutex.
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        if race.Enabled {
            race.Acquire(unsafe.Pointer(m))
        }
        return
    }

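    var waitStartTime int64 // when we started waiting; used to decide whether to enter starvation mode
    starving := false       // starvation flag for this goroutine
    awoke := false          // whether we have been woken up
    iter := 0               // number of spin rounds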
    old := m.state
    for {
        // Don't spin in starvation mode, ownership is handed off to waiters
        // so we won't be able to acquire the mutex anyway. In starvation mode, do not spin: go straight to the wait queue.
        if old&(mutexLocked|mutexStarving) == mutexLocked && runtime_canSpin(iter) {
            // Active spinning makes sense.
            // Try to set mutexWoken flag to inform Unlock
            // to not wake other blocked goroutines.
            if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
                atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
                awoke = true
            }
            runtime_doSpin()
            iter++
            old = m.state
            continue
        }
        new := old
        // Don't try to acquire starving mutex, new arriving goroutines must queue.
        if old&mutexStarving == 0 { // set mutexLocked only when not in starvation mode; in starvation mode an active goroutine just queues up
            new |= mutexLocked
        }
        if old&(mutexLocked|mutexStarving) != 0 { // already locked or starving: waiter count + 1
            new += 1 << mutexWaiterShift
        }
        // The current goroutine switches mutex to starvation mode.
        // But if the mutex is currently unlocked, don't do the switch.
        // Unlock expects that starving mutex has waiters, which will not
        // be true in this case. If this goroutine is starving and the mutex
        // is locked, set mutexStarving; the CAS below installs it.
        if starving && old&mutexLocked != 0 {
            new |= mutexStarving
        }
        if awoke { // this goroutine was woken up, so clear the mutexWoken bit
            // The goroutine has been woken from sleep,
            // so we need to reset the flag in either case.
            if new&mutexWoken == 0 {
                throw("sync: inconsistent mutex state")
            }
            new &^= mutexWoken
        }
        if atomic.CompareAndSwapInt32(&m.state, old, new) {
            if old&(mutexLocked|mutexStarving) == 0 { // old was neither locked nor starving: the lock is ours, exit
                break // locked the mutex with CAS
            }
            // If we were already waiting before, queue at the front of the queue.
            queueLifo := waitStartTime != 0 // always false the first time
            if waitStartTime == 0 {
                waitStartTime = runtime_nanotime()
            }
            runtime_SemacquireMutex(&m.sema, queueLifo) // park here; if queueLifo is true, go to the head of the queue (LIFO)
            // Reaching here means another goroutine woke us up. Before competing again, decide whether we are starving.
            starving = starving || runtime_nanotime()-waitStartTime > starvationThresholdNs // waited more than 1ms: starving
            old = m.state
            if old&mutexStarving != 0 { // the mutex is already in starvation mode
                // If this goroutine was woken and mutex is in starvation mode,
                // ownership was handed off to us but mutex is in somewhat
                // inconsistent state: mutexLocked is not set and we are still
                // accounted as waiter. Fix that.
                if old&(mutexLocked|mutexWoken) != 0 || old>>mutexWaiterShift == 0 {
                    throw("sync: inconsistent mutex state")
                }
                // Woken in starvation mode, we are guaranteed the lock, because Unlock hands it off only to a parked goroutine.
                delta := int32(mutexLocked - 1<<mutexWaiterShift) // set mutexLocked and decrement the waiter count
                if !starving || old>>mutexWaiterShift == 1 { // not starving (waited < 1ms) or last waiter: clear mutexStarving
                    // Exit starvation mode.
                    // Critical to do it here and consider wait time.
                    // Starvation mode is so inefficient, that two goroutines
                    // can go lock-step infinitely once they switch mutex
                    // to starvation mode.
                    delta -= mutexStarving
                }
                atomic.AddInt32(&m.state, delta) // apply the update; the lock is acquired, exit
                break
            }
            awoke = true // not in starvation mode: compete for the lock again
            iter = 0
        } else {
            old = m.state // CAS failed: compete again
        }
    }

    if race.Enabled {
        race.Acquire(unsafe.Pointer(m))
    }
}

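Overall, the Lock logic of the fair lock is considerably more complex, with many more edge cases to consider:

  1. The same fast path: if m.state was 0, lock it and we are done.
  2. Enter the for loop, which also goes through the spin logic, with one extra check: spinning is forbidden in starvation mode. By design, an active goroutine must go straight to the park queue in that case.
  3. After the spin section there are four cases: lock acquired in starvation mode, lock not acquired in starvation mode, lock acquired in normal mode, lock not acquired in normal mode. Whenever the lock is not acquired, the waiter count is incremented before the CAS update.
  4. If the CAS fails, simply start competing again.
  5. If the CAS succeeds, check which case applies. If old was neither locked nor in starvation mode, the lock is acquired and we exit.
  6. If the CAS succeeds but the lock was already held, queueLifo decides whether we are queued at the head or the tail of the park queue; the goroutine then parks here until woken.
  7. When runtime_SemacquireMutex returns we have been woken; first decide whether this goroutine is now starving. Then there are two cases. If old is already in starvation mode, we are the only goroutine woken and are guaranteed the lock: decrement the waiter count, and if we are the last waiter or waited less than starvationThresholdNs, clear the mutexStarving bit, then exit.
  8. If old is not in starvation mode, set awoke to true and compete again.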
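The delta arithmetic in step 7 is easier to follow with numbers. The sketch below applies the same delta to a few state words; afterHandoff is a hypothetical helper, and it uses plain addition where the real code uses atomic.AddInt32.

```go
package main

import "fmt"

const (
	mutexLocked = 1 << iota // 1
	mutexWoken              // 2
	mutexStarving           // 4
	mutexWaiterShift = iota // 3
)

// afterHandoff computes the state a woken waiter leaves behind when the
// lock is handed to it in starvation mode.
func afterHandoff(old int32, starving bool) int32 {
	// Set mutexLocked and remove ourselves from the waiter count.
	delta := int32(mutexLocked - 1<<mutexWaiterShift)
	if !starving || old>>mutexWaiterShift == 1 {
		// Not starving, or last waiter: leave starvation mode.
		delta -= mutexStarving
	}
	return old + delta
}

func main() {
	threeWaiters := int32(mutexStarving | 3<<mutexWaiterShift) // 28
	lastWaiter := int32(mutexStarving | 1<<mutexWaiterShift)   // 12

	fmt.Println(afterHandoff(threeWaiters, true))  // 21: locked, starving, 2 waiters
	fmt.Println(afterHandoff(lastWaiter, true))    // 1: locked, normal mode, 0 waiters
	fmt.Println(afterHandoff(threeWaiters, false)) // 17: locked, normal mode, 2 waiters
}
```

A single add can set one bit, clear another and decrement the counter at once because the handed-off state is known to have mutexLocked clear, mutexStarving set and at least one waiter.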

Unlocking with the fair lock

// Unlock unlocks m.
// It is a run-time error if m is not locked on entry to Unlock.
//
// A locked Mutex is not associated with a particular goroutine.
// It is allowed for one goroutine to lock a Mutex and then
// arrange for another goroutine to unlock it.
func (m *Mutex) Unlock() {
    if race.Enabled {
        _ = m.state
        race.Release(unsafe.Pointer(m))
    }
    
    // Fast path: drop lock bit. Same as before: subtract mutexLocked first; unlocking a Mutex that is not locked is fatal (throw).
    new := atomic.AddInt32(&m.state, -mutexLocked)
    if (new+mutexLocked)&mutexLocked == 0 {
        throw("sync: unlock of unlocked mutex")
    }
    if new&mutexStarving == 0 { // check the mutexStarving bit: 0 takes the old logic, otherwise the starvation branch
        old := new
        for {
            // If there are no waiters or a goroutine has already
            // been woken or grabbed the lock, no need to wake anyone.
            // In starvation mode ownership is directly handed off from unlocking
            // goroutine to the next waiter. We are not part of this chain,
            // since we did not observe mutexStarving when we unlocked the mutex above.
            // So get off the way.
            if old>>mutexWaiterShift == 0 || old&(mutexLocked|mutexWoken|mutexStarving) != 0 {
                return
            }
            // Grab the right to wake someone.
            new = (old - 1<<mutexWaiterShift) | mutexWoken
            if atomic.CompareAndSwapInt32(&m.state, old, new) {
                runtime_Semrelease(&m.sema, false)
                return
            }
            old = m.state
        }
    } else {
        // Starving mode: handoff mutex ownership to the next waiter.
        // Note: mutexLocked is not set, the waiter will set it after wakeup.
        // But mutex is still considered locked if mutexStarving is set,
        // so new coming goroutines won't acquire it.
        runtime_Semrelease(&m.sema, true) // simply release sema to hand the lock to the waiting goroutine
    }
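}

  1. Atomically subtract mutexLocked from m.state, then check whether an unlocked Mutex was released; if so, abort with throw.
  2. Use the mutexStarving bit of m.state to determine the current mode: 0 takes the normal branch, 1 takes the starvation branch.
  3. In starvation mode, call runtime_Semrelease directly (the semaphore UP operation) to wake the first goroutine in the FIFO queue.
  4. Normal mode is like the old logic; the only difference is the extra check on the mutexStarving bit.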
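The branch structure of Unlock can be summarized as a decision on the state word left after the lock bit is dropped. This is a sketch on a single snapshot: unlockAction and wakeAction are hypothetical names, and the CAS retry loop of the normal branch is left out.

```go
package main

import "fmt"

const (
	mutexLocked = 1 << iota
	mutexWoken
	mutexStarving
	mutexWaiterShift = iota
)

// wakeAction is a hypothetical enum describing what Unlock does next.
type wakeAction int

const (
	wakeNone wakeAction = iota // nothing to do
	wakeOne                    // normal mode: wake one waiter, which then competes
	handOff                    // starvation mode: hand the lock to the first waiter
)

// unlockAction classifies the state observed right after mutexLocked has
// been subtracted.
func unlockAction(state int32) wakeAction {
	if state&mutexStarving != 0 {
		return handOff
	}
	if state>>mutexWaiterShift == 0 || state&(mutexLocked|mutexWoken) != 0 {
		return wakeNone
	}
	return wakeOne
}

func main() {
	fmt.Println(unlockAction(0) == wakeNone)                                    // no waiters
	fmt.Println(unlockAction(2<<mutexWaiterShift) == wakeOne)                   // two sleepers
	fmt.Println(unlockAction(2<<mutexWaiterShift|mutexWoken) == wakeNone)       // someone already woken
	fmt.Println(unlockAction(1<<mutexWaiterShift|mutexStarving) == handOff)     // starving
}
```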

Summary

It is fairly complicated after all. Writing business code is nice enough, just piling up junk~~
