Understanding Go in Depth: Garbage Collection

Posted by tyloafer on 2019-08-16

Original article: https://segmentfault.com/a/1190000020086769

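Go's GC has drawn criticism since the language was born, but after tri-color marking arrived in v1.5 and the hybrid write barrier in v1.8, a typical GC pause has shrunk to around 10µs, which is excellent. Let's explore how Go's GC works.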

How Tri-Color Marking Works

Let's start with a diagram, which gives a rough idea of how tri-color marking works:

The principle:

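1. Put all objects into the white set.
2. Traverse from the root nodes; each white object reached is moved from the white set to the grey set.
3. Traverse the grey set: move every white object referenced by a grey object into the grey set, and move each grey object that has been traversed into the black set.
4. Repeat step 3 until the grey set is empty.
5. After step 4, the objects left in the white set are unreachable, i.e. garbage, and are reclaimed.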
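
The five steps can be sketched as a small simulation. This is only an illustration of the algorithm, not the runtime's implementation: the object type, the color constants, and the mark and sweep helpers are all hypothetical names introduced here.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet reached
	grey               // reached, references not yet scanned
	black              // reached, references scanned
)

// object is a hypothetical heap object; new objects start out white (step 1).
type object struct {
	name string
	refs []*object
	c    color
}

// mark performs steps 2 to 4: grey the roots, then drain the grey set.
func mark(roots []*object) {
	var greySet []*object
	for _, r := range roots {
		if r.c == white {
			r.c = grey
			greySet = append(greySet, r)
		}
	}
	for len(greySet) > 0 {
		obj := greySet[len(greySet)-1]
		greySet = greySet[:len(greySet)-1]
		for _, ref := range obj.refs {
			if ref.c == white {
				ref.c = grey
				greySet = append(greySet, ref)
			}
		}
		obj.c = black
	}
}

// sweep performs step 5: it returns the names of the objects that are still white.
func sweep(heap []*object) []string {
	var garbage []string
	for _, o := range heap {
		if o.c == white {
			garbage = append(garbage, o.name)
		}
	}
	return garbage
}

func main() {
	a := &object{name: "A"}
	b := &object{name: "B"}
	c := &object{name: "C"}
	d := &object{name: "D"} // nothing references D
	a.refs = []*object{b}
	b.refs = []*object{c, a} // a cycle is handled because black objects are never re-greyed
	heap := []*object{a, b, c, d}

	mark([]*object{a})
	fmt.Println(sweep(heap)) // [D]
}
```

Only D stays white, so only D is reported as garbage; the A-B cycle does not cause an endless loop because an object is greyed at most once.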

Write Barrier

Go does not stop the world (STW) while tri-color marking is running, which means objects can still be modified during marking.

Consider the following situation:

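While scanning the grey set, the collector reaches object A and marks everything A references. It then starts scanning the references of object D, and at that moment another goroutine modifies the D->E reference, producing the situation shown in the figure below: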

Could this cause E to be missed by the scan, so that it is mistaken for a white object, i.e. garbage?

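The write barrier exists to solve exactly this problem. With the write barrier in place, E is treated as live after the steps above. Even if A later drops its reference to E, E is not reclaimed in this GC cycle; it is reclaimed in the next one.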

Go has used the hybrid write barrier since Go 1.8. Its pseudocode is as follows:

writePointer(slot, ptr):
    shade(*slot)
    if any stack is grey:
        shade(ptr)
    *slot = ptr

The hybrid write barrier shades both the "old pointer" and the "new pointer" of the slot being written.

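The old pointer is shaded because another running thread may, at the same time, copy the value of this pointer into a register or a local variable on the stack.
Copying a pointer into a register or a stack local does not go through the write barrier, so the pointer could end up unmarked. Consider the following case: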

[go] b = obj
[go] oldx = nil
[gc] scan oldx...
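[go] oldx = b.x // copy b.x into a local variable; this does not go through the write barrier
[go] b.x = ptr // the write barrier should shade the old value of b.x
[gc] scan b...
If the write barrier did not shade the old value, the object referenced by oldx would never be scanned.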

The new pointer is shaded because another running thread may move the pointer to a different location. Consider the following case:

[go] a = ptr
[go] b = obj
[gc] scan b...
[go] b.x = a // the write barrier should shade the new value of b.x
[go] a = nil
[gc] scan a...
If the write barrier did not shade the new value, ptr would never be scanned.

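The hybrid write barrier means the GC does not need to rescan each G's stack after concurrent marking finishes, which shortens the STW time in Mark Termination.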
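
To make the writePointer pseudocode concrete, here is a minimal single-threaded sketch. It is an assumption-laden model, not runtime code: the node and collector types, the stackGrey flag, and the shaded log are hypothetical, and the real barrier is emitted by the compiler and runs concurrently with the mutator.

```go
package main

import "fmt"

// node is a hypothetical heap object with a single pointer field x.
type node struct {
	name   string
	x      *node
	marked bool
}

// collector is a toy stand-in for the GC's mark state.
type collector struct {
	stackGrey bool     // assumption: true while some stack has not been scanned yet
	shaded    []string // names of shaded objects, in order, for illustration only
}

// shade marks an object so that it survives this cycle.
func (c *collector) shade(p *node) {
	if p == nil || p.marked {
		return
	}
	p.marked = true
	c.shaded = append(c.shaded, p.name)
}

// writePointer mirrors the pseudocode: shade the old value, shade the
// new value while a stack is still grey, then perform the store.
func (c *collector) writePointer(slot **node, ptr *node) {
	c.shade(*slot)
	if c.stackGrey {
		c.shade(ptr)
	}
	*slot = ptr
}

func main() {
	oldObj := &node{name: "old"}
	newObj := &node{name: "new"}
	b := &node{name: "b", x: oldObj}

	gc := &collector{stackGrey: true}
	gc.writePointer(&b.x, newObj)

	fmt.Println(gc.shaded, b.x.name) // [old new] new
}
```

Both the overwritten value and the newly stored value end up shaded, which matches the two cases walked through above.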

Besides the write barrier, every object allocated during a GC cycle is immediately marked black, as can be seen in the runtime's mallocgc function.

Collection Process

Go's GC is a concurrent GC: most of the GC work runs at the same time as ordinary Go code, which makes the GC process fairly complex.
The GC has four phases:

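Sweep Termination: sweep any spans that are still unswept. A new GC cycle can start only after the previous cycle's sweeping has finished.
Mark: scan all root objects and every object reachable from them, marking them so they are not reclaimed.
Mark Termination: finish the marking work and rescan some of the root objects (requires STW).
Sweep: sweep spans according to the mark results.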

The figure below shows a fairly complete GC flow, with the four phases color-coded:

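Two kinds of background tasks (Gs) run during GC: one for marking and one for sweeping.
Background mark workers are started when needed. The number that can work at the same time is about 25% of the number of Ps, which is the basis for Go's statement that 25% of the CPU is spent on GC.
One background sweep task is started when the program starts, and it is woken up when the sweep phase begins.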

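Currently the whole GC process stops the world (STW) twice: once at the start of the Mark phase and once in the Mark Termination phase.
The first STW prepares the root scan and enables the write barrier and mutator assists.
The second STW rescans some of the root objects and disables the write barrier and mutator assists.
Note that not all root scanning requires STW. For example, scanning objects on a stack only requires stopping the G that owns that stack.
The write barrier is implemented as a hybrid write barrier, which greatly reduces the duration of the second STW.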
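
The cycle count and the accumulated STW pause time can be observed from user code through the public runtime API. The sketch below uses runtime.GC and runtime.ReadMemStats, which are real; the gcStats helper is a hypothetical name introduced here, and the pause figure it reports depends entirely on the machine and the Go version.

```go
package main

import (
	"fmt"
	"runtime"
)

// gcStats forces one collection and returns how many GC cycles completed
// and how much stop-the-world pause time accumulated in the meantime.
func gcStats() (cycles uint32, pauseNs uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	runtime.GC() // blocks until a full collection has finished
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC, after.PauseTotalNs - before.PauseTotalNs
}

func main() {
	cycles, pauseNs := gcStats()
	fmt.Printf("completed cycles: %d, total STW pause: %dns\n", cycles, pauseNs)
}
```

Running a program with the environment variable GODEBUG=gctrace=1 prints a per-cycle summary to standard error, which is another way to watch these phases.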

Source Code Analysis

gcStart

func gcStart(mode gcMode, trigger gcTrigger) {
    // Since this is called from malloc and malloc is called in
    // the guts of a number of libraries that might be holding
    // locks, don't attempt to start GC in non-preemptible or
    // potentially unstable situations.
    // Check whether the current g can be preempted; do not trigger GC if it cannot
    mp := acquirem()
    if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
        releasem(mp)
        return
    }
    releasem(mp)
    mp = nil

    // Pick up the remaining unswept/not being swept spans concurrently
    //
    // This shouldn't happen if we're being invoked in background
    // mode since proportional sweep should have just finished
    // sweeping everything, but rounding errors, etc, may leave a
    // few spans unswept. In forced mode, this is necessary since
    // GC can be forced at any point in the sweeping cycle.
    //
    // We check the transition condition continuously here in case
    // this G gets delayed in to the next GC cycle.
    // Sweep any garbage left unswept
    for trigger.test() && gosweepone() != ^uintptr(0) {
        sweep.nbgsweep++
    }

    // Perform GC initialization and the sweep termination
    // transition.
    semacquire(&work.startSema)
    // Re-check transition condition under transition lock.
    // Check whether the gcTrigger condition still holds
    if !trigger.test() {
        semrelease(&work.startSema)
        return
    }

    // For stats, check if this GC was forced by the user
    // Record whether this GC was forced; users can force one by calling runtime.GC()
    work.userForced = trigger.kind == gcTriggerAlways || trigger.kind == gcTriggerCycle

    // In gcstoptheworld debug mode, upgrade the mode accordingly.
    // We do this after re-checking the transition condition so
    // that multiple goroutines that detect the heap trigger don't
    // start multiple STW GCs.
    // Set the GC mode
    if mode == gcBackgroundMode {
        if debug.gcstoptheworld == 1 {
            mode = gcForceMode
        } else if debug.gcstoptheworld == 2 {
            mode = gcForceBlockMode
        }
    }

    // Ok, we're doing it! Stop everybody else
    semacquire(&worldsema)

    if trace.enabled {
        traceGCStart()
    }
    // Start the background mark workers
    if mode == gcBackgroundMode {
        gcBgMarkStartWorkers()
    }
    // Reset the mark-related GC state
    gcResetMarkState()

    work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
    if work.stwprocs > ncpu {
        // This is used to compute CPU time of the STW phases,
        // so it can't be more than ncpu, even if GOMAXPROCS is.
        work.stwprocs = ncpu
    }
    work.heap0 = atomic.Load64(&memstats.heap_live)
    work.pauseNS = 0
    work.mode = mode

    now := nanotime()
    work.tSweepTerm = now
    work.pauseStart = now
    if trace.enabled {
        traceGCSTWStart(1)
    }
    // STW: stop the world
    systemstack(stopTheWorldWithSema)
    // Finish sweep before we start concurrent scan.
    // Finish sweeping the previous cycle's garbage so that the previous GC is complete
    systemstack(func() {
        finishsweep_m()
    })
    // clearpools before we start the GC. If we wait they memory will not be
    // reclaimed until the next GC cycle.
    // Clear sync.Pool, sched.sudogcache and sched.deferpool. Not expanded here: sync.Pool was covered earlier, and the rest will come up in later articles
    clearpools()

    // Increment the GC cycle count
    work.cycles++
    if mode == gcBackgroundMode { // Do as much work concurrently as possible
        gcController.startCycle()
        work.heapGoal = memstats.next_gc

        // Enter concurrent mark phase and enable
        // write barriers.
        //
        // Because the world is stopped, all Ps will
        // observe that write barriers are enabled by
        // the time we start the world and begin
        // scanning.
        //
        // Write barriers must be enabled before assists are
        // enabled because they must be enabled before
        // any non-leaf heap objects are marked. Since
        // allocations are blocked until assists can
        // happen, we want enable assists as early as
        // possible.
        // Set the GC phase to _GCmark
        setGCPhase(_GCmark)

        // Update the bgmark state
        gcBgMarkPrepare() // Must happen before assist enable.
        // Compute and queue the root scan jobs, and initialize the related scan state
        gcMarkRootPrepare()

        // Mark all active tinyalloc blocks. Since we're
        // allocating from these, they need to be black like
        // other allocations. The alternative is to blacken
        // the tiny block on every allocation from it, which
        // would slow down the tiny allocator.
        // Mark tiny objects
        gcMarkTinyAllocs()

        // At this point all Ps have enabled the write
        // barrier, thus maintaining the no white to
        // black invariant. Enable mutator assists to
        // put back-pressure on fast allocating
        // mutators.
        // Set gcBlackenEnabled to 1, allowing mutator assists and background mark workers to blacken objects
        atomic.Store(&gcBlackenEnabled, 1)

        // Assists and workers can start the moment we start
        // the world.
        gcController.markStartTime = now

        // Concurrent mark.
        systemstack(func() {
            now = startTheWorldWithSema(trace.enabled)
        })
        work.pauseNS += now - work.pauseStart
        work.tMark = now
    } else {
        // Non-concurrent mode
        // Record the start time of the mark termination phase
        if trace.enabled {
            // Switch to mark termination STW.
            traceGCSTWDone()
            traceGCSTWStart(0)
        }
        t := nanotime()
        work.tMark, work.tMarkTerm = t, t
        work.heapGoal = work.heap0

        // Perform mark termination. This will restart the world.
        // STW: mark, sweep, then start the world
        gcMarkTermination(memstats.triggerRatio)
    }

    semrelease(&work.startSema)
}

gcBgMarkStartWorkers

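This function prepares the goroutines that perform background mark work. They do not start working immediately; they wait until the GC phase is set to _GCmark, which happens at the setGCPhase(_GCmark) call in the previous function.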

func gcBgMarkStartWorkers() {
    // Background marking is performed by per-P G's. Ensure that
    // each P has a background GC G.
    for _, p := range allp {
        if p.gcBgMarkWorker == 0 {
            go gcBgMarkWorker(p)
            // Wait for the bgMarkReady signal from the gcBgMarkWorker goroutine before continuing
            notetsleepg(&work.bgMarkReady, -1)
            noteclear(&work.bgMarkReady)
        }
    }
}

gcBgMarkWorker

The function run by each background mark worker

func gcBgMarkWorker(_p_ *p) {
    gp := getg()
    // Used to reacquire the p and m after waking from a park
    type parkInfo struct {
        m      muintptr // Release this m on park.
        attach puintptr // If non-nil, attach to this p on park.
    }
    // We pass park to a gopark unlock function, so it can't be on
    // the stack (see gopark). Prevent deadlock from recursively
    // starting GC by disabling preemption.
    gp.m.preemptoff = "GC worker init"
    park := new(parkInfo)
    gp.m.preemptoff = ""
    // Store the m and p in park so they can be passed to gopark and recovered when gcController.findRunnable wakes this worker
    park.m.set(acquirem())
    park.attach.set(_p_)
    // Inform gcBgMarkStartWorkers that this worker is ready.
    // After this point, the background mark worker is scheduled
    // cooperatively by gcController.findRunnable. Hence, it must
    // never be preempted, as this would put it into _Grunnable
    // and put it on a run queue. Instead, when the preempt flag
    // is set, this puts itself into _Gwaiting to be woken up by
    // gcController.findRunnable at the appropriate time.
    // Let the notetsleepg in gcBgMarkStartWorkers stop waiting, so it can continue and return
    notewakeup(&work.bgMarkReady)

    for {
        // Go to sleep until woken by gcController.findRunnable.
        // We can't releasem yet since even the call to gopark
        // may be preempted.
        // Put the g to sleep
        gopark(func(g *g, parkp unsafe.Pointer) bool {
            park := (*parkInfo)(parkp)

            // The worker G is no longer running, so it's
            // now safe to allow preemption.
            // Release the m acquired earlier
            releasem(park.m.ptr())

            // If the worker isn't attached to its P,
            // attach now. During initialization and after
            // a phase change, the worker may have been
            // running on a different P. As soon as we
            // attach, the owner P may schedule the
            // worker, so this must be done after the G is
            // stopped.
            // Attach to the P; park.attach was set above
            if park.attach != 0 {
                p := park.attach.ptr()
                park.attach.set(nil)
                // cas the worker because we may be
                // racing with a new worker starting
                // on this P.
                if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
                    // The P got a new worker.
                    // Exit this worker.
                    return false
                }
            }
            return true
        }, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0)

        // Loop until the P dies and disassociates this
        // worker (the P may later be reused, in which case
        // it will get a new worker) or we failed to associate.
        // Check whether the P's gcBgMarkWorker is still this G; if not, end this worker
        if _p_.gcBgMarkWorker.ptr() != gp {
            break
        }

        // Disable preemption so we can use the gcw. If the
        // scheduler wants to preempt us, we'll stop draining,
        // dispose the gcw, and then preempt.
        // The m was released in gopark's unlock function; acquire it again here
        park.m.set(acquirem())

        if gcBlackenEnabled == 0 {
            throw("gcBgMarkWorker: blackening not enabled")
        }

        startTime := nanotime()
        // Record the start time of this mark work
        _p_.gcMarkWorkerStartTime = startTime

        decnwait := atomic.Xadd(&work.nwait, -1)
        if decnwait == work.nproc {
            println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc)
            throw("work.nwait was > work.nproc")
        }
        // Switch to g0 to do the work
        systemstack(func() {
            // Mark our goroutine preemptible so its stack
            // can be scanned. This lets two mark workers
            // scan each other (otherwise, they would
            // deadlock). We must not modify anything on
            // the G stack. However, stack shrinking is
            // disabled for mark workers, so it is safe to
            // read from the G stack.
            // Set the G's status to waiting so another g can scan its stack (two gs can scan each other's stacks)
            casgstatus(gp, _Grunning, _Gwaiting)
            switch _p_.gcMarkWorkerMode {
            default:
                throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
            case gcMarkWorkerDedicatedMode:
                // Mode dedicated to mark work
                gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
                if gp.preempt {
                    // Preempted: move every G in the local run queue to the global run queue
                    // We were preempted. This is
                    // a useful signal to kick
                    // everything out of the run
                    // queue so it can run
                    // somewhere else.
                    lock(&sched.lock)
                    for {
                        gp, _ := runqget(_p_)
                        if gp == nil {
                            break
                        }
                        globrunqput(gp)
                    }
                    unlock(&sched.lock)
                }
                // Go back to draining, this time
                // without preemption.
                // Continue the mark work
                gcDrain(&_p_.gcw, gcDrainNoBlock|gcDrainFlushBgCredit)
            case gcMarkWorkerFractionalMode:
                // Do mark work until preempted
                gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
            case gcMarkWorkerIdleMode:
                // Do mark work while idle
                gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
            }
            // Switch the G from waiting back to running
            casgstatus(gp, _Gwaiting, _Grunning)
        })

        // If we are nearing the end of mark, dispose
        // of the cache promptly. We must do this
        // before signaling that we're no longer
        // working so that other workers can't observe
        // no workers and no work while we have this
        // cached, and before we compute done.
        // Promptly flush the local cache to the global queue
        if gcBlackenPromptly {
            _p_.gcw.dispose()
        }

        // Account for time.
        // Accumulate the elapsed time
        duration := nanotime() - startTime
        switch _p_.gcMarkWorkerMode {
        case gcMarkWorkerDedicatedMode:
            atomic.Xaddint64(&gcController.dedicatedMarkTime, duration)
            atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 1)
        case gcMarkWorkerFractionalMode:
            atomic.Xaddint64(&gcController.fractionalMarkTime, duration)
            atomic.Xaddint64(&_p_.gcFractionalMarkTime, duration)
        case gcMarkWorkerIdleMode:
            atomic.Xaddint64(&gcController.idleMarkTime, duration)
        }

        // Was this the last worker and did we run out
        // of work?
        incnwait := atomic.Xadd(&work.nwait, +1)
        if incnwait > work.nproc {
            println("runtime: p.gcMarkWorkerMode=", _p_.gcMarkWorkerMode,
                "work.nwait=", incnwait, "work.nproc=", work.nproc)
            throw("work.nwait > work.nproc")
        }

        // If this worker reached a background mark completion
        // point, signal the main GC goroutine.
        if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
            // Make this G preemptible and disassociate it
            // as the worker for this P so
            // findRunnableGCWorker doesn't try to
            // schedule it.
            // Disassociate this worker from the P and release the m
            _p_.gcBgMarkWorker.set(nil)
            releasem(park.m.ptr())

            gcMarkDone()

            // Disable preemption and prepare to reattach
            // to the P.
            //
            // We may be running on a different P at this
            // point, so we can't reattach until this G is
            // parked.
            park.m.set(acquirem())
            park.attach.set(_p_)
        }
    }
}

gcDrain

The core implementation of tri-color marking

gcDrain scans all roots and objects and blackens grey objects until all roots and objects have been marked

func gcDrain(gcw *gcWork, flags gcDrainFlags) {
    if !writeBarrier.needed {
        throw("gcDrain phase incorrect")
    }

    gp := getg().m.curg
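    // Whether to return when the preempt flag is seen
    preemptible := flags&gcDrainUntilPreempt != 0
    // Whether to wait for work when none is available
    blocking := flags&(gcDrainUntilPreempt|gcDrainIdle|gcDrainFractional|gcDrainNoBlock) == 0
    // Whether to account background scan work to reduce mutator assists and wake waiting Gs
    flushBgCredit := flags&gcDrainFlushBgCredit != 0
    // Whether mark work is being done while idle
    idle := flags&gcDrainIdle != 0
    // Record the scan work already done at entry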
    initScanWork := gcw.scanWork

    // checkWork is the scan work before performing the next
    // self-preempt check.
    // Set the work-check function for the corresponding mode
    checkWork := int64(1<<63 - 1)
    var check func() bool
    if flags&(gcDrainIdle|gcDrainFractional) != 0 {
        checkWork = initScanWork + drainCheckThreshold
        if idle {
            check = pollWork
        } else if flags&gcDrainFractional != 0 {
            check = pollFractionalWorkerExit
        }
    }

    // Drain root marking jobs.
    // Scan root objects if the root scan has not finished
    if work.markrootNext < work.markrootJobs {
        for !(preemptible && gp.preempt) {
            job := atomic.Xadd(&work.markrootNext, +1) - 1
            if job >= work.markrootJobs {
                break
            }
            // Run the root scan job
            markroot(gcw, job)
            if check != nil && check() {
                goto done
            }
        }
    }

    // Drain heap marking jobs.
    // Loop until preempted
    for !(preemptible && gp.preempt) {
        // Try to keep work available on the global queue. We used to
        // check if there were waiting workers, but it's better to
        // just keep work available than to make workers wait. In the
        // worst case, we'll do O(log(_WorkbufSize)) unnecessary
        // balances.
        if work.full == 0 {
            // Balance work: if the global mark queue is empty, hand part of the work to the global queue
            gcw.balance()
        }

        var b uintptr
        if blocking {
            b = gcw.get()
        } else {
            b = gcw.tryGetFast()
            if b == 0 {
                b = gcw.tryGet()
            }
        }
        // Failed to get work; leave the loop
        if b == 0 {
            // work barrier reached or tryGet failed.
            break
        }
        // Scan the object that was obtained
        scanobject(b, gcw)

        // Flush background scan work credit to the global
        // account if we've accumulated enough locally so
        // mutator assists can draw on it.
        // Once the local scan work reaches gcCreditSlack, add it to the global counter in one batch
        if gcw.scanWork >= gcCreditSlack {
            atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
            if flushBgCredit {
                gcFlushBgCredit(gcw.scanWork - initScanWork)
                initScanWork = 0
            }
            checkWork -= gcw.scanWork
            gcw.scanWork = 0
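            // Once the scan work reaches checkWork, the target for the next self-preempt check, call the check function for the mode:
            // pollWork in idle mode, pollFractionalWorkerExit in fractional mode (assigned near the top of this function)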
            if checkWork <= 0 {
                checkWork += drainCheckThreshold
                if check != nil && check() {
                    break
                }
            }
        }
    }

    // In blocking mode, write barriers are not allowed after this
    // point because we must preserve the condition that the work
    // buffers are empty.

done:
    // Flush remaining scan work credit.
    if gcw.scanWork > 0 {
        // Add the scan work to the global counter
        atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
        if flushBgCredit {
            gcFlushBgCredit(gcw.scanWork - initScanWork)
        }
        gcw.scanWork = 0
    }
}
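
How the flag word at the top of gcDrain turns into its four booleans can be shown in isolation. The sketch below reproduces only the bit tests from the listing; the drainFlags constants and their bit positions are hypothetical stand-ins chosen for this example and are not claimed to match the runtime's values.

```go
package main

import "fmt"

// drainFlags is a hypothetical stand-in for the runtime's gcDrainFlags.
type drainFlags int

const (
	drainUntilPreempt drainFlags = 1 << iota
	drainNoBlock
	drainFlushBgCredit
	drainIdle
	drainFractional
)

// drainMode holds the booleans that gcDrain derives from its flags.
type drainMode struct {
	preemptible, blocking, flushBgCredit, idle bool
}

// decode applies the same bit tests as the first lines of gcDrain.
func decode(flags drainFlags) drainMode {
	return drainMode{
		preemptible: flags&drainUntilPreempt != 0,
		// Blocking is the default: it holds only when none of the
		// "may return early" flags is set.
		blocking:      flags&(drainUntilPreempt|drainIdle|drainFractional|drainNoBlock) == 0,
		flushBgCredit: flags&drainFlushBgCredit != 0,
		idle:          flags&drainIdle != 0,
	}
}

func main() {
	// The combination used first by a dedicated worker in gcBgMarkWorker.
	fmt.Printf("%+v\n", decode(drainUntilPreempt|drainFlushBgCredit))
	// With only the credit flag set, the drain is blocking.
	fmt.Printf("%+v\n", decode(drainFlushBgCredit))
}
```

The point of the design is that blocking has no flag of its own: it is inferred from the absence of every flag that permits an early return.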

markroot

This function is used to scan root objects

func markroot(gcw *gcWork, i uint32) {
    // TODO(austin): This is a bit ridiculous. Compute and store
    // the bases in gcMarkRootPrepare instead of the counts.
    baseFlushCache := uint32(fixedRootCount)
    baseData := baseFlushCache + uint32(work.nFlushCacheRoots)
    baseBSS := baseData + uint32(work.nDataRoots)
    baseSpans := baseBSS + uint32(work.nBSSRoots)
    baseStacks := baseSpans + uint32(work.nSpanRoots)
    end := baseStacks + uint32(work.nStackRoots)

    // Note: if you add a case here, please also update heapdump.go:dumproots.
    switch {
    // Flush the spans held in mcache
    case baseFlushCache <= i && i < baseData:
        flushmcache(int(i - baseFlushCache))
    // Scan the data segment (initialized global variables)
    case baseData <= i && i < baseBSS:
        for _, datap := range activeModules() {
            markrootBlock(datap.data, datap.edata-datap.data, datap.gcdatamask.bytedata, gcw, int(i-baseData))
        }
    // Scan the BSS segment (zero-initialized global variables)
    case baseBSS <= i && i < baseSpans:
        for _, datap := range activeModules() {
            markrootBlock(datap.bss, datap.ebss-datap.bss, datap.gcbssmask.bytedata, gcw, int(i-baseBSS))
        }
    // Scan the finalizer queue
    case i == fixedRootFinalizers:
        // Only do this once per GC cycle since we don't call
        // queuefinalizer during marking.
        if work.markrootDone {
            break
        }
        for fb := allfin; fb != nil; fb = fb.alllink {
            cnt := uintptr(atomic.Load(&fb.cnt))
            scanblock(uintptr(unsafe.Pointer(&fb.fin[0])), cnt*unsafe.Sizeof(fb.fin[0]), &finptrmask[0], gcw)
        }
    // Free the stacks of dead Gs
    case i == fixedRootFreeGStacks:
        // Only do this once per GC cycle; preferably
        // concurrently.
        if !work.markrootDone {
            // Switch to the system stack so we can call
            // stackfree.
            systemstack(markrootFreeGStacks)
        }
    // Scan MSpan.specials
    case baseSpans <= i && i < baseStacks:
        // mark MSpan.specials
        markrootSpans(gcw, int(i-baseSpans))

    default:
        // the rest is scanning goroutine stacks
        // Get the g to scan
        var gp *g
        if baseStacks <= i && i < end {
            gp = allgs[i-baseStacks]
        } else {
            throw("markroot: bad index")
        }

        // remember when we've first observed the G blocked
        // needed only to output in traceback
        status := readgstatus(gp) // We are not in a scan state
        if (status == _Gwaiting || status == _Gsyscall) && gp.waitsince == 0 {
            gp.waitsince = work.tstart
        }

        // scang must be done on the system stack in case
        // we're trying to scan our own stack.
        // Hand the scan over to g0
        systemstack(func() {
            // If this is a self-scan, put the user G in
            // _Gwaiting to prevent self-deadlock. It may
            // already be in _Gwaiting if this is a mark
            // worker or we're in mark termination.
            userG := getg().m.curg
            selfScan := gp == userG && readgstatus(userG) == _Grunning
            // When scanning our own stack, change our own g's status
            if selfScan {
                casgstatus(userG, _Grunning, _Gwaiting)
                userG.waitreason = waitReasonGarbageCollectionScan
            }

            // TODO: scang blocks until gp's stack has
            // been scanned, which may take a while for
            // running goroutines. Consider doing this in
            // two phases where the first is non-blocking:
            // we scan the stacks we can and ask running
            // goroutines to scan themselves; and the
            // second blocks.
            // Scan the g's stack
            scang(gp, gcw)

            if selfScan {
                casgstatus(userG, _Gwaiting, _Grunning)
            }
        })
    }
}

markrootBlock

Scans the region [b0, b0+n0) according to ptrmask0

func markrootBlock(b0, n0 uintptr, ptrmask0 *uint8, gcw *gcWork, shard int) {
    if rootBlockBytes%(8*sys.PtrSize) != 0 {
        // This is necessary to pick byte offsets in ptrmask0.
        throw("rootBlockBytes must be a multiple of 8*ptrSize")
    }

    b := b0 + uintptr(shard)*rootBlockBytes
    // If the block to scan starts beyond b0+n0, return immediately
    if b >= b0+n0 {
        return
    }
    ptrmask := (*uint8)(add(unsafe.Pointer(ptrmask0), uintptr(shard)*(rootBlockBytes/(8*sys.PtrSize))))
    n := uintptr(rootBlockBytes)
    if b+n > b0+n0 {
        n = b0 + n0 - b
    }

    // Scan this shard.
    // Scan the shard of the given block
    scanblock(b, n, ptrmask, gcw)
}
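
The shard arithmetic in markrootBlock can be tried on its own. In this sketch, shardRange is a hypothetical helper that returns the sub-range a given shard would scan, and the 256 KiB value of rootBlockBytes is an assumption made for the example; the listing above does not state the constant's value.

```go
package main

import "fmt"

// rootBlockBytes is the shard size. 256 KiB is assumed here for illustration.
const rootBlockBytes = 256 << 10

// shardRange returns the start address and length that shard number
// `shard` covers within [b0, b0+n0), or ok=false if the shard lies
// entirely past the end of the region.
func shardRange(b0, n0 uintptr, shard int) (b, n uintptr, ok bool) {
	b = b0 + uintptr(shard)*rootBlockBytes
	if b >= b0+n0 {
		return 0, 0, false
	}
	n = rootBlockBytes
	if b+n > b0+n0 {
		n = b0 + n0 - b // the last shard is truncated at the end of the region
	}
	return b, n, true
}

func main() {
	const base, size = 0x100000, 600 << 10 // a 600 KiB region at a made-up address
	for shard := 0; shard < 4; shard++ {
		if b, n, ok := shardRange(base, size, shard); ok {
			fmt.Printf("shard %d: offset %d, %d bytes\n", shard, b-base, n)
		} else {
			fmt.Printf("shard %d: out of range\n", shard)
		}
	}
}
```

With these numbers the region splits into two full shards and one 90112-byte tail, and shard 3 is out of range. Splitting roots this way lets several workers scan one large data or BSS segment in parallel.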

scanblock

func scanblock(b0, n0 uintptr, ptrmask *uint8, gcw *gcWork) {
    // Use local copies of original parameters, so that a stack trace
    // due to one of the throws below shows the original block
    // base and extent.
    b := b0
    n := n0

    for i := uintptr(0); i < n; {
        // Find bits for the next word.
        // Find the corresponding bits in the bitmap
        bits := uint32(*addb(ptrmask, i/(sys.PtrSize*8)))
        if bits == 0 {
            i += sys.PtrSize * 8
            continue
        }
        for j := 0; j < 8 && i < n; j++ {
            if bits&1 != 0 {
                // This address holds a pointer
                // Same work as in scanobject; see comments there.
                obj := *(*uintptr)(unsafe.Pointer(b + i))
                if obj != 0 {
                    // If an object is found at that address, grey it
                    if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
                        greyobject(obj, b, i, span, gcw, objIndex)
                    }
                }
            }
            bits >>= 1
            i += sys.PtrSize
        }
    }
}
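
The bitmap walk in scanblock is easier to follow without the unsafe pointer arithmetic. The sketch below keeps the same loop structure but only collects the byte offsets of the words flagged as pointers. pointerOffsets is a hypothetical helper, ptrSize is fixed at 8 on the assumption of a 64-bit platform, and the mask is passed as a slice instead of a raw *uint8.

```go
package main

import "fmt"

// ptrSize is the size of a pointer-sized word; 8 assumes a 64-bit platform.
const ptrSize = 8

// pointerOffsets walks a pointer bitmap (one bit per word, least
// significant bit first) over a block of n bytes and returns the byte
// offsets of the words flagged as pointers.
func pointerOffsets(ptrmask []uint8, n uintptr) []uintptr {
	var offs []uintptr
	for i := uintptr(0); i < n; {
		// One mask byte describes 8 consecutive words.
		bits := ptrmask[i/(ptrSize*8)]
		if bits == 0 {
			i += ptrSize * 8 // no pointers in these 8 words: skip them all
			continue
		}
		for j := 0; j < 8 && i < n; j++ {
			if bits&1 != 0 {
				offs = append(offs, i)
			}
			bits >>= 1
			i += ptrSize
		}
	}
	return offs
}

func main() {
	// Byte 0 flags words 0 and 2, byte 1 flags nothing, byte 2 flags word 23.
	mask := []uint8{0b00000101, 0, 0b10000000}
	fmt.Println(pointerOffsets(mask, 24*ptrSize)) // [0 16 184]
}
```

In the runtime, each flagged word is then loaded and, if it points at a heap object, passed to greyobject.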

greyobject

Greying an object means finding its bitmap entry, marking it live, and pushing it onto the queue

func greyobject(obj, base, off uintptr, span *mspan, gcw *gcWork, objIndex uintptr) {
    // obj should be start of allocation, and so must be at least pointer-aligned.
    if obj&(sys.PtrSize-1) != 0 {
        throw("greyobject: obj not pointer-aligned")
    }
    mbits := span.markBitsForIndex(objIndex)

    if useCheckmark {
        // Debug-only: verify that every object has been marked correctly
        if !mbits.isMarked() {
            // This object has not been marked
            printlock()
            print("runtime:greyobject: checkmarks finds unexpected unmarked object obj=", hex(obj), "\n")
            print("runtime: found obj at *(", hex(base), "+", hex(off), ")\n")

            // Dump the source (base) object
            gcDumpObject("base", base, off)

            // Dump the object
            gcDumpObject("obj", obj, ^uintptr(0))

            getg().m.traceback = 2
            throw("checkmark found unmarked object")
        }
        hbits := heapBitsForAddr(obj)
        if hbits.isCheckmarked(span.elemsize) {
            return
        }
        hbits.setCheckmarked(span.elemsize)
        if !hbits.isCheckmarked(span.elemsize) {
            throw("setCheckmarked and isCheckmarked disagree")
        }
    } else {
        if debug.gccheckmark > 0 && span.isFree(objIndex) {
            print("runtime: marking free object ", hex(obj), " found at *(", hex(base), "+", hex(off), ")\n")
            gcDumpObject("base", base, off)
            gcDumpObject("obj", obj, ^uintptr(0))
            getg().m.traceback = 2
            throw("marking free object")
        }

        // If marked we have nothing to do.
        // The object is already marked; nothing more to do
        if mbits.isMarked() {
            return
        }
        // mbits.setMarked() // Avoid extra call overhead with manual inlining.
        // Mark the object
        atomic.Or8(mbits.bytep, mbits.mask)
        // If this is a noscan object, fast-track it to black
        // instead of greying it.
        // If the object contains no pointers, marking is enough; it is not queued, which effectively blackens it directly
        if span.spanclass.noscan() {
            gcw.bytesMarked += uint64(span.elemsize)
            return
        }
    }

    // Queue the obj for scanning. The PREFETCH(obj) logic has been removed but
    // seems like a nice optimization that can be added back in.
    // There needs to be time between the PREFETCH and the use.
    // Previously we put the obj in an 8 element buffer that is drained at a rate
    // to give the PREFETCH time to do its work.
    // Use of PREFETCHNTA might be more appropriate than PREFETCH
    // Try the fast path into the queue; if that fails, fall back to put. This completes the greying step
    if !gcw.putFast(obj) {
        gcw.put(obj)
    }
}

gcWork.putFast

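Each gcWork has two queues, wbuf1 and wbuf2, for holding grey objects. Grey objects go into wbuf1 first. When wbuf1 fills up, wbuf1 and wbuf2 are swapped, so wbuf2 takes over as wbuf1 and keeps receiving grey objects. When both queues are full, a new one is requested from the global list.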

putFast only tries to put the object into wbuf1

func (w *gcWork) putFast(obj uintptr) bool {
    wbuf := w.wbuf1
    if wbuf == nil {
        // No buffer has been allocated yet; return false
        return false
    } else if wbuf.nobj == len(wbuf.obj) {
        // wbuf1 is full; return false
        return false
    }

    // Append the object to wbuf1, which is not full
    wbuf.obj[wbuf.nobj] = obj
    wbuf.nobj++
    return true
}

gcWork.put

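put not only tries to put the object into wbuf1; when wbuf1 is full it also tries swapping wbuf1 and wbuf2. If both are full, it hands the full queue over to the global list and requests an empty one.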

func (w *gcWork) put(obj uintptr) {
    flushed := false
    wbuf := w.wbuf1
    if wbuf == nil {
        // wbuf1 does not exist yet: initialize both wbuf1 and wbuf2
        w.init()
        wbuf = w.wbuf1
        // wbuf is empty at this point.
    } else if wbuf.nobj == len(wbuf.obj) {
        // wbuf1 is full: swap wbuf1 and wbuf2
        w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
        wbuf = w.wbuf1
        if wbuf.nobj == len(wbuf.obj) {
            // wbuf1 is still full after the swap, so both queues are full
            // Hand wbuf1 to the global list and get an empty queue
            putfull(wbuf)
            wbuf = getempty()
            w.wbuf1 = wbuf
            // Record that a queue was flushed
            flushed = true
        }
    }

    wbuf.obj[wbuf.nobj] = obj
    wbuf.nobj++

    // If we put a buffer on full, let the GC controller know so
    // it can encourage more workers to run. We delay this until
    // the end of put so that w is in a consistent state, since
    // enlistWorker may itself manipulate w.
    // The global list now has a full queue, so the GC controller may schedule more workers
    if flushed && gcphase == _GCmark {
        gcController.enlistWorker()
    }
}
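
The swapping behaviour of put is easiest to see with a tiny buffer. The following sketch is a simplified model, not the runtime's types: workbuf and gcWork are re-declared locally, the capacity is shrunk to 2, and a plain slice named full stands in for the global list of full buffers.

```go
package main

import "fmt"

// bufCap is deliberately tiny so that the swap and flush paths trigger quickly.
const bufCap = 2

// workbuf is a hypothetical, simplified work buffer.
type workbuf struct {
	obj  [bufCap]uintptr
	nobj int
}

// gcWork models the two local buffers plus a stand-in for the global full list.
type gcWork struct {
	wbuf1, wbuf2 *workbuf
	full         []*workbuf
}

// put follows the same three cases as the listing: lazy init, swap when
// wbuf1 is full, and flush to the global list when both are full.
// It reports whether a buffer was flushed.
func (w *gcWork) put(obj uintptr) (flushed bool) {
	if w.wbuf1 == nil {
		w.wbuf1, w.wbuf2 = &workbuf{}, &workbuf{}
	} else if w.wbuf1.nobj == bufCap {
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		if w.wbuf1.nobj == bufCap {
			w.full = append(w.full, w.wbuf1) // like putfull
			w.wbuf1 = &workbuf{}             // like getempty
			flushed = true
		}
	}
	w.wbuf1.obj[w.wbuf1.nobj] = obj
	w.wbuf1.nobj++
	return flushed
}

func main() {
	var w gcWork
	for obj := uintptr(1); obj <= 5; obj++ {
		fmt.Printf("put(%d) flushed=%v\n", obj, w.put(obj))
	}
	fmt.Println(len(w.full), w.wbuf1.nobj, w.wbuf2.nobj) // 1 1 2
}
```

Only the fifth put flushes: objects 1 and 2 fill the first buffer, 3 and 4 fill the second after a swap, and 5 finds both full. Keeping two local buffers means a worker touches the shared list only once per full buffer, not once per object.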

Next we continue with the functions called inside gcDrain, tracing how the objects we greyed get blackened

gcw.balance()

Continuing with the gcw.balance() call in gcDrain: what does balancing work mean?

func (w *gcWork) balance() {
    if w.wbuf1 == nil {
        // wbuf1 and wbuf2 have not been initialized yet
        return
    }
    // If wbuf2 is not empty, hand it to the global list and get an empty queue for wbuf2
    if wbuf := w.wbuf2; wbuf.nobj != 0 {
        putfull(wbuf)
        w.wbuf2 = getempty()
    } else if wbuf := w.wbuf1; wbuf.nobj > 4 {
        // Split the partly filled wbuf1 in half and hand one half to the global list
        w.wbuf1 = handoff(wbuf)
    } else {
        return
    }
    // We flushed a buffer to the full list, so wake a worker.
    // The global list now has a full queue, so other workers can start
    if gcphase == _GCmark {
        gcController.enlistWorker()
    }
}

gcw.get()

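Continuing with the gcw.get() call in gcDrain: it first takes an object from the local queues. If wbuf1 is empty it tries wbuf2; if both are empty it tries to take a full queue from the global list and then takes an object from that.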

func (w *gcWork) get() uintptr {
    wbuf := w.wbuf1
    if wbuf == nil {
        w.init()
        wbuf = w.wbuf1
        // wbuf is empty at this point.
    }
    if wbuf.nobj == 0 {
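        // wbuf1 is empty: swap wbuf1 and wbuf2
        w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
        wbuf = w.wbuf1
        // The former wbuf2 is empty too: try to get a full queue from the global list
        if wbuf.nobj == 0 {
            owbuf := wbuf
            wbuf = getfull()
            // Nothing available; return
            if wbuf == nil {
                return 0
            }
            // Return the empty queue to the global empty list and use the full queue as wbuf1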
            putempty(owbuf)
            w.wbuf1 = wbuf
        }
    }

    // TODO: This might be a good place to add prefetch code

    wbuf.nobj--
    return wbuf.obj[wbuf.nobj]
}

gcw.tryGet() and gcw.tryGetFast() follow much the same logic and are relatively simple, so we will not analyze them further

scanobject

Continuing with the scanobject(b, gcw) call in gcDrain: b has been obtained by this point, and we start consuming the queue

func scanobject(b uintptr, gcw *gcWork) {
    // Find the bits for b and the size of the object at b.
    //
    // b is either the beginning of an object, in which case this
    // is the size of the object to scan, or it points to an
    // oblet, in which case we compute the size to scan below.
    // Get the heap bits for b
    hbits := heapBitsForAddr(b)
    // Get the span containing b
    s := spanOfUnchecked(b)
    n := s.elemsize
    if n == 0 {
        throw("scanobject n == 0")
    }
    // If the object is too large, split it before scanning; maxObletBytes is 128 KB
    if n > maxObletBytes {
        // Large object. Break into oblets for better
        // parallelism and lower latency.
        if b == s.base() {
            // It's possible this is a noscan object (not
            // from greyobject, but from other code
            // paths), in which case we must *not* enqueue
            // oblets since their bitmaps will be
            // uninitialized.
            // A noscan span holds no pointers: count the bytes and return, which amounts to blackening the object
            if s.spanclass.noscan() {
                // Bypass the whole scan.
                gcw.bytesMarked += uint64(n)
                return
            }

            // Enqueue the other oblets to scan later.
            // Some oblets may be in b's scalar tail, but
            // these will be marked as "no more pointers",
            // so we'll drop out immediately when we go to
            // scan those.
            // Split the object into maxObletBytes-sized pieces and put them on the queue
            for oblet := b + maxObletBytes; oblet < s.base()+s.elemsize; oblet += maxObletBytes {
                if !gcw.putFast(oblet) {
                    gcw.put(oblet)
                }
            }
        }

        // Compute the size of the oblet. Since this object
        // must be a large object, s.base() is the beginning
        // of the object.
        n = s.base() + s.elemsize - b
        if n > maxObletBytes {
            n = maxObletBytes
        }
    }

    var i uintptr
    for i = 0; i < n; i += sys.PtrSize {
        // Find bits for this word.
        // Get the bits for this word
        if i != 0 {
            // Avoid needless hbits.next() on last iteration.
            hbits = hbits.next()
        }
        // Load bits once. See CL 22712 and issue 16973 for discussion.
        bits := hbits.bits()
        // During checkmarking, 1-word objects store the checkmark
        // in the type bit for the one word. The only one-word objects
        // are pointers, or else they'd be merged with other non-pointer
        // data into larger allocations.
        if i != 1*sys.PtrSize && bits&bitScan == 0 {
            break // no more pointers in this object
        }
        // Not a pointer: move on to the next word
        if bits&bitPointer == 0 {
            continue // not a pointer
        }

        // Work here is duplicated in scanblock and above.
        // If you make changes here, make changes there too.
        obj := *(*uintptr)(unsafe.Pointer(b + i))

        // At this point we have extracted the next potential pointer.
        // Quickly filter out nil and pointers back to the current object.
        if obj != 0 && obj-b >= n {
            // Test if obj points into the Go heap and, if so,
            // mark the object.
            //
            // Note that it's possible for findObject to
            // fail if obj points to a just-allocated heap
            // object because of a race with growing the
            // heap. In this case, we know the object was
            // just allocated and hence will be marked by
            // allocation itself.
            // Find the object the pointer refers to and grey it
            if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
                greyobject(obj, b, i, span, gcw, objIndex)
            }
        }
    }
    gcw.bytesMarked += uint64(n)
    gcw.scanWork += int64(i)
}
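
The oblet arithmetic in scanobject is easy to check by hand. The sketch below reproduces only the two size computations shown above (the loop that queues later oblets and the clamping of n); obletSize, obletStarts and the 300 KB example object are hypothetical, and maxOblet is assumed to be 128 KB as stated in the comment above.

```go
package main

import "fmt"

// maxOblet plays the role of maxObletBytes (128 KB, as noted above).
const maxOblet = 128 << 10

// obletSize returns how many bytes a scan starting at b should cover
// for an object occupying [base, base+size).
func obletSize(base, size, b uintptr) uintptr {
	n := base + size - b
	if n > maxOblet {
		n = maxOblet
	}
	return n
}

// obletStarts lists the start addresses that are queued for later
// scanning when the scan begins at the object's base.
func obletStarts(base, size uintptr) []uintptr {
	var starts []uintptr
	for o := base + maxOblet; o < base+size; o += maxOblet {
		starts = append(starts, o)
	}
	return starts
}

func main() {
	const base, size = 0x100000, 300 << 10 // hypothetical 300 KB object
	fmt.Println(obletSize(base, size, base) >> 10) // 128
	for _, o := range obletStarts(base, size) {
		fmt.Println(obletSize(base, size, o) >> 10) // 128, then 44
	}
}
```

The three pieces add up to the full 300 KB, and each can be scanned by a different worker.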

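To sum up: greying an object means marking it and putting it on the queue, while blackening is just the mark itself. So once a grey object has been taken off the queue, we can treat it as a black object.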
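
That observation can be turned into a tiny model. The following is a minimal sketch with hypothetical types (object, grey, mark), not the runtime's data structures: a single mark bit plus a work queue is enough to represent all three colours.

```go
package main

import "fmt"

// object is a hypothetical heap object: a mark bit plus outgoing references.
type object struct {
	name   string
	marked bool
	refs   []*object
}

// grey marks obj and queues it, which is the role greyobject plays above.
// Already-marked objects are skipped, so cycles terminate.
func grey(obj *object, queue *[]*object) {
	if obj == nil || obj.marked {
		return
	}
	obj.marked = true
	*queue = append(*queue, obj)
}

// mark drains the queue. An object that has been taken off the queue
// and scanned is black: marked, and no longer queued.
func mark(roots []*object) {
	var queue []*object
	for _, r := range roots {
		grey(r, &queue)
	}
	for len(queue) > 0 {
		obj := queue[len(queue)-1]
		queue = queue[:len(queue)-1]
		for _, ref := range obj.refs {
			grey(ref, &queue)
		}
	}
}

func main() {
	c := &object{name: "C"}
	b := &object{name: "B", refs: []*object{c}}
	a := &object{name: "A", refs: []*object{b}}
	d := &object{name: "D"} // unreachable from the root
	mark([]*object{a})
	for _, o := range []*object{a, b, c, d} {
		fmt.Println(o.name, o.marked) // A true, B true, C true, D false
	}
}
```

Objects whose mark bit is still clear at the end (D here) are the white set.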

That completes the marking work done by gcDrain. We now go back to gcBgMarkWorker and continue from there.

gcMarkDone

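gcMarkDone moves the collector from the mark 1 phase to mark 2, and from mark 2 to mark termination.

mark 1: covers marking of all roots, using both the global work queue and the local work caches

mark 2: the local work caches are disabled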

func gcMarkDone() {
top:
    semacquire(&work.markDoneSema)

    // Re-check transition condition under transition lock.
    if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
        semrelease(&work.markDoneSema)
        return
    }

    // Disallow starting new workers so that any remaining workers
    // in the current mark phase will drain out.
    //
    // TODO(austin): Should dedicated workers keep an eye on this
    // and exit gcDrain promptly?
    // Stop new mark workers from starting
    atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, -0xffffffff)
    prevFractionalGoal := gcController.fractionalUtilizationGoal
    gcController.fractionalUtilizationGoal = 0

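    // gcBlackenPromptly means every local work cache must be flushed to the global queue at once and local caching is disabled; if it is not set yet, we are still in mark 1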
    if !gcBlackenPromptly {
        // Transition from mark 1 to mark 2.
        //
        // The global work list is empty, but there can still be work
        // sitting in the per-P work caches.
        // Flush and disable work caches.

        // Disallow caching workbufs and indicate that we're in mark 2.
        // Disable the local work caches and enter mark 2
        gcBlackenPromptly = true

        // Prevent completion of mark 2 until we've flushed
        // cached workbufs.
        atomic.Xadd(&work.nwait, -1)

        // GC is set up for mark 2. Let Gs blocked on the
        // transition lock go while we flush caches.
        semrelease(&work.markDoneSema)
        // Switch to g0 to flush the local caches to the global queue
        systemstack(func() {
            // Flush all currently cached workbufs and
            // ensure all Ps see gcBlackenPromptly. This
            // also blocks until any remaining mark 1
            // workers have exited their loop so we can
            // start new mark 2 workers.
            forEachP(func(_p_ *p) {
                wbBufFlush1(_p_)
                _p_.gcw.dispose()
            })
        })

        // Check that roots are marked. We should be able to
        // do this before the forEachP, but based on issue
        // #16083 there may be a (harmless) race where we can
        // enter mark 2 while some workers are still scanning
        // stacks. The forEachP ensures these scans are done.
        //
        // TODO(austin): Figure out the race and fix this
        // properly.
        // Check that every root has been marked
        gcMarkRootCheck()

        // Now we can start up mark 2 workers.
        atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 0xffffffff)
        gcController.fractionalUtilizationGoal = prevFractionalGoal

        incnwait := atomic.Xadd(&work.nwait, +1)
        // If there is no more work, run through again to move from mark 2 to mark termination
        if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
            // This loop will make progress because
            // gcBlackenPromptly is now true, so it won't
            // take this same "if" branch.
            goto top
        }
    } else {
        // Transition to mark termination.
        now := nanotime()
        work.tMarkTerm = now
        work.pauseStart = now
        getg().m.preemptoff = "gcing"
        if trace.enabled {
            traceGCSTWStart(0)
        }
        systemstack(stopTheWorldWithSema)
        // The gcphase is _GCmark, it will transition to _GCmarktermination
        // below. The important thing is that the wb remains active until
        // all marking is complete. This includes writes made by the GC.

        // Record that one root marking pass has completed.
        work.markrootDone = true

        // Disable assists and background workers. We must do
        // this before waking blocked assists.
        atomic.Store(&gcBlackenEnabled, 0)

        // Wake all blocked assists. These will run when we
        // start the world again.
        // Wake every blocked GC assist
        gcWakeAllAssists()

        // Likewise, release the transition lock. Blocked
        // workers and assists will run when we start the
        // world again.
        semrelease(&work.markDoneSema)

        // endCycle depends on all gcWork cache stats being
        // flushed. This is ensured by mark 2.
        // Compute the trigger threshold for the next GC cycle
        nextTriggerRatio := gcController.endCycle()

        // Perform mark termination. This will restart the world.
        // Enter the termination phase, which starts the world again when it is done
        gcMarkTermination(nextTriggerRatio)
    }
}
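
The control flow of gcMarkDone is easier to see once the locking and accounting are stripped away. The following is a rough single-threaded sketch under strong simplifying assumptions: toyGC and its fields are hypothetical, work items are plain integers, and the "goto top" is modelled as a loop.

```go
package main

import "fmt"

// toyGC is a hypothetical, single-threaded model of the state that
// gcMarkDone inspects.
type toyGC struct {
	blackenPromptly bool    // false: mark 1, true: mark 2
	perP            [][]int // per-P work caches
	global          []int   // global work queue
	terminated      bool    // set on entering mark termination
}

// markDone mimics gcMarkDone. It is meant to be called when the global
// queue looks empty and no worker is running.
func (gc *toyGC) markDone() {
	for {
		if len(gc.global) != 0 {
			return // work is available again; workers must drain it first
		}
		if gc.blackenPromptly {
			gc.terminated = true // mark 2 -> mark termination
			return
		}
		// mark 1 -> mark 2: disable local caching and flush every P's cache.
		gc.blackenPromptly = true
		for i, cache := range gc.perP {
			gc.global = append(gc.global, cache...)
			gc.perP[i] = nil
		}
		// Looping here corresponds to "goto top".
	}
}

func main() {
	gc := &toyGC{perP: [][]int{{1, 2}, {3}}}
	gc.markDone()
	fmt.Println(gc.blackenPromptly, gc.terminated, len(gc.global)) // true false 3

	gc.global = nil // pretend the mark 2 workers drained the queue
	gc.markDone()
	fmt.Println(gc.blackenPromptly, gc.terminated) // true true
}
```

If the caches turn out to be empty on the first call, the loop falls straight through both transitions, which matches the case where the real function jumps back to top.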

gcMarkTermination

This function finishes marking and then carries out sweeping and the remaining cleanup work.

func gcMarkTermination(nextTriggerRatio float64) {
    // World is stopped.
    // Start marktermination which includes enabling the write barrier.
    atomic.Store(&gcBlackenEnabled, 0)
    gcBlackenPromptly = false
    // Set the GC phase flag
    setGCPhase(_GCmarktermination)

    work.heap1 = memstats.heap_live
    startTime := nanotime()

    mp := acquirem()
    mp.preemptoff = "gcing"
    _g_ := getg()
    _g_.m.traceback = 2
    gp := _g_.m.curg
    // Put the current g into the waiting state
    casgstatus(gp, _Grunning, _Gwaiting)
    gp.waitreason = waitReasonGarbageCollection

    // Run gc on the g0 stack. We do this so that the g stack
    // we're currently running on will no longer change. Cuts
    // the root set down a bit (g0 stacks are not scanned, and
    // we don't need to scan gc's internal state).  We also
    // need to switch to g0 so we can shrink the stack.
    systemstack(func() {
        // Running on g0, scan the stack of the current g
        gcMark(startTime)
        // Must return immediately.
        // The outer function's stack may have moved
        // during gcMark (it shrinks stacks, including the
        // outer function's stack), so we must not refer
        // to any of its variables. Return back to the
        // non-system stack to pick up the new addresses
        // before continuing.
    })

    systemstack(func() {
        work.heap2 = work.bytesMarked
        if debug.gccheckmark > 0 {
            // Run a full stop-the-world mark using checkmark bits,
            // to check that we didn't forget to mark anything during
            // the concurrent mark process.
            // If gccheckmark is enabled, verify that every reachable object is marked
            gcResetMarkState()
            initCheckmarks()
            gcMark(startTime)
            clearCheckmarks()
        }

        // marking is complete so we can turn the write barrier off
        // Set the GC phase flag; switching to _GCoff turns the write barrier off
        setGCPhase(_GCoff)
        // Start sweeping
        gcSweep(work.mode)

        if debug.gctrace > 1 {
            startTime = nanotime()
            // The g stacks have been scanned so
            // they have gcscanvalid==true and gcworkdone==true.
            // Reset these so that all stacks will be rescanned.
            gcResetMarkState()
            finishsweep_m()

            // Still in STW but gcphase is _GCoff, reset to _GCmarktermination
            // At this point all objects will be found during the gcMark which
            // does a complete STW mark and object scan.
            setGCPhase(_GCmarktermination)
            gcMark(startTime)
            setGCPhase(_GCoff) // marking is done, turn off wb.
            gcSweep(work.mode)
        }
    })

    _g_.m.traceback = 0
    casgstatus(gp, _Gwaiting, _Grunning)

    if trace.enabled {
        traceGCDone()
    }

    // all done
    mp.preemptoff = ""

    if gcphase != _GCoff {
        throw("gc done but gcphase != _GCoff")
    }

    // Update GC trigger and pacing for the next cycle.
    // Update the heap growth ratio that triggers the next GC
    gcSetTriggerRatio(nextTriggerRatio)

    // Update timing memstats
    // Update the timing statistics
    now := nanotime()
    sec, nsec, _ := time_now()
    unixNow := sec*1e9 + int64(nsec)
    work.pauseNS += now - work.pauseStart
    work.tEnd = now
    atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user
    atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us
    memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS)
    memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow)
    memstats.pause_total_ns += uint64(work.pauseNS)

    // Update work.totaltime.
    sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm)
    // We report idle marking time below, but omit it from the
    // overall utilization here since it's "free".
    markCpu := gcController.assistTime + gcController.dedicatedMarkTime + gcController.fractionalMarkTime
    markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm)
    cycleCpu := sweepTermCpu + markCpu + markTermCpu
    work.totaltime += cycleCpu

    // Compute overall GC CPU utilization.
    totalCpu := sched.totaltime + (now-sched.procresizetime)*int64(gomaxprocs)
    memstats.gc_cpu_fraction = float64(work.totaltime) / float64(totalCpu)

    // Reset sweep state.
    // Reset the sweep state
    sweep.nbgsweep = 0
    sweep.npausesweep = 0

    // If this GC was forced by the user, bump the forced-GC counter
    if work.userForced {
        memstats.numforcedgc++
    }

    // Bump GC cycle count and wake goroutines waiting on sweep.
    // Count the completed GC cycle, then wake the Gs waiting on sweep
    lock(&work.sweepWaiters.lock)
    memstats.numgc++
    injectglist(work.sweepWaiters.head.ptr())
    work.sweepWaiters.head = 0
    unlock(&work.sweepWaiters.lock)

    // Finish the current heap profiling cycle and start a new
    // heap profiling cycle. We do this before starting the world
    // so events don't leak into the wrong cycle.
    mProf_NextCycle()
    // start the world
    systemstack(func() { startTheWorldWithSema(true) })

    // Flush the heap profile so we can start a new cycle next GC.
    // This is relatively expensive, so we don't do it with the
    // world stopped.
    mProf_Flush()

    // Prepare workbufs for freeing by the sweeper. We do this
    // asynchronously because it can take non-trivial time.
    prepareFreeWorkbufs()

    // Free stack spans. This must be done between GC cycles.
    systemstack(freeStackSpans)

    // Print gctrace before dropping worldsema. As soon as we drop
    // worldsema another cycle could start and smash the stats
    // we're trying to print.
    if debug.gctrace > 0 {
        util := int(memstats.gc_cpu_fraction * 100)

        var sbuf [24]byte
        printlock()
        print("gc ", memstats.numgc,
            " @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ",
            util, "%: ")
        prev := work.tSweepTerm
        for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
            if i != 0 {
                print("+")
            }
            print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
            prev = ns
        }
        print(" ms clock, ")
        for i, ns := range []int64{sweepTermCpu, gcController.assistTime, gcController.dedicatedMarkTime + gcController.fractionalMarkTime, gcController.idleMarkTime, markTermCpu} {
            if i == 2 || i == 3 {
                // Separate mark time components with /.
                print("/")
            } else if i != 0 {
                print("+")
            }
            print(string(fmtNSAsMS(sbuf[:], uint64(ns))))
        }
        print(" ms cpu, ",
            work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
            work.heapGoal>>20, " MB goal, ",
            work.maxprocs, " P")
        if work.userForced {
            print(" (forced)")
        }
        print("\n")
        printunlock()
    }

    semrelease(&worldsema)
    // Careful: another GC cycle may start now.

    releasem(mp)
    mp = nil

    // now that gc is done, kick off finalizer thread if needed
    // If sweeping is not concurrent, yield so the current M can schedule other work
    if !concurrentSweep {
        // give the queued finalizers, if any, a chance to run
        Gosched()
    }
}

gcSweep

The sweep task

func gcSweep(mode gcMode) {
    if gcphase != _GCoff {
        throw("gcSweep being done but phase is not GCoff")
    }

    lock(&mheap_.lock)
    // sweepgen grows by 2 after every GC, and the two sweepSpans lists swap roles each cycle
    mheap_.sweepgen += 2
    mheap_.sweepdone = 0
    if mheap_.sweepSpans[mheap_.sweepgen/2%2].index != 0 {
        // We should have drained this list during the last
        // sweep phase. We certainly need to start this phase
        // with an empty swept list.
        throw("non-empty swept list")
    }
    mheap_.pagesSwept = 0
    unlock(&mheap_.lock)
    // If sweeping is not concurrent, or this is a forced blocking GC
    if !_ConcurrentSweep || mode == gcForceBlockMode {
        // Special case synchronous sweep.
        // Record that no proportional sweeping has to happen.
        lock(&mheap_.lock)
        mheap_.sweepPagesPerByte = 0
        unlock(&mheap_.lock)
        // Sweep all spans eagerly.
        // Sweep every span
        for sweepone() != ^uintptr(0) {
            sweep.npausesweep++
        }
        // Free workbufs eagerly.
        // Free all workbufs
        prepareFreeWorkbufs()
        for freeSomeWbufs(false) {
        }
        // All "free" events for this mark/sweep cycle have
        // now happened, so we can make this profile cycle
        // available immediately.
        mProf_NextCycle()
        mProf_Flush()
        return
    }

    // Background sweep.
    lock(&sweep.lock)
    // Wake the background sweeper, i.e. the bgsweep function; its flow is much like the synchronous sweep above
    if sweep.parked {
        sweep.parked = false
        ready(sweep.g, 0, true)
    }
    unlock(&sweep.lock)
}
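
The role swap of the two sweepSpans lists follows directly from the index expression sweepgen/2%2. A minimal sketch, assuming a starting sweepgen of 0 and using the hypothetical helpers sweptIndex and unsweptIndex:

```go
package main

import "fmt"

// sweptIndex reproduces the index of the "swept" list in mheap_.sweepSpans.
func sweptIndex(sweepgen uint32) uint32 { return sweepgen / 2 % 2 }

// unsweptIndex reproduces the index of the list that sweepone pops from.
func unsweptIndex(sweepgen uint32) uint32 { return 1 - sweepgen/2%2 }

func main() {
	sweepgen := uint32(0)
	for cycle := 1; cycle <= 4; cycle++ {
		sweepgen += 2 // what gcSweep does at the start of each sweep phase
		fmt.Printf("cycle %d: sweepgen=%d swept=sweepSpans[%d] unswept=sweepSpans[%d]\n",
			cycle, sweepgen, sweptIndex(sweepgen), unsweptIndex(sweepgen))
	}
	// cycle 1: sweepgen=2 swept=sweepSpans[1] unswept=sweepSpans[0]
	// cycle 2: sweepgen=4 swept=sweepSpans[0] unswept=sweepSpans[1]
	// cycle 3: sweepgen=6 swept=sweepSpans[1] unswept=sweepSpans[0]
	// cycle 4: sweepgen=8 swept=sweepSpans[0] unswept=sweepSpans[1]
}
```

The list that collected swept in-use spans during one cycle becomes the unswept list of the next cycle without any copying.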

sweepone

Next we look at how sweepone carries out the sweep.

func sweepone() uintptr {
    _g_ := getg()
    sweepRatio := mheap_.sweepPagesPerByte // For debugging

    // increment locks to ensure that the goroutine is not preempted
    // in the middle of sweep thus leaving the span in an inconsistent state for next GC
    _g_.m.locks++
    // Check whether sweeping has already finished
    if atomic.Load(&mheap_.sweepdone) != 0 {
        _g_.m.locks--
        return ^uintptr(0)
    }
    // Increment the number of active sweep workers
    atomic.Xadd(&mheap_.sweepers, +1)

    npages := ^uintptr(0)
    sg := mheap_.sweepgen
    for {
        // Loop until we get a span that needs sweeping
        s := mheap_.sweepSpans[1-sg/2%2].pop()
        if s == nil {
            atomic.Store(&mheap_.sweepdone, 1)
            break
        }
        if s.state != mSpanInUse {
            // This can happen if direct sweeping already
            // swept this span, but in that case the sweep
            // generation should always be up-to-date.
            if s.sweepgen != sg {
                print("runtime: bad span s.state=", s.state, " s.sweepgen=", s.sweepgen, " sweepgen=", sg, "\n")
                throw("non in-use span in unswept list")
            }
            continue
        }
        // sweepgen == h->sweepgen - 2 means the span needs sweeping
        // sweepgen == h->sweepgen - 1 means the span is currently being swept
        // Check the span's state here and try to claim it by advancing that state
        if s.sweepgen != sg-2 || !atomic.Cas(&s.sweepgen, sg-2, sg-1) {
            continue
        }
        npages = s.npages
        // Sweep this single span
        if !s.sweep(false) {
            // Span is still in-use, so this returned no
            // pages to the heap and the span needs to
            // move to the swept in-use list.
            npages = 0
        }
        break
    }

    // Decrement the number of active sweepers and if this is the
    // last one print trace information.
    // This worker is done sweeping: update the sweeper count
    if atomic.Xadd(&mheap_.sweepers, -1) == 0 && atomic.Load(&mheap_.sweepdone) != 0 {
        if debug.gcpacertrace > 0 {
            print("pacer: sweep done at heap size ", memstats.heap_live>>20, "MB; allocated ", (memstats.heap_live-mheap_.sweepHeapLiveBasis)>>20, "MB during sweep; swept ", mheap_.pagesSwept, " pages at ", sweepRatio, " pages/byte\n")
        }
    }
    _g_.m.locks--
    return npages
}
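
The compare-and-swap on s.sweepgen is what ensures a span is swept by exactly one party. The following is a minimal sketch of that claim step using the public sync/atomic package rather than the runtime's internal atomics; toySpan, trySweep and sweepAll are hypothetical, and the real sweepone also checks the value with a plain load before attempting the CAS.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// toySpan is a hypothetical span that carries only its sweep generation
// and a counter of how many times it was actually swept.
type toySpan struct {
	sweepgen uint32
	swept    int32
}

// trySweep claims s for heap generation sg and "sweeps" it.
// It reports whether this caller won the claim.
func trySweep(s *toySpan, sg uint32) bool {
	// sg-2: needs sweeping, sg-1: being swept, sg: swept.
	if !atomic.CompareAndSwapUint32(&s.sweepgen, sg-2, sg-1) {
		return false
	}
	atomic.AddInt32(&s.swept, 1) // stand-in for the real sweep work
	atomic.StoreUint32(&s.sweepgen, sg)
	return true
}

// sweepAll lets several goroutines race over the same spans and
// returns the total number of successful claims.
func sweepAll(spans []*toySpan, sg uint32, workers int) int {
	var wins int32
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, s := range spans {
				if trySweep(s, sg) {
					atomic.AddInt32(&wins, 1)
				}
			}
		}()
	}
	wg.Wait()
	return int(wins)
}

func main() {
	const sg = 4
	spans := make([]*toySpan, 100)
	for i := range spans {
		spans[i] = &toySpan{sweepgen: sg - 2}
	}
	fmt.Println(sweepAll(spans, sg, 8)) // 100: every span is claimed exactly once
}
```

Eight goroutines visit all 100 spans, yet the total number of successful claims is 100, because only one CAS from sg-2 to sg-1 can succeed per span.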

mspan.sweep

func (s *mspan) sweep(preserve bool) bool {
    // It's critical that we enter this function with preemption disabled,
    // GC must not start while we are in the middle of this function.
    _g_ := getg()
    if _g_.m.locks == 0 && _g_.m.mallocing == 0 && _g_ != _g_.m.g0 {
        throw("MSpan_Sweep: m is not locked")
    }
    sweepgen := mheap_.sweepgen
    // Only a span in the "being swept" state may proceed
    if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
        print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
        throw("MSpan_Sweep: bad span state")
    }

    if trace.enabled {
        traceGCSweepSpan(s.npages * _PageSize)
    }
    // Update the count of swept pages up front
    atomic.Xadd64(&mheap_.pagesSwept, int64(s.npages))

    spc := s.spanclass
    size := s.elemsize
    res := false

    c := _g_.m.mcache
    freeToHeap := false

    // The allocBits indicate which unmarked objects don't need to be
    // processed since they were free at the end of the last GC cycle
    // and were not allocated since then.
    // If the allocBits index is >= s.freeindex and the bit
    // is not marked then the object remains unallocated
    // since the last GC.
    // This situation is analogous to being on a freelist.

    // Unlink & free special records for any objects we're about to free.
    // Two complications here:
    // 1. An object can have both finalizer and profile special records.
    //    In such case we need to queue finalizer for execution,
    //    mark the object as live and preserve the profile special.
    // 2. A tiny object can have several finalizers setup for different offsets.
    //    If such object is not marked, we need to queue all finalizers at once.
    // Both 1 and 2 are possible at the same time.
    specialp := &s.specials
    special := *specialp
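    // For each object with special records, check whether it is still live and whether it has at least one finalizer; free the records of objects without finalizers and queue the finalizers of the others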
    for special != nil {
        // A finalizer can be set for an inner byte of an object, find object beginning.
        objIndex := uintptr(special.offset) / size
        p := s.base() + objIndex*size
        mbits := s.markBitsForIndex(objIndex)
        if !mbits.isMarked() {
            // This object is not marked and has at least one special record.
            // Pass 1: see if it has at least one finalizer.
            hasFin := false
            endOffset := p - s.base() + size
            for tmp := special; tmp != nil && uintptr(tmp.offset) < endOffset; tmp = tmp.next {
                if tmp.kind == _KindSpecialFinalizer {
                    // Stop freeing of object if it has a finalizer.
                    mbits.setMarkedNonAtomic()
                    hasFin = true
                    break
                }
            }
            // Pass 2: queue all finalizers _or_ handle profile record.
            for special != nil && uintptr(special.offset) < endOffset {
                // Find the exact byte for which the special was setup
                // (as opposed to object beginning).
                p := s.base() + uintptr(special.offset)
                if special.kind == _KindSpecialFinalizer || !hasFin {
                    // Splice out special record.
                    y := special
                    special = special.next
                    *specialp = special
                    freespecial(y, unsafe.Pointer(p), size)
                } else {
                    // This is profile record, but the object has finalizers (so kept alive).
                    // Keep special record.
                    specialp = &special.next
                    special = *specialp
                }
            }
        } else {
            // object is still live: keep special record
            specialp = &special.next
            special = *specialp
        }
    }

    if debug.allocfreetrace != 0 || raceenabled || msanenabled {
        // Find all newly freed objects. This doesn't have to
        // efficient; allocfreetrace has massive overhead.
        mbits := s.markBitsForBase()
        abits := s.allocBitsForIndex(0)
        for i := uintptr(0); i < s.nelems; i++ {
            if !mbits.isMarked() && (abits.index < s.freeindex || abits.isMarked()) {
                x := s.base() + i*s.elemsize
                if debug.allocfreetrace != 0 {
                    tracefree(unsafe.Pointer(x), size)
                }
                if raceenabled {
                    racefree(unsafe.Pointer(x), size)
                }
                if msanenabled {
                    msanfree(unsafe.Pointer(x), size)
                }
            }
            mbits.advance()
            abits.advance()
        }
    }

    // Count the number of free objects in this span.
    // nalloc is the number of objects still marked, i.e. those that survive this cycle
    nalloc := uint16(s.countAlloc())
    // If the size class is 0 (a large-object span) and nothing is allocated any more, free the span to mheap
    if spc.sizeclass() == 0 && nalloc == 0 {
        s.needzero = 1
        freeToHeap = true
    }
    nfreed := s.allocCount - nalloc
    if nalloc > s.allocCount {
        print("runtime: nelems=", s.nelems, " nalloc=", nalloc, " previous allocCount=", s.allocCount, " nfreed=", nfreed, "\n")
        throw("sweep increased allocation count")
    }

    s.allocCount = nalloc
    // wasempty: the span had no free slots left before this sweep
    wasempty := s.nextFreeIndex() == s.nelems
    // Reset freeindex
    s.freeindex = 0 // reset allocation index to start of span.
    if trace.enabled {
        getg().m.p.ptr().traceReclaimed += uintptr(nfreed) * s.elemsize
    }

    // gcmarkBits becomes the allocBits.
    // get a fresh cleared gcmarkBits in preparation for next GC
    // gcmarkBits becomes the new allocBits
    s.allocBits = s.gcmarkBits
    // Allocate fresh, cleared gcmarkBits
    s.gcmarkBits = newMarkBits(s.nelems)

    // Initialize alloc bits cache.
    // Refresh allocCache
    s.refillAllocCache(0)

    // We need to set s.sweepgen = h.sweepgen only when all blocks are swept,
    // because of the potential for a concurrent free/SetFinalizer.
    // But we need to set it before we make the span available for allocation
    // (return it to heap or mcentral), because allocation code assumes that a
    // span is already swept if available for allocation.
    if freeToHeap || nfreed == 0 {
        // The span must be in our exclusive ownership until we update sweepgen,
        // check for potential races.
        if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
            print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
            throw("MSpan_Sweep: bad span state after sweep")
        }
        // Serialization point.
        // At this point the mark bits are cleared and allocation ready
        // to go so release the span.
        atomic.Store(&s.sweepgen, sweepgen)
    }

    if nfreed > 0 && spc.sizeclass() != 0 {
        c.local_nsmallfree[spc.sizeclass()] += uintptr(nfreed)
        // Return the span to mcentral
        res = mheap_.central[spc].mcentral.freeSpan(s, preserve, wasempty)
        // MCentral_FreeSpan updates sweepgen
    } else if freeToHeap {
        // This releases a large-object span; it pairs with line 117, where freeToHeap is set
        // Free large span to heap

        // NOTE(rsc,dvyukov): The original implementation of efence
        // in CL 22060046 used SysFree instead of SysFault, so that
        // the operating system would eventually give the memory
        // back to us again, so that an efence program could run
        // longer without running out of memory. Unfortunately,
        // calling SysFree here without any kind of adjustment of the
        // heap data structures means that when the memory does
        // come back to us, we have the wrong metadata for it, either in
        // the MSpan structures or in the garbage collection bitmap.
        // Using SysFault here means that the program will run out of
        // memory fairly quickly in efence mode, but at least it won't
        // have mysterious crashes due to confused memory reuse.
        // It should be possible to switch back to SysFree if we also
        // implement and then call some kind of MHeap_DeleteSpan.
        if debug.efence > 0 {
            s.limit = 0 // prevent mlookup from finding this span
            sysFault(unsafe.Pointer(s.base()), size)
        } else {
            // Free the span to mheap
            mheap_.freeSpan(s, 1)
        }
        c.local_nlargefree++
        c.local_largefree += size
        res = true
    }
    if !res {
        // The span has been swept and is still in-use, so put
        // it on the swept in-use list.
        // If the span was not released to mcentral or mheap, it is still in use
        mheap_.sweepSpans[sweepgen/2%2].push(s)
    }
    return res
}
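
Stripped of specials, tracing and span release, the core of mspan.sweep is a small piece of bitmap bookkeeping: the mark bits of this cycle become the allocation bits of the next. The sketch below models that step only; toySpan is hypothetical and limited to 64 objects so that each bitmap fits in a uint64.

```go
package main

import (
	"fmt"
	"math/bits"
)

// toySpan is a hypothetical span with at most 64 objects.
type toySpan struct {
	nelems     int
	allocCount int    // objects allocated before this sweep
	allocBits  uint64 // bit i set: slot i is allocated
	gcmarkBits uint64 // bit i set: object i was marked this cycle
}

// sweep applies the bookkeeping at the heart of mspan.sweep and
// returns how many objects were freed.
func (s *toySpan) sweep() int {
	nalloc := bits.OnesCount64(s.gcmarkBits) // survivors of this cycle
	nfreed := s.allocCount - nalloc
	s.allocCount = nalloc
	s.allocBits = s.gcmarkBits // mark bits become the allocation bitmap
	s.gcmarkBits = 0           // cleared mark bits for the next cycle
	return nfreed
}

func main() {
	s := &toySpan{nelems: 8, allocCount: 5, allocBits: 0b00011111, gcmarkBits: 0b00010101}
	fmt.Println(s.sweep())            // 2
	fmt.Printf("%08b\n", s.allocBits) // 00010101
}
```

No object memory is touched here: unmarked slots simply show up as free in the new allocBits, which is why sweeping a span is cheap.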

OK, that completes our walkthrough of Go's GC flow. Reading it alongside the diagram at the very top may make it easier to follow.
