Golang Map Implementation (4): Map Assignment and Growth

Posted by 搬磚程式設計師帶你飛 on 2020-04-30

Original URL: https://www.cnblogs.com/-lee/p/12807063.html

title: Golang Map Implementation (4)
date: 2020-04-28 18:20:30
tags:

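Assignment is one of the more complex parts of the Go map implementation. When a key is assigned, the runtime may grow the map and migrate its data so that hash collision chains do not become too long. Growth and data migration are therefore the focus of this article.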

Data structures

First, let's revisit the data structures behind the map implementation:

type hmap struct {
  count     int
  flags     uint8  
  B         uint8
  noverflow uint16
  hash0     uint32
  buckets    unsafe.Pointer
  oldbuckets unsafe.Pointer
  nevacuate  uintptr
  extra *mapextra
}

type mapextra struct {
  overflow    *[]*bmap
  oldoverflow *[]*bmap
  nextOverflow *bmap
}

hmap is the struct that implements a map. Most of its fields were covered in part one; what remains is nevacuate and extra.

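First, the concept of evacuation. When the hash chains grow too long, or too many overflow buckets have accumulated (many of them mostly empty), the data is moved into a new bucket array and the old bucket array becomes oldbuckets. The buckets are not all moved at once: a bucket is evacuated only when a write (assignment or deletion) touches it. (This resembles Redis's incremental rehash: the cost of growing is spread over many operations, which reduces the latency of any single one.)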

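nevacuate marks the evacuation position (think of it as the progress counter): it records how far evacuation has advanced through the oldbuckets array.
extra is a mapextra struct. nextOverflow points to a preallocated free overflow bucket that can be used later to resolve collisions. overflow and oldoverflow record the overflow buckets currently in use on the collision chains; oldoverflow belongs to oldbuckets, that is, to the data still waiting to be evacuated, while overflow belongs to the current buckets.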

With the data structures in mind, we can look at how map assignment works.

Map assignment

A map assignment is written like this:


   mapExample["hello"] = data

Go specializes the assignment implementation for different key types. These are the variants:

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {}
func mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer {}
func mapassign_fast32ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {}
func mapassign_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer {}
func mapassign_fast64ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer{}
func mapassign_faststr(t *maptype, h *hmap, s string) unsafe.Pointer {}

They are all very similar, so we focus on mapassign.

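mapassign finds a free slot in a bucket, stores the key in it, and returns the address of the matching value slot; the compiler-generated code at the call site then copies the value into that memory.
Let's go through, step by step, how the free slot is found: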

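① Before looking up the key, mapassign runs sanity checks: is the map nil (uninitialized), and is another write already in progress? A nil map panics, and a concurrent write aborts the program with a fatal error that cannot be recovered. (This is why concurrent map writes crash the program.)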

if h == nil {
  panic(plainError("assignment to entry in nil map"))
}
// race detector and memory sanitizer hooks omitted

if h.flags&hashWriting != 0 {
  throw("concurrent map writes")
}
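
The nil check is easy to observe from user code. A minimal sketch (the helper name assignToNil is made up for this illustration): writing to a nil map panics, and the recovered value carries the same message as the plainError above. The concurrent-write case is not shown because throw is fatal and cannot be recovered.

```go
package main

import "fmt"

// assignToNil writes to a nil map and returns the recovered panic message.
// Hypothetical helper, not part of the runtime.
func assignToNil() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	var m map[string]int // nil map: reads are allowed, writes are not
	m["hello"] = 1       // ends up in the h == nil check of mapassign
	return "no panic"
}

func main() {
	fmt.Println(assignToNil()) // assignment to entry in nil map
}
```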

② Compute the hash of the key. If buckets is nil (a map below a certain size does not allocate its buckets at creation time), a bucket is allocated here as well.

alg := t.key.alg
hash := alg.hash(key, uintptr(h.hash0))

// set hashWriting only after calling alg.hash: alg.hash may panic,
// in which case no write has actually happened
h.flags ^= hashWriting

if h.buckets == nil {
  h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
}

③ Use the hash to locate the bucket. If the map is still migrating data, the corresponding bucket in oldbuckets is evacuated to the new buckets first.


again:
// compute the bucket index from the low bits of the hash
bucket := hash & bucketMask(h.B)

// evacuation logic, explained in detail later
if h.growing() {
  growWork(t, h, bucket)
}

// compute the address of the bucket and the tophash value
b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
top := tophash(hash)
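
To make the two halves of the hash concrete, here is a small sketch. It assumes a 64-bit hash (the runtime uses uintptr, so the shift depends on the platform) and assumes minTopHash is 5, as in the runtime of that era: the low B bits select the bucket, and the top 8 bits become the tophash, bumped past the values reserved as state flags.

```go
package main

import "fmt"

const minTopHash = 5 // tophash values below this are reserved as state flags (assumed value)

// bucketMask returns 2^B - 1, the mask that keeps the low B bits.
func bucketMask(B uint8) uint64 { return uint64(1)<<B - 1 }

// tophash returns the top 8 bits of a 64-bit hash, shifted past the reserved flag values.
func tophash(hash uint64) uint8 {
	top := uint8(hash >> 56)
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

func main() {
	hash := uint64(0xAB00000000000016) // arbitrary example value
	fmt.Println(hash&bucketMask(3), tophash(hash)) // 6 171
}
```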

④ With the bucket in hand, walk it and its overflow chain slot by slot, looking for the key. The key may already exist, or it may have to be inserted.

bucketloop:
for {
  for i := uintptr(0); i < bucketCnt; i++ {

    // tophash differs: this slot cannot hold the key, so move on to the next slot
    if b.tophash[i] != top {

      // remember the first empty slot (tophash, key and value pointers)
      if isEmpty(b.tophash[i]) && inserti == nil {
        inserti = &b.tophash[i]
        insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
        val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
      }

      // emptyRest: nothing is stored after this slot, so stop searching
      if b.tophash[i] == emptyRest {
        break bucketloop
      }
      continue
    }

    // tophash matches

    k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
    if t.indirectkey() {
      k = *((*unsafe.Pointer)(k))
    }

    // the keys differ (tophash collision): keep searching
    if !alg.equal(key, k) {
      continue
    }

    // the keys are equal: the entry already exists, so update the key if needed and take the address of v
    if t.needkeyupdate() {
      typedmemmove(t.key, k, key)
    }
    val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
    goto done
  }
  // follow the overflow pointer to the next bucket in the chain
  ovf := b.overflow(t)
  if ovf == nil {
    break
  }
  b = ovf
}

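To summarize this code, there are a few cases:

a. The tophash does not match: check whether the slot is empty. Empty slots appear, for example, after delete has been called. The address of the first empty slot is remembered; if the key is not found later (that is, the map did not contain it), that slot is used for the new entry.
b. The tophash matches: the keys themselves still have to be compared. If they differ, keep searching. If they are equal, update the key when the key type requires it (the stored key may reference different memory than the new one) and return the address of v.
c. Neither happened in this bucket: move on to the next bucket in the overflow chain.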
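
The three cases above can be sketched on a toy bucket. Everything here is a simplified stand-in (toyBucket and findSlot are hypothetical names; keys are strings and values are ints instead of raw memory), but the control flow follows the same order: skip on tophash mismatch, remember the first empty slot, stop at emptyRest, compare keys only on a tophash match.

```go
package main

import "fmt"

const (
	bucketCnt = 8
	emptyRest = 0 // this slot and every later one is empty
	emptyOne  = 1 // this slot alone is empty, e.g. after a delete
)

// toyBucket is a simplified stand-in for bmap.
type toyBucket struct {
	tophash  [bucketCnt]uint8
	keys     [bucketCnt]string
	vals     [bucketCnt]int
	overflow *toyBucket
}

// findSlot returns the slot holding key if present (found == true),
// otherwise the first empty slot seen (bucket is nil if the chain is full).
func findSlot(b *toyBucket, top uint8, key string) (*toyBucket, int, bool) {
	var free *toyBucket
	freeIdx := 0
	for ; b != nil; b = b.overflow {
		for i := 0; i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] <= emptyOne && free == nil {
					free, freeIdx = b, i // case a: keep the first hole
				}
				if b.tophash[i] == emptyRest {
					return free, freeIdx, false
				}
				continue
			}
			if b.keys[i] == key { // case b: tophash and key both match
				return b, i, true
			}
		}
	} // case c: the loop header moves on to the overflow bucket
	return free, freeIdx, false
}

func main() {
	b := &toyBucket{}
	b.tophash[0], b.keys[0], b.vals[0] = 9, "a", 1
	b.tophash[1] = emptyOne // a deleted entry left a hole here
	b.tophash[2], b.keys[2], b.vals[2] = 9, "b", 2

	_, i, found := findSlot(b, 9, "b")
	fmt.Println(i, found) // 2 true
	_, i, found = findSlot(b, 9, "c")
	fmt.Println(i, found) // 1 false
}
```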

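⑤ Before inserting, check whether the map holds too much data and needs to grow. If it does, grow it and start again from step ③ to get the new bucket and search for the slot again.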

if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
  hashGrow(t, h)
  goto again // Growing the table invalidates everything, so try again
}

⑥ If no free slot was found, append a new overflow bucket to the chain and use its first key/value slot.

if inserti == nil {
  // all current buckets are full, allocate a new one.
  newb := h.newoverflow(t, b)
  inserti = &newb.tophash[0]
  insertk = add(unsafe.Pointer(newb), dataOffset)
  val = add(insertk, bucketCnt*uintptr(t.keysize))
}

⑦ Finally, store the tophash and the key, and clear the hashWriting flag.

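// if keys or values are stored indirectly (as pointers, because they are too large to store inline),
// allocate memory for them first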
if t.indirectkey() {
  kmem := newobject(t.key)
  *(*unsafe.Pointer)(insertk) = kmem
  insertk = kmem
}
if t.indirectvalue() {
  vmem := newobject(t.elem)
  *(*unsafe.Pointer)(val) = vmem
}
// store the key and the tophash, and count the new entry
typedmemmove(t.key, insertk, key)
*inserti = top
h.count++

done:
  if h.flags&hashWriting == 0 {
    throw("concurrent map writes")
  }
  h.flags &^= hashWriting
  if t.indirectvalue() {
    val = *((*unsafe.Pointer)(val))
  }
  return val

That covers the essentials of map assignment. Next, the map growth from step ⑤.

Map growth

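Two situations trigger growth. The first is that the map stores too many key/value pairs and exceeds its load factor. The second is that there are too many overflow buckets. The thresholds are fixed values chosen from experience, so we do not examine how they were derived.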
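
For reference, the two checks are short. The sketch below reproduces them with the constants as I recall them from the runtime of that era (a load factor of 6.5 written as 13/2, and an overflow limit of roughly one overflow bucket per regular bucket, capped at 2^15); treat the exact values as assumptions and check the source of your Go version.

```go
package main

import "fmt"

const (
	bucketCnt     = 8
	loadFactorNum = 13 // load factor 6.5 expressed as 13/2 (assumed values)
	loadFactorDen = 2
)

// overLoadFactor reports whether count entries in 2^B buckets exceed the load factor.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uint64(count) > loadFactorNum*((uint64(1)<<B)/loadFactorDen)
}

// tooManyOverflowBuckets reports whether there are about as many
// overflow buckets as regular buckets (capped at 2^15).
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<(B&15)
}

func main() {
	// B = 1 means 2 buckets, so the threshold is 13 entries.
	fmt.Println(overLoadFactor(13, 1), overLoadFactor(14, 1)) // false true
	// B = 3 means 8 buckets, so 8 overflow buckets are too many.
	fmt.Println(tooManyOverflowBuckets(7, 3), tooManyOverflowBuckets(8, 3)) // false true
}
```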

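Once either condition holds, growth begins. If it is the second condition, the new bucket array has the same number of buckets as the old one: the likely cause is that too many slots are taken up by empty key/value pairs, and the grow acts as a memory compaction. If the cause is a high load from too many key/value pairs, the number of buckets doubles.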

func hashGrow(t *maptype, h *hmap) {
  bigger := uint8(1)
  // second case (too many overflow buckets): same-size grow, B stays unchanged
  if !overLoadFactor(h.count+1, h.B) {
    bigger = 0
    h.flags |= sameSizeGrow
  }
  oldbuckets := h.buckets

  // allocate a large array to serve as the new buckets
  newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

  flags := h.flags &^ (iterator | oldIterator)
  if h.flags&iterator != 0 {
    flags |= oldIterator
  }
  
  // update the hmap fields; oldbuckets is now populated and will be evacuated later
  h.B += bigger
  h.flags = flags
  h.oldbuckets = oldbuckets
  h.buckets = newbuckets
  h.nevacuate = 0
  h.noverflow = 0

  // update the extra struct
  if h.extra != nil && h.extra.overflow != nil {
    // Promote current overflow buckets to the old generation.
    if h.extra.oldoverflow != nil {
      throw("oldoverflow is not nil")
    }
    h.extra.oldoverflow = h.extra.overflow
    h.extra.overflow = nil
  }
  if nextOverflow != nil {
    if h.extra == nil {
      h.extra = new(mapextra)
    }
    h.extra.nextOverflow = nextOverflow
  }
}

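To summarize map growth: determine the new size, allocate the large array, do some initialization, and switch the old buckets and their overflow buckets over to the "old" fields.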

Map data migration

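After the grow, the data still has to be migrated. Migration does not happen in one go: a bucket is migrated only when it is used, so the data moves over gradually. Let's look at how.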

In step ③ of assignment, if the map is still growing, the bucket about to be used is first evacuated from the old buckets. The evacuation works as follows:

func growWork(t *maptype, h *hmap, bucket uintptr) {
  // first evacuate the old bucket that corresponds to the bucket we are about to use
  evacuate(t, h, bucket&h.oldbucketmask())
  
  // then evacuate one more old bucket to make progress on the grow
  if h.growing() {
    evacuate(t, h, h.nevacuate)
  }
}

nevacuate marks the current progress. Once everything has been evacuated it equals 2^B, where B is the B of oldbuckets (the new buckets array may have length 2^(B+1)).
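
The "evacuate the bucket being used, plus one more" pattern can be tried out on a toy table. This is only a sketch under loud assumptions: toyMap and its methods are hypothetical, keys are non-negative ints that serve as their own hash, buckets are plain slices, and the progress marker is advanced by a simple scan rather than by the runtime's advanceEvacuationMark.

```go
package main

import "fmt"

// toyMap is a hypothetical, heavily simplified table.
type toyMap struct {
	buckets [][]int
	old     [][]int // non-nil while a grow is in progress
	moved   []bool  // moved[i]: old bucket i has been evacuated
	next    int     // plays the role of nevacuate
}

func (m *toyMap) growing() bool { return m.old != nil }

// grow doubles the bucket array and keeps the previous one as old.
func (m *toyMap) grow() {
	m.old = m.buckets
	m.buckets = make([][]int, 2*len(m.old))
	m.moved = make([]bool, len(m.old))
	m.next = 0
}

// evacuate moves old bucket i into the new array and advances the marker.
func (m *toyMap) evacuate(i int) {
	if !m.moved[i] {
		for _, k := range m.old[i] {
			j := k & (len(m.buckets) - 1)
			m.buckets[j] = append(m.buckets[j], k)
		}
		m.moved[i] = true
	}
	for m.next < len(m.old) && m.moved[m.next] {
		m.next++
	}
	if m.next == len(m.old) {
		m.old, m.moved = nil, nil // migration finished
	}
}

// insert mirrors growWork: evacuate the bucket in use, then one more.
func (m *toyMap) insert(k int) {
	b := k & (len(m.buckets) - 1)
	if m.growing() {
		m.evacuate(b & (len(m.old) - 1))
		if m.growing() {
			m.evacuate(m.next)
		}
	}
	m.buckets[b] = append(m.buckets[b], k)
}

func main() {
	m := &toyMap{buckets: make([][]int, 4)}
	for k := 0; k < 8; k++ {
		m.insert(k)
	}
	m.grow() // 4 old buckets to move
	for _, k := range []int{8, 9, 10} {
		m.insert(k)
		fmt.Println(k, m.growing())
	}
}
```

With four old buckets, the first two inserts leave the map in the growing state and the third one finishes the migration.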

The evacuate function moves the bucket at the given position, together with all entries on its collision chain, to the new buckets.

① First check whether the bucket has already been evacuated. (oldbucket is the index of the bucket to evacuate.)

b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
// check whether the bucket has already been evacuated
if !evacuated(b) {
  // perform the evacuation
}

The check relies on tophash alone: looking at the first tophash entry is enough (for the role of tophash, see part three).

func evacuated(b *bmap) bool {
  h := b.tophash[0]
  // every flag in this range means "already evacuated"
  return h > emptyOne && h < minTopHash
}
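
The range check makes more sense next to the flag values themselves. The sketch below lists them as I recall them from the runtime of that era (assumed values; check your Go version) and takes the first tophash byte directly instead of a *bmap.

```go
package main

import "fmt"

// Reserved tophash values (assumed, as in the runtime of that era).
const (
	emptyRest      = 0 // slot is empty, and so is everything after it
	emptyOne       = 1 // slot is empty
	evacuatedX     = 2 // entry was moved to the first half of the new table
	evacuatedY     = 3 // entry was moved to the second half of the new table
	evacuatedEmpty = 4 // slot was empty and the bucket has been evacuated
	minTopHash     = 5 // smallest tophash of a normal, filled slot
)

// isEmpty reports whether a tophash value marks an empty slot.
func isEmpty(x uint8) bool { return x <= emptyOne }

// evacuated takes the first tophash byte of a bucket.
func evacuated(first uint8) bool { return first > emptyOne && first < minTopHash }

func main() {
	for _, v := range []uint8{emptyRest, emptyOne, evacuatedX, evacuatedY, evacuatedEmpty, minTopHash, 200} {
		fmt.Println(v, isEmpty(v), evacuated(v))
	}
}
```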

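② If the bucket has not been evacuated, its data has to be migrated. The destination may be a bucket array of the same size or one twice as large. xy holds the two possible destinations: x is the bucket at the same index as before, and y is the bucket at index oldbucket+newbit in the doubled array (newbit is the length of oldbuckets). First, how the destinations are determined: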

var xy [2]evacDst
x := &xy[0]
x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
x.k = add(unsafe.Pointer(x.b), dataOffset)
x.v = add(x.k, bucketCnt*uintptr(t.keysize))
if !h.sameSizeGrow() {
  // when the table doubles, the y destination has to be computed as well
  y := &xy[1]
  y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
  y.k = add(unsafe.Pointer(y.b), dataOffset)
  y.v = add(y.k, bucketCnt*uintptr(t.keysize))
}
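
Why exactly two destinations are enough: doubling the table adds one bit to the bucket mask, so an entry either keeps its index (x) or moves to index + newbit (y), depending on that one bit of its hash. A small sketch with made-up hash values (destination is a hypothetical helper, not a runtime function):

```go
package main

import "fmt"

// destination returns the new bucket index of an entry from oldbucket
// when the table doubles; newbit is the old bucket count.
func destination(hash, oldbucket, newbit uint64) uint64 {
	if hash&newbit != 0 {
		return oldbucket + newbit // y: second half of the new table
	}
	return oldbucket // x: same index as before
}

func main() {
	newbit := uint64(4) // 4 old buckets, 8 new ones
	for _, hash := range []uint64{0b0010, 0b0110, 0b1110} {
		old := hash & (newbit - 1)
		dst := destination(hash, old, newbit)
		// dst always equals the index computed with the new, wider mask
		fmt.Println(old, dst, dst == hash&(2*newbit-1))
	}
}
```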

③ With the destination buckets known, the key/value pairs are migrated one by one. (The point is to squeeze out the empty slots.)


// walk every bucket in the chain
for ; b != nil; b = b.overflow(t) {
  k := add(unsafe.Pointer(b), dataOffset)
  v := add(k, bucketCnt*uintptr(t.keysize))

  // walk every key/value pair in the bucket
  for i := 0; i < bucketCnt; i, k, v = i+1, add(k, uintptr(t.keysize)), add(v, uintptr(t.valuesize)) {
    top := b.tophash[i]

    // empty slots are not migrated
    if isEmpty(top) {
      b.tophash[i] = evacuatedEmpty
      continue
    }
    if top < minTopHash {
      throw("bad map state")
    }
    k2 := k
    if t.indirectkey() {
      k2 = *((*unsafe.Pointer)(k2))
    }
    var useY uint8
    if !h.sameSizeGrow() {
      // when the table doubles, recompute the hash to choose between x and y
      hash := t.key.alg.hash(k2, uintptr(h.hash0))
      if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.alg.equal(k2, k2) {
        useY = top & 1
        top = tophash(hash)
      } else {
        if hash&newbit != 0 {
          useY = 1
        }
      }
    }

    // sanity check on constants; can be ignored
    if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
      throw("bad evacuatedN")
    }

    // mark the slot in the old bucket as evacuated
    b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
    dst := &xy[useY]                 // evacuation destination
    if dst.i == bucketCnt {
      // the destination bucket is full: append a new overflow bucket
      dst.b = h.newoverflow(t, dst.b)
      dst.i = 0
      dst.k = add(unsafe.Pointer(dst.b), dataOffset)
      dst.v = add(dst.k, bucketCnt*uintptr(t.keysize))
    }
    // fill in the tophash, key and value
    dst.b.tophash[dst.i&(bucketCnt-1)] = top
    if t.indirectkey() {
      *(*unsafe.Pointer)(dst.k) = k2
    } else {
      typedmemmove(t.key, dst.k, k)
    }
    if t.indirectvalue() {
      *(*unsafe.Pointer)(dst.v) = *(*unsafe.Pointer)(v)
    } else {
      typedmemmove(t.elem, dst.v, v)
    }

    // advance to the next slot in the destination bucket
    dst.i++
    dst.k = add(dst.k, uintptr(t.keysize))
    dst.v = add(dst.v, uintptr(t.valuesize))
  }
}

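If no iterator is still using the old buckets and the bucket type contains pointers, the keys and values of the old bucket are cleared so the garbage collector can reclaim what they reference: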

if h.flags&oldIterator == 0 && t.bucket.kind&kindNoPointers == 0 {
  b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
  ptr := add(b, dataOffset)
  n := uintptr(t.bucketsize) - dataOffset

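  // ptr points at the keys and values; the tophash array before it is kept,
  // because it records the evacuation state that is checked before migrating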
  memclrHasPointers(ptr, n)
}

④ If the bucket just evacuated is the one at the overall progress mark, the progress marker nevacuate has to be updated.

// newbit is the length of oldbuckets, which is also the end point for nevacuate
func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
  // first advance the marker
  h.nevacuate++

  // look at no more than 2^10 buckets
  stop := h.nevacuate + 1024
  if stop > newbit {
    stop = newbit
  }

  // skip over buckets that are already evacuated; stop at the first one that is not and leave it for next time
  for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
    h.nevacuate++
  }

  // everything has been evacuated: oldbuckets is fully migrated, so release it
  if h.nevacuate == newbit {
    h.oldbuckets = nil
    if h.extra != nil {
      h.extra.oldoverflow = nil
    }
    h.flags &^= sameSizeGrow
  }
}
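
The marker logic can be checked on a toy model. In this sketch (advance and the done slice are hypothetical stand-ins for advanceEvacuationMark and bucketEvacuated), done[i] records whether old bucket i has been evacuated, and the function returns the new marker; a return value equal to len(done) means the grow is complete.

```go
package main

import "fmt"

// advance mimics the marker update on a toy table.
func advance(nevacuate int, done []bool) int {
	nevacuate++ // the bucket at the marker was just evacuated
	stop := nevacuate + 1024
	if stop > len(done) {
		stop = len(done)
	}
	// skip over buckets that other writes have already evacuated
	for nevacuate != stop && done[nevacuate] {
		nevacuate++
	}
	return nevacuate
}

func main() {
	done := []bool{true, true, true, false, true}
	fmt.Println(advance(0, done)) // 3: buckets 1 and 2 were already moved, 3 is not
	done[3] = true
	fmt.Println(advance(3, done)) // 5: equals len(done), so the grow is finished
}
```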

Summary

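The hard parts of map assignment are growth and data evacuation.
Buckets are evacuated gradually: while a grow is in progress, every assignment does at least one bucket's worth of evacuation work.
Growing does not always add space; it may only compact memory.
The tophash flags tell whether a slot is empty, whether it has been evacuated, and whether the destination was X or Y.
Deleting keys from a map can leave many empty key/value slots, which may end up triggering an evacuation. Avoid this where you can.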
