Ethereum Source Code Analysis (52): trie Source Code Analysis

Posted by 尹成 on 2018-05-14
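Package trie implements Merkle Patricia Tries, abbreviated here as MPT. The MPT is a variant of the trie and one of the most important data structures in Ethereum: it stores the state of user accounts and changes to that state, transactions, and transaction receipts. An MPT combines three data structures: the trie, the Patricia trie, and the Merkle tree. Each of them is introduced below.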

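## Trie (introduction quoted from http://dongxicheng.org/structure/trietree/)
A trie, also called a dictionary tree, word-search tree, or prefix tree, is a multiway tree used for fast retrieval. For example, a trie over English letters is a 26-ary tree, and a trie over decimal digits is a 10-ary tree.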

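A trie saves space by exploiting the common prefixes of strings. As shown in the figure below, this trie stores the 6 strings tea, ten, to, in, inn, and int using 10 nodes: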


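In this trie, the strings in, inn, and int share the prefix "in", so only one copy of "in" needs to be stored. On the other hand, if the system holds many strings that share almost no prefixes, the corresponding trie consumes a lot of memory, which is one drawback of tries.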

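The basic properties of a trie can be summarized as follows:

- The root node contains no character; every node other than the root contains exactly one character.
- Concatenating the characters along the path from the root to a node gives the string corresponding to that node.
- The children of any node all contain different characters.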
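
A minimal Go sketch of the plain trie just described, assuming one map of children per node keyed by character (the type and method names are illustrative, not from go-ethereum). Inserting the six strings from the figure should produce exactly 10 nodes, root included:

```go
package main

// trieNode is a hypothetical minimal trie node. The root holds no character;
// every other node corresponds to the single character on the edge leading to it.
type trieNode struct {
	children map[rune]*trieNode
	terminal bool // true if the path from the root to here spells a stored string
}

func newTrieNode() *trieNode {
	return &trieNode{children: make(map[rune]*trieNode)}
}

// insert walks the path for s, reusing nodes for any prefix that already exists.
func (n *trieNode) insert(s string) {
	cur := n
	for _, c := range s {
		next, ok := cur.children[c]
		if !ok {
			next = newTrieNode()
			cur.children[c] = next
		}
		cur = next
	}
	cur.terminal = true
}

// contains reports whether s itself was inserted (not merely a prefix of something).
func (n *trieNode) contains(s string) bool {
	cur := n
	for _, c := range s {
		next, ok := cur.children[c]
		if !ok {
			return false
		}
		cur = next
	}
	return cur.terminal
}

// count returns the number of nodes in this subtree, including n itself.
func (n *trieNode) count() int {
	total := 1
	for _, child := range n.children {
		total += child.count()
	}
	return total
}
```

Because every node stores a single character, a long key with no shared prefix costs one node per character, which is the weakness the Patricia trie addresses next.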

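## Patricia Tries (prefix trees)
The difference between a Patricia trie and a plain trie is that a plain trie allocates one node per character, so many long strings with no common prefix make it degenerate into something close to an array (long chains of single-child nodes). In Ethereum, an attacker could construct many such nodes to mount a denial-of-service attack. A Patricia trie instead merges such runs: where keys share a prefix, the shared prefix is stored once in a single node, and otherwise the entire remaining suffix goes into a single node. The optimization of Patricia over a plain trie is shown in the figure below: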


The figure above stores the following 8 key-value pairs, which show the characteristics of a prefix tree:

|Key           | value |
| ------------- | ---: |
|6c0a5c71ec20bq3w|5     |
|6c0a5c71ec20CX7j|27    |
|6c0a5c71781a1FXq|18    |
|6c0a5c71781a9Dog|64    |
|6c0a8f743b95zUfe|30    |
|6c0a8f743b95jx5R|2     |
|6c0a8f740d16y03G|43    |
|6c0a8f740d16vcc1|48    |
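
To see where the Patricia trie's merged nodes come from, the sketch below computes the longest prefix shared by a group of keys from the table (longestCommonPrefix is a hypothetical helper, not part of go-ethereum). All keys start with 6c0a, which becomes one shared node; the keys then split into a 6c0a5c71 group and a 6c0a8f74 group, and so on:

```go
package main

// longestCommonPrefix returns the longest prefix shared by every key.
// In a Patricia trie such a shared run is stored once, as a single node,
// instead of as one node per character.
func longestCommonPrefix(keys []string) string {
	if len(keys) == 0 {
		return ""
	}
	prefix := keys[0]
	for _, k := range keys[1:] {
		i := 0
		for i < len(prefix) && i < len(k) && prefix[i] == k[i] {
			i++
		}
		prefix = prefix[:i]
	}
	return prefix
}
```

Applying it recursively to each group gives the node boundaries of the figure: 6c0a, then 5c71 and 8f74, then ec20 / 781a under the first and 3b95 / 0d16 under the second, with the unique suffixes stored whole in the leaves.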

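## Merkle Tree (see http://blog.csdn.net/wo541075754/article/details/54632929)
A Merkle tree, also commonly called a hash tree, is, as the name suggests, a tree that stores hash values. Its leaves are the hashes of data blocks (for example, files or collections of files), and each non-leaf node is the hash of the concatenation of its children.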

![image](picture/trie_3.png)

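The main purpose of a Merkle tree is that the Top Hash serves as a digest of the entire tree: if any piece of data in the tree changes, the Top Hash changes too. The Top Hash is stored in the block header, and the block header must pass proof of work. This means that once I have a block header, I can verify the block's contents against it. The blog post referenced above gives a more detailed introduction.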


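## The MPT in Ethereum
Every Ethereum block header contains the roots of three MPTs:

- Transaction trie
- Receipt trie (data produced while transactions are executed)
- State trie (account information, for both contract accounts and user accounts)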

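The figure below shows two block headers, in which state root, tx root, and receipt root store the roots of the three tries. The second block shows that when the data of account 175 changes (27 -> 45), only the part of the data related to this account needs to be stored again, while the data in the old block can still be accessed normally. (This is somewhat similar to how immutable data structures are implemented in functional programming languages.)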
![image](picture/trie_4.png)
The detailed structure is:
![world state trie](picture/worldstatetrie.png)

## Formal Definition in the Yellow Paper (Appendix D. Modified Merkle Patricia Tree)

Formally, we assume the input value J is a set of key-value pairs (both keys and values are byte arrays):


When dealing with such a set, we use the following notation for the key and value of each item (for any item I in J, I0 denotes the key and I1 denotes the value):


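Any particular byte can be represented by its corresponding nibbles, where Y, the set of nibbles (4-bit values), is specified in Hex-Prefix Encoding. (Nibbles are used because of the branch-node structure described below and the flags encoded in keys.)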
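
For reference, the definitions described in the last three paragraphs (the input set J, the notation I = (I_0, I_1), and the nibble expansion y of the keys), transcribed into LaTeX following Yellow Paper Appendix D as we understand it; please check against the current edition of the paper:

```latex
\begin{aligned}
J &= \{(\mathbf{k}_0 \in \mathbb{B}, \mathbf{v}_0 \in \mathbb{B}), (\mathbf{k}_1 \in \mathbb{B}, \mathbf{v}_1 \in \mathbb{B}), \ldots\} \\
I &\in J : I \equiv (I_0, I_1) \\
y(J) &= \{(\mathbf{k}'_0 \in \mathbb{Y}, \mathbf{v}_0 \in \mathbb{B}), (\mathbf{k}'_1 \in \mathbb{Y}, \mathbf{v}_1 \in \mathbb{B}), \ldots\} \\
\forall n\; \forall i < 2\lVert \mathbf{k}_n \rVert &: \mathbf{k}'_n[i] \equiv
\begin{cases}
\lfloor \mathbf{k}_n[i \div 2] \div 16 \rfloor & \text{if } i \text{ is even} \\
\mathbf{k}_n[\lfloor i \div 2 \rfloor] \bmod 16 & \text{otherwise}
\end{cases}
\end{aligned}
```

The high nibble comes first and the low nibble second, which is exactly what keybytesToHex in encoding.go does with b / 16 and b % 16.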



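We define the function TRIE, which evaluates to the hash of the trie's root (the second argument of the function c is the depth reached so far while building the trie, i.e. how many key nibbles have been consumed; it is 0 at the root):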
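
In the Yellow Paper's notation, as we transcribe it, TRIE applies Keccak-256 (KEC) to the node composed at depth 0:

```latex
\mathtt{TRIE}(J) \equiv \mathtt{KEC}\big(c(J, 0)\big)
```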



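We also define a function n, the trie's node cap function. When composing a node, we use RLP to encode the structure. As a means of reducing storage complexity, for nodes whose RLP encoding is less than 32 bytes we store the RLP directly; for larger ones we store their hash.
n is defined in terms of c, the node composition function: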
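
The node cap function n as we transcribe it from the Yellow Paper: an empty set maps to the empty value, a composed node shorter than 32 bytes is inlined, and anything larger is replaced by its Keccak-256 hash:

```latex
n(J, i) \equiv
\begin{cases}
() & \text{if } J = \varnothing \\
c(J, i) & \text{if } \lVert c(J, i) \rVert < 32 \\
\mathtt{KEC}\big(c(J, i)\big) & \text{otherwise}
\end{cases}
```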



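In a manner similar to a radix tree, when the trie is traversed from root to leaf, one may build a single key-value pair. The key is accumulated through the traversal, acquiring a single nibble from each branch node (just as with a radix tree). Unlike a radix tree, in the case of multiple keys sharing the same prefix, or of a single key having a unique suffix, two optimizing node types are provided. Thus, while traversing, one may acquire multiple nibbles at once from each of the other two node types, extension and leaf. There are three kinds of nodes in the trie:

- **Leaf:** a leaf node has two fields. The first is the HP encoding of the remaining nibbles of the key, with the second (flag) argument of HP set to true; the second is the value.
- **Extension:** an extension node also has two fields. The first is the HP encoding (flag false) of the series of nibbles shared by at least two of the remaining keys; the second is n(J, j).
- **Branch:** a branch node has 17 fields. The first 16 correspond to each of the sixteen possible nibble values of the keys at this point in their traversal. The 17th field stores the value of a key that ends at this node (for example, with the three keys abc, abd, and ab, the 17th field of the branch node reached after ab holds the value of ab).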

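Branch nodes are used only where necessary; a trie holding a single non-empty key-value pair may contain no branch node at all. Defined formally, the three node types are given by the formula below.
In the figure, HP denotes Hex-Prefix Encoding, a nibble encoding format, and RLP denotes serialization with RLP.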

![image](picture/trie_10.png)

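An explanation of the three cases in the figure above:

- If only one item remains in the KV set being encoded, it is encoded with the first rule (a leaf node).
- If all items in the KV set being encoded share a common prefix, the longest common prefix is extracted and the set is handled with the second rule (an extension node).
- Otherwise, a branch node splits the set. Because keys are broken into nibbles, there are only 16 possible branches, 0-15. Each u is defined recursively through n, and if some key ends exactly at this node, the 17th element v holds its value.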
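
The same three cases written out in LaTeX, as we transcribe the Yellow Paper's definition of c (it should match the figure above; please verify against the current edition):

```latex
c(J, i) \equiv
\begin{cases}
\mathtt{RLP}\big((\mathtt{HP}(I_0[i..(\lVert I_0 \rVert - 1)], \mathit{true}),\ I_1)\big)
  & \text{if } \lVert J \rVert = 1 \text{ where } \exists I : I \in J \\
\mathtt{RLP}\big((\mathtt{HP}(I_0[i..(j - 1)], \mathit{false}),\ n(J, j))\big)
  & \text{if } i \ne j \text{ where } j = \max\{x : \exists \mathbf{l} : \lVert \mathbf{l} \rVert = x \wedge \forall I \in J : I_0[0..(x - 1)] = \mathbf{l}\} \\
\mathtt{RLP}\big((u(0), u(1), \ldots, u(15), v)\big)
  & \text{otherwise, where } u(j) \equiv n(\{I : I \in J \wedge I_0[i] = j\},\ i + 1)
\end{cases}
\qquad
v = \begin{cases} I_1 & \text{if } \exists I : I \in J \wedge \lVert I_0 \rVert = i \\ () & \text{otherwise} \end{cases}
```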

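The Yellow Paper does not explicitly define what should and should not be stored; that is an implementation concern. It simply defines a function that maps J to a hash, and for any given J exactly one such hash exists.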

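### Formal Definition in the Yellow Paper (Hex-Prefix Encoding)
Hex-prefix encoding is an efficient method of encoding an arbitrary number of nibbles as a byte array. It is able to store an additional flag which, when used in the context of the trie (the only context in which it is used), disambiguates between node types.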

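It is defined as the function HP that maps a sequence of nibbles (represented by the set Y) together with a boolean value to a sequence of bytes (represented by the set B):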



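Thus the high nibble of the first byte contains two flags: the lowest bit encodes the oddness of the length, and the second-lowest bit encodes the flag. The low nibble of the first byte is zero for an even number of nibbles and holds the first nibble for an odd number. All remaining nibbles (now an even number) fit into the remaining bytes.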
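
For reference, the definition of HP transcribed from the Yellow Paper as we understand it; f(t) supplies the flag bit (value 2 in the high nibble), and the +1 in the odd case is the oddness bit:

```latex
\mathtt{HP}(\mathbf{x}, t) : \mathbf{x} \in \mathbb{Y} \equiv
\begin{cases}
(16 f(t),\ 16\mathbf{x}[0] + \mathbf{x}[1],\ 16\mathbf{x}[2] + \mathbf{x}[3],\ \ldots) & \text{if } \lVert \mathbf{x} \rVert \text{ is even} \\
(16(f(t) + 1) + \mathbf{x}[0],\ 16\mathbf{x}[1] + \mathbf{x}[2],\ 16\mathbf{x}[3] + \mathbf{x}[4],\ \ldots) & \text{otherwise}
\end{cases}
\qquad
f(t) \equiv \begin{cases} 2 & \text{if } t \ne 0 \\ 0 & \text{otherwise} \end{cases}
```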

## Source Code Implementation
### trie/encoding.go
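encoding.go mainly converts between the three encoding formats used by the trie:

- **KEYBYTES encoding**: the raw key byte array. Most of the Trie API uses this encoding.
- **HEX encoding**: each byte holds one nibble of the key, optionally followed by a 'terminator' (the value 16) that indicates whether the node is a leaf node or an extension node. Nodes loaded into memory use this encoding because it is easy to access.
- **COMPACT encoding**: the Hex-Prefix Encoding from the Yellow Paper described above. It can be seen as another version of **HEX encoding** that saves disk space when nodes are stored in the database.

Put simply, compact encoding turns a key into a byte sequence that also carries the t (terminator) flag and an odd-nibble-count flag:
- keybytes: ordinary information stored as whole bytes (8 bits).
- hex: information stored one nibble (4 bits) per byte; this is the input to compact.
- compact: to serve as the key of a node in the Yellow Paper's Modified Merkle Patricia Tree, the hex nibbles are packed into an even number of nibbles, i.e. whole bytes. The low 2 bits of the first nibble hold, from high to low, the t flag and the odd flag. Because it records the t flag of the hex form and pads the nibbles to whole bytes, a compact-encoded key is convenient to store.

The code mainly implements conversions among these three encodings, plus a function that computes the length of a common prefix.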

    func hexToCompact(hex []byte) []byte {
        terminator := byte(0)
        if hasTerm(hex) {
            terminator = 1
            hex = hex[:len(hex)-1]
        }
        buf := make([]byte, len(hex)/2+1)
        buf[0] = terminator << 5 // the flag byte
        if len(hex)&1 == 1 {
            buf[0] |= 1 << 4 // odd flag
            buf[0] |= hex[0] // first nibble is contained in the first byte
            hex = hex[1:]
        }
        decodeNibbles(hex, buf[1:])
        return buf
    }
    
    func compactToHex(compact []byte) []byte {
        base := keybytesToHex(compact)
        base = base[:len(base)-1]
        // apply terminator flag
        if base[0] >= 2 { // TODO: the terminator added by keybytesToHex was stripped above and is re-added here from the t flag in the first nibble; the round trip is redundant
            base = append(base, 16)
        }
        // apply odd flag
        chop := 2 - base[0]&1
        return base[chop:]
    }
    
    func keybytesToHex(str []byte) []byte {
        l := len(str)*2 + 1
        var nibbles = make([]byte, l)
        for i, b := range str {
            nibbles[i*2] = b / 16
            nibbles[i*2+1] = b % 16
        }
        nibbles[l-1] = 16
        return nibbles
    }
    
    // hexToKeybytes turns hex nibbles into key bytes.
    // This can only be used for keys of even length.
    func hexToKeybytes(hex []byte) []byte {
        if hasTerm(hex) {
            hex = hex[:len(hex)-1]
        }
        if len(hex)&1 != 0 {
            panic("can't convert hex key of odd length")
        }
        key := make([]byte, (len(hex)+1)/2) // TODO: len(hex) is already known to be even here, so the +1 has no effect
        decodeNibbles(hex, key)
        return key
    }
    
    func decodeNibbles(nibbles []byte, bytes []byte) {
        for bi, ni := 0, 0; ni < len(nibbles); bi, ni = bi+1, ni+2 {
            bytes[bi] = nibbles[ni]<<4 | nibbles[ni+1]
        }
    }
    
    // prefixLen returns the length of the common prefix of a and b.
    func prefixLen(a, b []byte) int {
        var i, length = 0, len(a)
        if len(b) < length {
            length = len(b)
        }
        for ; i < length; i++ {
            if a[i] != b[i] {
                break
            }
        }
        return i
    }
    
    // hasTerm returns whether a hex key has the terminator flag.
    func hasTerm(s []byte) bool {
        return len(s) > 0 && s[len(s)-1] == 16
    }
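
A quick self-check of the functions above. The block copies hexToCompact, compactToHex, keybytesToHex, decodeNibbles, and hasTerm from the listing so that it compiles on its own; the expected byte strings are vectors we worked out by hand from the HP definition (a flag nibble from 0x0 to 0x3, then the packed nibbles):

```go
package main

// Copied from the encoding.go listing above so this block compiles on its own.

func hexToCompact(hex []byte) []byte {
	terminator := byte(0)
	if hasTerm(hex) {
		terminator = 1
		hex = hex[:len(hex)-1]
	}
	buf := make([]byte, len(hex)/2+1)
	buf[0] = terminator << 5 // the flag byte
	if len(hex)&1 == 1 {
		buf[0] |= 1 << 4 // odd flag
		buf[0] |= hex[0] // first nibble is contained in the first byte
		hex = hex[1:]
	}
	decodeNibbles(hex, buf[1:])
	return buf
}

func compactToHex(compact []byte) []byte {
	base := keybytesToHex(compact)
	base = base[:len(base)-1]
	if base[0] >= 2 { // apply terminator flag
		base = append(base, 16)
	}
	chop := 2 - base[0]&1 // apply odd flag: drop 1 or 2 flag nibbles
	return base[chop:]
}

func keybytesToHex(str []byte) []byte {
	l := len(str)*2 + 1
	nibbles := make([]byte, l)
	for i, b := range str {
		nibbles[i*2] = b / 16
		nibbles[i*2+1] = b % 16
	}
	nibbles[l-1] = 16
	return nibbles
}

func decodeNibbles(nibbles []byte, bytes []byte) {
	for bi, ni := 0, 0; ni < len(nibbles); bi, ni = bi+1, ni+2 {
		bytes[bi] = nibbles[ni]<<4 | nibbles[ni+1]
	}
}

func hasTerm(s []byte) bool {
	return len(s) > 0 && s[len(s)-1] == 16
}
```

The terminator 16 turns into the 0x20 flag bit and an odd length into the 0x10 flag bit, which is exactly the high nibble described in the Hex-Prefix section; compactToHex reverses both.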

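### Data Structures
The node structure: there are four node types. fullNode corresponds to the branch node in the Yellow Paper, and shortNode corresponds to both the extension node and the leaf node in the Yellow Paper. Which of the two a shortNode is depends on the type of shortNode.Val: a valueNode means a leaf, while anything else (a fullNode, or a hashNode not yet loaded) means an extension node. hashNode is the hash of a node that still lives in the database, and valueNode holds a value.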

    type node interface {
        fstring(string) string
        cache() (hashNode, bool)
        canUnload(cachegen, cachelimit uint16) bool
    }
    
    type (
        fullNode struct {
            Children [17]node // Actual trie node data to encode/decode (needs custom encoder)
            flags    nodeFlag
        }
        shortNode struct {
            Key   []byte
            Val   node
            flags nodeFlag
        }
        hashNode  []byte
        valueNode []byte
    )

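The Trie structure: root holds the current root node, and db is the backing key-value store. A trie is ultimately stored in the database as key-value pairs and has to be loaded from the database at startup. originalRoot is the root hash at load time; from it, the whole trie can be recovered from the database. The cachegen field is the trie's current cache generation, which is incremented on every Commit. New nodes are tagged with the current generation, and a clean node whose generation lags cachegen by at least cachelimit (cachegen - node.gen >= cachelimit) is unloaded from the cache to save memory. This is similar in spirit to an LRU cache policy: an entry that has not been used for a while is evicted to free memory.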

    // Trie is a Merkle Patricia Trie.
    // The zero value is an empty trie with no database.
    // Use New to create a trie that sits on top of a database.
    //
    // Trie is not safe for concurrent use.
    type Trie struct {
        root         node
        db           Database
        originalRoot common.Hash
    
        // Cache generation values.
        // cachegen increases by one with each commit operation.
        // new nodes are tagged with the current generation and unloaded
        // when their generation is older than than cachegen-cachelimit.
        cachegen, cachelimit uint16
    }


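### Insertion, Lookup, and Deletion in the Trie
A trie is initialized with New, which takes a hash and a Database. If the hash is non-empty (and not emptyRoot, the root of an empty trie), an existing trie is being loaded from the database, so trie.resolveHash is called to load its root node (described later); the children stay hashNodes and are loaded lazily when needed. If root is empty, a new, empty trie is returned.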

    func New(root common.Hash, db Database) (*Trie, error) {
        trie := &Trie{db: db, originalRoot: root}
        if (root != common.Hash{}) && root != emptyRoot {
            if db == nil {
                panic("trie.New: cannot use existing root without a database")
            }
            rootnode, err := trie.resolveHash(root[:], nil)
            if err != nil {
                return nil, err
            }
            trie.root = rootnode
        }
        return trie, nil
    }

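Insertion is a recursive method: starting from the root, it descends until it finds the place to insert and inserts there. The parameter n is the current node being inserted into, prefix is the part of the key already processed, and key is the part not yet processed, so the full key = prefix + key. value is the value to insert. The bool return value reports whether the operation changed the trie (dirty), the node is the root of the subtree after insertion, and error is the error, if any.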

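- If the node is nil (the root of a brand-new trie is nil), the whole tree is empty, so shortNode{key, value, t.newFlag()} is returned directly, and the root of the tree now holds a single shortNode.
- If the current node is a shortNode (a leaf or an extension node), first compute the common prefix of key and n.Key. If the common prefix covers all of n.Key, the short node is kept and the insertion continues into n.Val with the rest of the key; if that changes nothing (dirty == false, e.g. the same value is written again), the original node is returned unchanged, otherwise a new shortNode pointing to the updated child is returned. If the common prefix does not cover n.Key, the prefix has to be split off into its own node (an extension node) followed by a branch node, under which the two remaining parts are attached (as short nodes, or directly as a value in slot 16). The code first builds the branch node (branch := &fullNode{flags: t.newFlag()}) and then calls t.insert on the appropriate Children slots to insert the two remainders. One subtle detail: keys are in HEX encoding and end with the terminator 16. Suppose the existing node's key is [a b c 16] and the inserted key is [a b 16]; then key[matchlen] is 16, and branch.Children[key[matchlen]] still works because index 16 is exactly the branch node's 17th child. If the match length is 0, the branch node itself is returned; otherwise a shortNode holding the common prefix is returned in front of the branch.
- If the current node is a fullNode (a branch node), insert is called recursively on the corresponding child; the branch node is then copied, and the copy's corresponding child is pointed at the newly returned node.
- If the current node is a hashNode, the node has not been loaded into memory yet and still lives in the database. t.resolveHash(n, prefix) is called first to load it, and insert is then called on the loaded node.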


The insertion code:

    func (t *Trie) insert(n node, prefix, key []byte, value node) (bool, node, error) {
        if len(key) == 0 {
            if v, ok := n.(valueNode); ok {
                return !bytes.Equal(v, value.(valueNode)), value, nil
            }
            return true, value, nil
        }
        switch n := n.(type) {
        case *shortNode:
            matchlen := prefixLen(key, n.Key)
            // If the whole key matches, keep this short node as is
            // and only update the value.
            if matchlen == len(n.Key) {
                dirty, nn, err := t.insert(n.Val, append(prefix, key[:matchlen]...), key[matchlen:], value)
                if !dirty || err != nil {
                    return false, n, err
                }
                return true, &shortNode{n.Key, nn, t.newFlag()}, nil
            }
            // Otherwise branch out at the index where they differ.
            branch := &fullNode{flags: t.newFlag()}
            var err error
            _, branch.Children[n.Key[matchlen]], err = t.insert(nil, append(prefix, n.Key[:matchlen+1]...), n.Key[matchlen+1:], n.Val)
            if err != nil {
                return false, nil, err
            }
            _, branch.Children[key[matchlen]], err = t.insert(nil, append(prefix, key[:matchlen+1]...), key[matchlen+1:], value)
            if err != nil {
                return false, nil, err
            }
            // Replace this shortNode with the branch if it occurs at index 0.
            if matchlen == 0 {
                return true, branch, nil
            }
            // Otherwise, replace it with a short node leading up to the branch.
            return true, &shortNode{key[:matchlen], branch, t.newFlag()}, nil
    
        case *fullNode:
            dirty, nn, err := t.insert(n.Children[key[0]], append(prefix, key[0]), key[1:], value)
            if !dirty || err != nil {
                return false, n, err
            }
            n = n.copy()
            n.flags = t.newFlag()
            n.Children[key[0]] = nn
            return true, n, nil
    
        case nil:
            return true, &shortNode{key, value, t.newFlag()}, nil
    
        case hashNode:
            // We've hit a part of the trie that isn't loaded yet. Load
            // the node and insert into it. This leaves all child nodes on
            // the path to the value in the trie.
            rn, err := t.resolveHash(n, prefix)
            if err != nil {
                return false, nil, err
            }
            dirty, nn, err := t.insert(rn, prefix, key, value)
            if !dirty || err != nil {
                return false, rn, err
            }
            return true, nn, nil
    
        default:
            panic(fmt.Sprintf("%T: invalid node: %v", n, n))
        }
    }


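The trie's Get method is essentially a simple traversal of the trie to fetch the value for a key. The work is done by tryGet, which also returns the (possibly updated) node and a didResolve flag, so that nodes loaded from the database along the way can be kept in the trie.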


    func (t *Trie) tryGet(origNode node, key []byte, pos int) (value []byte, newnode node, didResolve bool, err error) {
        switch n := (origNode).(type) {
        case nil:
            return nil, nil, false, nil
        case valueNode:
            return n, n, false, nil
        case *shortNode:
            if len(key)-pos < len(n.Key) || !bytes.Equal(n.Key, key[pos:pos+len(n.Key)]) {
                // key not found in trie
                return nil, n, false, nil
            }
            value, newnode, didResolve, err = t.tryGet(n.Val, key, pos+len(n.Key))
            if err == nil && didResolve {
                n = n.copy()
                n.Val = newnode
                n.flags.gen = t.cachegen
            }
            return value, n, didResolve, err
        case *fullNode:
            value, newnode, didResolve, err = t.tryGet(n.Children[key[pos]], key, pos+1)
            if err == nil && didResolve {
                n = n.copy()
                n.flags.gen = t.cachegen
                n.Children[key[pos]] = newnode
            }
            return value, n, didResolve, err
        case hashNode:
            child, err := t.resolveHash(n, key[:pos])
            if err != nil {
                return nil, n, true, err
            }
            value, newnode, _, err := t.tryGet(child, key, pos)
            return value, newnode, true, err
        default:
            panic(fmt.Sprintf("%T: invalid node: %v", origNode, origNode))
        }
    }

The trie's Delete method is not covered here; its code is fairly similar to insertion.

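### Serialization and Deserialization of the Trie
Serialization means storing the in-memory data in the database; deserialization means loading trie data from the database into its in-memory representation. Serialization mainly serves convenient storage and a smaller storage footprint; deserialization loads the stored data into memory so that the trie can be inserted into, queried, and modified.

Trie serialization mainly uses the Compact encoding introduced earlier and the RLP encoding format. The serialized structure is described in detail in the Yellow Paper.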

![image](picture/trie_8.png)
![image](picture/trie_9.png)
![image](picture/trie_10.png)

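trie_test.go contains fairly detailed examples of how to use the trie. Here is a simple usage flow: first create an empty trie, then insert some data, and finally call trie.Commit() to serialize it and obtain a hash value (root), which is KEC(c(J,0)), i.e. TRIE(J), in the figure above.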

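    // newEmpty and updateString are helper functions defined in trie_test.go.
    func TestInsert(t *testing.T) {
        trie := newEmpty()
        updateString(trie, "doe", "reindeer")
        updateString(trie, "dog", "puppy")
        updateString(trie, "do", "cat")
        // Commit serializes the trie to its database and returns the root hash TRIE(J).
        root, err := trie.Commit()
        if err != nil {
            t.Fatalf("commit failed: %v", err)
        }
        t.Logf("root hash: %x", root)
    }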

Next, let's look at the main flow of Commit(). After a chain of calls (Commit -> CommitTo -> hashRoot), it ends up in the hash method of hasher.go.

    func (t *Trie) Commit() (root common.Hash, err error) {
        if t.db == nil {
            panic("Commit called on trie with nil database")
        }
        return t.CommitTo(t.db)
    }
    // CommitTo writes all nodes to the given database.
    // Nodes are stored with their sha3 hash as the key.
    //
    // Committing flushes nodes from memory. Subsequent Get calls will
    // load nodes from the trie's database. Calling code must ensure that
    // the changes made to db are written back to the trie's attached
    // database before using the trie.
    func (t *Trie) CommitTo(db DatabaseWriter) (root common.Hash, err error) {
        hash, cached, err := t.hashRoot(db)
        if err != nil {
            return (common.Hash{}), err
        }
        t.root = cached
        t.cachegen++
        return common.BytesToHash(hash.(hashNode)), nil
    }
    
    func (t *Trie) hashRoot(db DatabaseWriter) (node, node, error) {
        if t.root == nil {
            return hashNode(emptyRoot.Bytes()), nil, nil
        }
        h := newHasher(t.cachegen, t.cachelimit)
        defer returnHasherToPool(h)
        return h.hash(t.root, db, true)
    }


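Next, a brief look at the hash method. It produces two results: hashed, the node collapsed to its hash (or, for a non-root node whose encoding is shorter than 32 bytes, the collapsed node itself, to be embedded in its parent), and cached, a copy of the original node that keeps the full tree structure and records the computed hash. If the node already carries a cached hash, hash returns early: when it is only hashing (db == nil), when the node is clean, or, if canUnload allows it, by unloading the node.

The main flow for computing the hash: first call h.hashChildren(n, db) to compute the hashes of all children and replace the original children with those hashes. This is recursive and works from the leaves up to the root. Then call store to compute the hash of the current node, record that hash in the cached node, and, in commit mode (db != nil), set its dirty flag to false (newly created nodes start with dirty set to true). Finally both results are returned.

About the return values: cached contains the original node together with its hash, while hashed is the current node's hash, which is computed from the node and all of its children.

One small detail: when hash is called on the root, force is true; for all other nodes it is false. force makes store hash c(J,i) even when ||c(J,i)|| < 32, which guarantees that the root node is always hashed.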
    
    // hash collapses a node down into a hash node, also returning a copy of the
    // original node initialized with the computed hash to replace the original one.
    func (h *hasher) hash(n node, db DatabaseWriter, force bool) (node, node, error) {
        // If we're not storing the node, just hashing, use available cached data
        if hash, dirty := n.cache(); hash != nil {
            if db == nil {
                return hash, n, nil
            }
            if n.canUnload(h.cachegen, h.cachelimit) {
                // Unload the node from cache. All of its subnodes will have a lower or equal
                // cache generation number.
                cacheUnloadCounter.Inc(1)
                return hash, hash, nil
            }
            if !dirty {
                return hash, n, nil
            }
        }
        // Trie not processed yet or needs storage, walk the children
        collapsed, cached, err := h.hashChildren(n, db)
        if err != nil {
            return hashNode{}, n, err
        }
        hashed, err := h.store(collapsed, db, force)
        if err != nil {
            return hashNode{}, n, err
        }
        // Cache the hash of the node for later reuse and remove
        // the dirty flag in commit mode. It's fine to assign these values directly
        // without copying the node first because hashChildren copies it.
        cachedHash, _ := hashed.(hashNode)
        switch cn := cached.(type) {
        case *shortNode:
            cn.flags.hash = cachedHash
            if db != nil {
                cn.flags.dirty = false
            }
        case *fullNode:
            cn.flags.hash = cachedHash
            if db != nil {
                cn.flags.dirty = false
            }
        }
        return hashed, cached, nil
    }

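The hashChildren method replaces the children of a node with their hashes: cached takes over the complete structure of the original trie, while collapsed has its children replaced with their hashes.

- If the current node is a shortNode, collapsed.Key is first converted from HEX encoding to COMPACT encoding. Then, unless the child is a valueNode, hash is called recursively to compute the child's hash and cache, so the child is replaced with its hash (or with the collapsed child itself if its encoding is shorter than 32 bytes).
- If the current node is a fullNode, each of its 16 child slots is replaced in the same way (nil children become empty strings so they encode correctly), and the value slot Children[16] is carried over.
- Otherwise the node has no children and is returned as-is.

The code: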

    // hashChildren replaces the children of a node with their hashes if the encoded
    // size of the child is larger than a hash, returning the collapsed node as well
    // as a replacement for the original node with the child hashes cached in.
    func (h *hasher) hashChildren(original node, db DatabaseWriter) (node, node, error) {
        var err error
    
        switch n := original.(type) {
        case *shortNode:
            // Hash the short node's child, caching the newly hashed subtree
            collapsed, cached := n.copy(), n.copy()
            collapsed.Key = hexToCompact(n.Key)
            cached.Key = common.CopyBytes(n.Key)
    
            if _, ok := n.Val.(valueNode); !ok {
                collapsed.Val, cached.Val, err = h.hash(n.Val, db, false)
                if err != nil {
                    return original, original, err
                }
            }
            if collapsed.Val == nil {
                collapsed.Val = valueNode(nil) // Ensure that nil children are encoded as empty strings.
            }
            return collapsed, cached, nil
    
        case *fullNode:
            // Hash the full node's children, caching the newly hashed subtrees
            collapsed, cached := n.copy(), n.copy()
    
            for i := 0; i < 16; i++ {
                if n.Children[i] != nil {
                    collapsed.Children[i], cached.Children[i], err = h.hash(n.Children[i], db, false)
                    if err != nil {
                        return original, original, err
                    }
                } else {
                    collapsed.Children[i] = valueNode(nil) // Ensure that nil children are encoded as empty strings.
                }
            }
            cached.Children[16] = n.Children[16]
            if collapsed.Children[16] == nil {
                collapsed.Children[16] = valueNode(nil)
            }
            return collapsed, cached, nil
    
        default:
            // Value and hash nodes don't have children so they're left as were
            return n, original, nil
        }
    }


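The store method: once all of a node's children have been replaced by their hashes, the node is encoded with rlp.Encode. If the encoding is shorter than 32 bytes and the node is not the root (force is false), the node is returned as-is so that it is stored directly inside its parent. Otherwise its hash is taken from the node's cache or computed with h.sha.Write, the encoded data is written to the database under that hash (when db is non-nil), and the hash is returned.

So every node whose encoding is 32 bytes or larger, plus the root, ends up in the database as hash -> RLP encoding.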

    func (h *hasher) store(n node, db DatabaseWriter, force bool) (node, error) {
        // Don't store hashes or empty nodes.
        if _, isHash := n.(hashNode); n == nil || isHash {
            return n, nil
        }
        // Generate the RLP encoding of the node
        h.tmp.Reset()
        if err := rlp.Encode(h.tmp, n); err != nil {
            panic("encode error: " + err.Error())
        }
    
        if h.tmp.Len() < 32 && !force {
            return n, nil // Nodes smaller than 32 bytes are stored inside their parent
        }
        // Larger nodes are replaced by their hash and stored in the database.
        hash, _ := n.cache()
        if hash == nil {
            h.sha.Reset()
            h.sha.Write(h.tmp.Bytes())
            hash = hashNode(h.sha.Sum(nil))
        }
        if db != nil {
            return hash, db.Put(hash, h.tmp.Bytes())
        }
        return hash, nil
    }
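
A minimal RLP encoder, written only to make the 32-byte rule in store concrete. It is a sketch under our own simplifications (byte strings and lists of already-encoded items only, none of the reflection-based encoding the real rlp package does), and isEmbedded is a hypothetical helper mirroring the h.tmp.Len() < 32 && !force check:

```go
package main

// encodeLength builds the RLP prefix for a payload of n bytes;
// offset is 0x80 for strings and 0xc0 for lists.
func encodeLength(n int, offset byte) []byte {
	if n < 56 {
		return []byte{offset + byte(n)}
	}
	var lenBytes []byte // big-endian length without leading zeros
	for x := n; x > 0; x >>= 8 {
		lenBytes = append([]byte{byte(x)}, lenBytes...)
	}
	return append([]byte{offset + 55 + byte(len(lenBytes))}, lenBytes...)
}

// encodeString RLP-encodes a byte string.
func encodeString(b []byte) []byte {
	if len(b) == 1 && b[0] < 0x80 {
		return []byte{b[0]} // a single low byte is its own encoding
	}
	return append(encodeLength(len(b), 0x80), b...)
}

// encodeList RLP-encodes a list whose items are already RLP-encoded.
func encodeList(items ...[]byte) []byte {
	var payload []byte
	for _, it := range items {
		payload = append(payload, it...)
	}
	return append(encodeLength(len(payload), 0xc0), payload...)
}

// isEmbedded mirrors store's "h.tmp.Len() < 32 && !force" check.
func isEmbedded(enc []byte, force bool) bool {
	return len(enc) < 32 && !force
}
```

For example, the leaf for key "dog" with value "puppy" has the compact key 0x20 'd' 'o' 'g' and encodes to 12 bytes, so it is embedded in its parent rather than stored under its own hash, whereas a 32-byte hash reference encodes to 33 bytes beginning with 0xa0.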


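Deserialization of the trie. Recall how a trie is created: if the root hash passed in is not empty, rootnode, err := trie.resolveHash(root[:], nil) is called to obtain the root node. resolveHash first fetches the node's RLP-encoded content from the database by its hash, then parses it with decodeNode (via mustDecodeNode, which panics on malformed data).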

    func (t *Trie) resolveHash(n hashNode, prefix []byte) (node, error) {
        cacheMissCounter.Inc(1)
    
        enc, err := t.db.Get(n)
        if err != nil || enc == nil {
            return nil, &MissingNodeError{NodeHash: common.BytesToHash(n), Path: prefix}
        }
        dec := mustDecodeNode(n, enc, t.cachegen)
        return dec, nil
    }
    func mustDecodeNode(hash, buf []byte, cachegen uint16) node {
        n, err := decodeNode(hash, buf, cachegen)
        if err != nil {
            panic(fmt.Sprintf("node %x: %v", hash, err))
        }
        return n
    }

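The decodeNode method determines which kind of node an encoding represents from the number of elements in the RLP list: 2 elements means a shortNode and 17 means a fullNode; the corresponding decoding function is then called.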

    // decodeNode parses the RLP encoding of a trie node.
    func decodeNode(hash, buf []byte, cachegen uint16) (node, error) {
        if len(buf) == 0 {
            return nil, io.ErrUnexpectedEOF
        }
        elems, _, err := rlp.SplitList(buf)
        if err != nil {
            return nil, fmt.Errorf("decode error: %v", err)
        }
        switch c, _ := rlp.CountValues(elems); c {
        case 2:
            n, err := decodeShort(hash, buf, elems, cachegen)
            return n, wrapError(err, "short")
        case 17:
            n, err := decodeFull(hash, buf, elems, cachegen)
            return n, wrapError(err, "full")
        default:
            return nil, fmt.Errorf("invalid number of list elements: %v", c)
        }
    }
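
decodeNode relies on rlp.SplitList and rlp.CountValues. A minimal, self-contained sketch of the same decision follows; readLength, itemSize, and countListElems are hypothetical helpers of ours, and they skip the canonical-encoding checks the real rlp package performs:

```go
package main

import "errors"

// readLength decodes an n-byte big-endian length and checks that it fits in what follows.
func readLength(buf []byte, n int) (int, error) {
	if n < 1 || n > 8 || n > len(buf) {
		return 0, errors.New("invalid length prefix")
	}
	var v uint64
	for _, c := range buf[:n] {
		v = v<<8 | uint64(c)
	}
	if v > uint64(len(buf)-n) {
		return 0, errors.New("value size exceeds available input")
	}
	return int(v), nil
}

// itemSize returns the header length and total encoded size of the first RLP item in buf.
func itemSize(buf []byte) (int, int, error) {
	if len(buf) == 0 {
		return 0, 0, errors.New("unexpected end of input")
	}
	b := buf[0]
	var header, total int
	switch {
	case b < 0x80: // single byte, its own encoding
		return 0, 1, nil
	case b < 0xb8: // short string
		header, total = 1, 1+int(b-0x80)
	case b < 0xc0: // long string: (b - 0xb7) length bytes follow
		ll := int(b - 0xb7)
		n, lerr := readLength(buf[1:], ll)
		if lerr != nil {
			return 0, 0, lerr
		}
		header, total = 1+ll, 1+ll+n
	case b < 0xf8: // short list
		header, total = 1, 1+int(b-0xc0)
	default: // long list: (b - 0xf7) length bytes follow
		ll := int(b - 0xf7)
		n, lerr := readLength(buf[1:], ll)
		if lerr != nil {
			return 0, 0, lerr
		}
		header, total = 1+ll, 1+ll+n
	}
	if total > len(buf) {
		return 0, 0, errors.New("value size exceeds available input")
	}
	return header, total, nil
}

// countListElems strips the outer list header and counts the items,
// the number decodeNode switches on (2: shortNode, 17: fullNode).
func countListElems(buf []byte) (int, error) {
	if len(buf) == 0 || buf[0] < 0xc0 {
		return 0, errors.New("not an RLP list")
	}
	header, total, err := itemSize(buf)
	if err != nil {
		return 0, err
	}
	count := 0
	for payload := buf[header:total]; len(payload) > 0; count++ {
		_, size, ierr := itemSize(payload)
		if ierr != nil {
			return 0, ierr
		}
		payload = payload[size:]
	}
	return count, nil
}
```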

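The decodeShort method decides whether the node is a leaf or an extension node by checking whether its key has a terminator. With a terminator it is a leaf: the value is parsed with SplitString and a shortNode is produced. Without a terminator it is an extension node: the remaining reference is parsed with decodeRef and a shortNode is produced.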

    func decodeShort(hash, buf, elems []byte, cachegen uint16) (node, error) {
        kbuf, rest, err := rlp.SplitString(elems)
        if err != nil {
            return nil, err
        }
        flag := nodeFlag{hash: hash, gen: cachegen}
        key := compactToHex(kbuf)
        if hasTerm(key) {
            // value node
            val, _, err := rlp.SplitString(rest)
            if err != nil {
                return nil, fmt.Errorf("invalid value node: %v", err)
            }
            return &shortNode{key, append(valueNode{}, val...), flag}, nil
        }
        r, _, err := decodeRef(rest, cachegen)
        if err != nil {
            return nil, wrapError(err, "val")
        }
        return &shortNode{key, r, flag}, nil
    }

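The decodeRef method parses by data type. If the type is a list, it is an embedded node whose encoding is no larger than a hash (hashLen, 32 bytes), so decodeNode is called to parse it. If it is an empty string, it is an empty node and nil is returned. If it is a 32-byte string, it is a hash, so a hashNode is constructed and returned. Note that the hashNode is not parsed any further here; to load it, resolveHash has to be called. This completes decodeShort.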

    func decodeRef(buf []byte, cachegen uint16) (node, []byte, error) {
        kind, val, rest, err := rlp.Split(buf)
        if err != nil {
            return nil, buf, err
        }
        switch {
        case kind == rlp.List:
            // 'embedded' node reference. The encoding must be smaller
            // than a hash in order to be valid.
            if size := len(buf) - len(rest); size > hashLen {
                err := fmt.Errorf("oversized embedded node (size is %d bytes, want size < %d)", size, hashLen)
                return nil, buf, err
            }
            n, err := decodeNode(nil, buf, cachegen)
            return n, rest, err
        case kind == rlp.String && len(val) == 0:
            // empty node
            return nil, rest, nil
        case kind == rlp.String && len(val) == 32:
            return append(hashNode{}, val...), rest, nil
        default:
            return nil, nil, fmt.Errorf("invalid RLP string size %d (want 0 or 32)", len(val))
        }
    }

The decodeFull method follows much the same flow as decodeShort.

    
    func decodeFull(hash, buf, elems []byte, cachegen uint16) (*fullNode, error) {
        n := &fullNode{flags: nodeFlag{hash: hash, gen: cachegen}}
        for i := 0; i < 16; i++ {
            cld, rest, err := decodeRef(elems, cachegen)
            if err != nil {
                return n, wrapError(err, fmt.Sprintf("[%d]", i))
            }
            n.Children[i], elems = cld, rest
        }
        val, _, err := rlp.SplitString(elems)
        if err != nil {
            return n, err
        }
        if len(val) > 0 {
            n.Children[16] = append(valueNode{}, val...)
        }
        return n, nil
    }


### Cache Management of the Trie
Recall that the Trie structure has two fields, cachegen and cachelimit; these are the cache-control parameters. Each call to Commit increments the trie's current cachegen by 1.
    
    func (t *Trie) CommitTo(db DatabaseWriter) (root common.Hash, err error) {
        hash, cached, err := t.hashRoot(db)
        if err != nil {
            return (common.Hash{}), err
        }
        t.root = cached
        t.cachegen++
        return common.BytesToHash(hash.(hashNode)), nil
    }

Then, when inserting into the trie, the current cachegen is stored in every newly created node (through newFlag):

    func (t *Trie) insert(n node, prefix, key []byte, value node) (bool, node, error) {
                ....
                return true, &shortNode{n.Key, nn, t.newFlag()}, nil
            }

    // newFlag returns the cache flag value for a newly created node.
    func (t *Trie) newFlag() nodeFlag {
        return nodeFlag{dirty: true, gen: t.cachegen}
    }

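If trie.cachegen - node.gen >= cachelimit (and the node is not dirty), the node can be unloaded from memory. In other words, once a node has gone cachelimit commits without being modified or reloaded, it is removed from memory to leave room for other nodes.

Unloading happens in hasher.hash, which runs during commit. If the node's canUnload method returns true, the node is unloaded: note that hash then returns the hashNode for both results instead of the node itself, so nothing references the node any more and it will soon be garbage-collected. From then on, a hashNode stands for the node and all of its children; if the node is needed again, resolveHash loads it back into memory.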

    func (h *hasher) hash(n node, db DatabaseWriter, force bool) (node, node, error) {
        if hash, dirty := n.cache(); hash != nil {
            if db == nil {
                return hash, n, nil
            }
            if n.canUnload(h.cachegen, h.cachelimit) {
                // Unload the node from cache. All of its subnodes will have a lower or equal
                // cache generation number.
                cacheUnloadCounter.Inc(1)
                return hash, hash, nil
            }
            if !dirty {
                return hash, n, nil
            }
        }
        // ... (the rest of hash is as shown earlier)
    }

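canUnload is one of the methods of the node interface, and each node type implements it differently: fullNode and shortNode delegate to their nodeFlag, while hashNode and valueNode can never be unloaded.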

    // canUnload tells whether a node can be unloaded.
    func (n *nodeFlag) canUnload(cachegen, cachelimit uint16) bool {
        return !n.dirty && cachegen-n.gen >= cachelimit
    }
    
    func (n *fullNode) canUnload(gen, limit uint16) bool  { return n.flags.canUnload(gen, limit) }
    func (n *shortNode) canUnload(gen, limit uint16) bool { return n.flags.canUnload(gen, limit) }
    func (n hashNode) canUnload(uint16, uint16) bool      { return false }
    func (n valueNode) canUnload(uint16, uint16) bool     { return false }
    
    func (n *fullNode) cache() (hashNode, bool)  { return n.flags.hash, n.flags.dirty }
    func (n *shortNode) cache() (hashNode, bool) { return n.flags.hash, n.flags.dirty }
    func (n hashNode) cache() (hashNode, bool)   { return nil, true }
    func (n valueNode) cache() (hashNode, bool)  { return nil, true }
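
A small demonstration of the canUnload rule, using a trimmed copy of nodeFlag (the cached hash field is omitted because canUnload does not read it). Because cachegen and gen are uint16, the subtraction wraps around, so a node's age is still computed correctly after the generation counter overflows:

```go
package main

// nodeFlag keeps only the fields of trie.nodeFlag that canUnload reads.
type nodeFlag struct {
	dirty bool   // modified since the last commit
	gen   uint16 // cache generation when the node was created or reloaded
}

// canUnload is copied from the listing above: a clean node is unloadable once
// it is at least cachelimit generations old. uint16 arithmetic wraps around.
func (n *nodeFlag) canUnload(cachegen, cachelimit uint16) bool {
	return !n.dirty && cachegen-n.gen >= cachelimit
}
```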

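### proof.go: Merkle Proofs for the Trie
It mainly provides two functions. Prove builds a Merkle proof for a given key: the list of RLP-encoded nodes on the path from the root toward the key (nodes small enough to be embedded in their parent travel inside the parent, so only the root and hash-referenced nodes become separate elements). VerifyProof takes a root hash, a key, and a proof, verifies the proof, and returns the key's value if it is present.

Prove starts at the root, collects the nodes it passes on the way down, and then encodes them into the list it returns.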
    
    // Prove constructs a merkle proof for key. The result contains all
    // encoded nodes on the path to the value at key. The value itself is
    // also included in the last node and can be retrieved by verifying
    // the proof.
    //
    // If the trie does not contain a value for key, the returned proof
    // contains all nodes of the longest existing prefix of the key
    // (at least the root node), ending with the node that proves the
    // absence of the key.
    func (t *Trie) Prove(key []byte) []rlp.RawValue {
        // Collect all nodes on the path to key.
        key = keybytesToHex(key)
        nodes := []node{}
        tn := t.root
        for len(key) > 0 && tn != nil {
            switch n := tn.(type) {
            case *shortNode:
                if len(key) < len(n.Key) || !bytes.Equal(n.Key, key[:len(n.Key)]) {
                    // The trie doesn't contain the key.
                    tn = nil
                } else {
                    tn = n.Val
                    key = key[len(n.Key):]
                }
                nodes = append(nodes, n)
            case *fullNode:
                tn = n.Children[key[0]]
                key = key[1:]
                nodes = append(nodes, n)
            case hashNode:
                var err error
                tn, err = t.resolveHash(n, nil)
                if err != nil {
                    log.Error(fmt.Sprintf("Unhandled trie error: %v", err))
                    return nil
                }
            default:
                panic(fmt.Sprintf("%T: invalid node: %v", tn, tn))
            }
        }
        hasher := newHasher(0, 0)
        proof := make([]rlp.RawValue, 0, len(nodes))
        for i, n := range nodes {
            // Don't bother checking for errors here since hasher panics
            // if encoding doesn't work and we're not writing to any database.
            n, _, _ = hasher.hashChildren(n, nil)
            hn, _ := hasher.store(n, nil, false)
            if _, ok := hn.(hashNode); ok || i == 0 {
                // If the node's database encoding is a hash (or is the
                // root node), it becomes a proof element.
                enc, _ := rlp.EncodeToBytes(n)
                proof = append(proof, enc)
            }
        }
        return proof
    }

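VerifyProof takes rootHash, key, and the proof list and checks the elements one by one: each element must hash to the expected hash (the root hash for the first element, then the child hash reference reached in the previous node), and the key is followed through each decoded node until it reaches a value, a missing child, or the next hash reference.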

    // VerifyProof checks merkle proofs. The given proof must contain the
    // value for key in a trie with the given root hash. VerifyProof
    // returns an error if the proof contains invalid trie nodes or the
    // wrong value.
    func VerifyProof(rootHash common.Hash, key []byte, proof []rlp.RawValue) (value []byte, err error) {
        key = keybytesToHex(key)
        sha := sha3.NewKeccak256()
        wantHash := rootHash.Bytes()
        for i, buf := range proof {
            sha.Reset()
            sha.Write(buf)
            if !bytes.Equal(sha.Sum(nil), wantHash) {
                return nil, fmt.Errorf("bad proof node %d: hash mismatch", i)
            }
            n, err := decodeNode(wantHash, buf, 0)
            if err != nil {
                return nil, fmt.Errorf("bad proof node %d: %v", i, err)
            }
            keyrest, cld := get(n, key)
            switch cld := cld.(type) {
            case nil:
                if i != len(proof)-1 {
                    return nil, fmt.Errorf("key mismatch at proof node %d", i)
                } else {
                    // The trie doesn't contain the key.
                    return nil, nil
                }
            case hashNode:
                key = keyrest
                wantHash = cld
            case valueNode:
                if i != len(proof)-1 {
                    return nil, errors.New("additional nodes at end of proof")
                }
                return cld, nil
            }
        }
        return nil, errors.New("unexpected end of proof")
    }
    
    func get(tn node, key []byte) ([]byte, node) {
        for {
            switch n := tn.(type) {
            case *shortNode:
                if len(key) < len(n.Key) || !bytes.Equal(n.Key, key[:len(n.Key)]) {
                    return nil, nil
                }
                tn = n.Val
                key = key[len(n.Key):]
            case *fullNode:
                tn = n.Children[key[0]]
                key = key[1:]
            case hashNode:
                return key, n
            case nil:
                return key, nil
            case valueNode:
                return nil, n
            default:
                panic(fmt.Sprintf("%T: invalid node: %v", tn, tn))
            }
        }
    }
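
The control flow of VerifyProof can be exercised without geth. The following is a toy sketch under several assumptions: toyNode, hashOf, encodeNode and verifyToyProof are hypothetical names; JSON replaces RLP; SHA-256 stands in for Keccak-256 only because it ships with Go; every child is referenced by hash (real tries embed nodes whose encoding is shorter than 32 bytes directly in the parent, which is why Prove only emits hashed nodes plus the root); and the key is passed in already as nibbles. What carries over is the loop: check the hash, decode, walk the key, take the next expected hash from the child reference.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/json"
	"errors"
	"fmt"
)

// toyNode is a hypothetical simplification of geth's node types:
//   - branch (like fullNode): Children maps one nibble to a child hash
//   - extension (like shortNode with a child): Key + Next
//   - leaf (like shortNode with a valueNode): Key + Value
type toyNode struct {
	Key      []byte          `json:",omitempty"` // nibble fragment consumed by this node
	Next     []byte          `json:",omitempty"` // hash of the child reached after Key
	Children map[byte][]byte `json:",omitempty"` // nibble -> hash of child
	Value    []byte          `json:",omitempty"` // stored value (leaf only)
}

// hashOf stands in for Keccak-256.
func hashOf(b []byte) []byte {
	h := sha256.Sum256(b)
	return h[:]
}

// encodeNode stands in for RLP; encoding/json sorts map keys, so it is deterministic.
func encodeNode(n toyNode) []byte {
	b, err := json.Marshal(n)
	if err != nil {
		panic(err)
	}
	return b
}

// verifyToyProof returns the value for key, (nil, nil) if the proof shows
// the key is absent, or an error if the proof is invalid.
func verifyToyProof(rootHash, key []byte, proof [][]byte) ([]byte, error) {
	want := rootHash
	for i, buf := range proof {
		if !bytes.Equal(hashOf(buf), want) {
			return nil, fmt.Errorf("bad proof node %d: hash mismatch", i)
		}
		var n toyNode
		if err := json.Unmarshal(buf, &n); err != nil {
			return nil, fmt.Errorf("bad proof node %d: %v", i, err)
		}
		last := i == len(proof)-1
		var child []byte // hash of the next node; nil means the path ends here
		switch {
		case n.Children != nil: // branch: consume one nibble
			if len(key) > 0 {
				child = n.Children[key[0]]
				key = key[1:]
			}
		case bytes.HasPrefix(key, n.Key): // extension or leaf: consume Key
			key = key[len(n.Key):]
			if n.Value != nil {
				if !last {
					return nil, errors.New("additional nodes at end of proof")
				}
				if len(key) != 0 {
					return nil, nil // leaf belongs to a different key
				}
				return n.Value, nil
			}
			child = n.Next
		}
		if child == nil { // empty slot or diverging key
			if !last {
				return nil, fmt.Errorf("key mismatch at proof node %d", i)
			}
			return nil, nil
		}
		want = child
	}
	return nil, errors.New("unexpected end of proof")
}
```

As a usage example, a root branch whose slot 1 points at a leaf with Key [2 3] and Value "v" proves key [1 2 3] with the proof [root, leaf]. With the same root, the proof [root] alone shows that key [5 2 3] is absent, because slot 5 is empty at the last proof node, mirroring the nil case of VerifyProof.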


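### secure_trie.go: the secure Trie
To prevent attackers from deliberately choosing very long keys that make lookups slower, SecureTrie wraps the trie and replaces every key with its Keccak-256 hash, so every path in the underlying trie has the same fixed length. The original key for each hash (its preimage) is also stored in the database: it is kept in secKeyCache and written out by CommitTo.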
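
The effect of hashing on path length can be checked in a few lines. This is a sketch under assumptions: SHA-256 stands in for Keccak-256 (both produce 32-byte digests), and hashKey and pathLen are hypothetical helpers, with pathLen computing len(keybytesToHex(key)), i.e. two nibbles per byte plus the terminator.

```go
package main

import "crypto/sha256"

// hashKey is a hypothetical stand-in for SecureTrie.hashKey.
// geth uses Keccak-256; SHA-256 is used here only because it is in the
// Go standard library. Both always return 32 bytes.
func hashKey(key []byte) []byte {
	h := sha256.Sum256(key)
	return h[:]
}

// pathLen returns how many nibbles a key occupies as a trie path,
// i.e. len(keybytesToHex(key)).
func pathLen(key []byte) int {
	return len(key)*2 + 1
}
```

A 1000-byte key used directly would occupy a 2001-nibble path; after hashing, every key, short or long, occupies exactly 65 nibbles, so an attacker can no longer lengthen lookups by choosing long keys.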

    type SecureTrie struct {
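        trie             Trie                    // the underlying Trie
        hashKeyBuf       [secureKeyLength]byte   // buffer used when hashing a key
        secKeyBuf        [200]byte               // buffer for the database key (prefix + hash) under which a preimage is stored
        secKeyCache      map[string][]byte       // maps each hashed key to its original key (preimage)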
        secKeyCacheOwner *SecureTrie // Pointer to self, replace the key cache on mismatch
    }

    func NewSecure(root common.Hash, db Database, cachelimit uint16) (*SecureTrie, error) {
        if db == nil {
            panic("NewSecure called with nil database")
        }
        trie, err := New(root, db)
        if err != nil {
            return nil, err
        }
        trie.SetCacheLimit(cachelimit)
        return &SecureTrie{trie: *trie}, nil
    }
    
    // Get returns the value for key stored in the trie.
    // The value bytes must not be modified by the caller.
    func (t *SecureTrie) Get(key []byte) []byte {
        res, err := t.TryGet(key)
        if err != nil {
            log.Error(fmt.Sprintf("Unhandled trie error: %v", err))
        }
        return res
    }
    
    // TryGet returns the value for key stored in the trie.
    // The value bytes must not be modified by the caller.
    // If a node was not found in the database, a MissingNodeError is returned.
    func (t *SecureTrie) TryGet(key []byte) ([]byte, error) {
        return t.trie.TryGet(t.hashKey(key))
    }
    func (t *SecureTrie) CommitTo(db DatabaseWriter) (root common.Hash, err error) {
        if len(t.getSecKeyCache()) > 0 {
            for hk, key := range t.secKeyCache {
                if err := db.Put(t.secKey([]byte(hk)), key); err != nil {
                    return common.Hash{}, err
                }
            }
            t.secKeyCache = make(map[string][]byte)
        }
        return t.trie.CommitTo(db)
    }
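
The interplay between the preimage cache and CommitTo can be sketched with plain maps. Everything here is hypothetical: kvStore stands in for DatabaseWriter, a map stands in for the inner Trie, SHA-256 replaces Keccak-256, and the "secure-key-" prefix is an assumption rather than a value taken from this article. The order of operations follows CommitTo above: write each cached hash/key pair under prefix + hash, reset secKeyCache, then persist the trie.

```go
package main

import "crypto/sha256"

// kvStore is a hypothetical stand-in for geth's DatabaseWriter.
type kvStore map[string][]byte

// Put copies the value; a real database could fail, this map never does.
func (db kvStore) Put(key, value []byte) error {
	db[string(key)] = append([]byte(nil), value...)
	return nil
}

// secKeyPrefix is an assumed database prefix for preimage entries.
var secKeyPrefix = []byte("secure-key-")

// toySecureTrie sketches the SecureTrie idea on top of a plain map.
type toySecureTrie struct {
	data        map[string][]byte // hashed key -> value (stands in for the inner Trie)
	secKeyCache map[string][]byte // hashed key -> original key (preimage)
}

func newToySecureTrie() *toySecureTrie {
	return &toySecureTrie{
		data:        make(map[string][]byte),
		secKeyCache: make(map[string][]byte),
	}
}

// hashKey uses SHA-256 as a stand-in for Keccak-256.
func (t *toySecureTrie) hashKey(key []byte) []byte {
	h := sha256.Sum256(key)
	return h[:]
}

// Update stores the value under the hashed key and remembers the preimage.
func (t *toySecureTrie) Update(key, value []byte) {
	hk := string(t.hashKey(key))
	t.data[hk] = append([]byte(nil), value...)
	t.secKeyCache[hk] = append([]byte(nil), key...)
}

// Get only needs the hash of the key, never the preimage cache.
func (t *toySecureTrie) Get(key []byte) []byte {
	return t.data[string(t.hashKey(key))]
}

// CommitTo flushes preimages under prefix+hash, resets the cache,
// then persists the trie contents (here simply the hashed entries).
func (t *toySecureTrie) CommitTo(db kvStore) error {
	for hk, key := range t.secKeyCache {
		dbKey := append(append([]byte(nil), secKeyPrefix...), hk...)
		if err := db.Put(dbKey, key); err != nil {
			return err
		}
	}
	t.secKeyCache = make(map[string][]byte)
	for hk, value := range t.data {
		if err := db.Put([]byte(hk), value); err != nil {
			return err
		}
	}
	return nil
}
```

Keeping preimages in a separate cache means reads never depend on them, since Get only ever hashes the key. They are written once, at commit time, so that a hashed key can later be mapped back to the original key.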


