The Charm of Source Code: How HashMap Works

Posted by Nichoool on 2019-03-04

Original URL: https://flycode.co/archives/281541

How HashMap Works (Android 7.1 source code)

Introduction

HashMap is a Map (key-value store) implemented as a hash table.

Initialization

   ...
   // The hash table that stores the data
   transient HashMapEntry<K,V>[] table = (HashMapEntry<K,V>[]) EMPTY_TABLE;
   ...
   // Number of key-value mappings actually stored
   transient int size;
   ...
   // Resize threshold for the table
   int threshold;
   // The constructor does not allocate the table; it keeps using EMPTY_TABLE
   public HashMap(int initialCapacity, float loadFactor) {
           if (initialCapacity < 0)
               throw new IllegalArgumentException("Illegal initial capacity: " +
                                                  initialCapacity);
           if (initialCapacity > MAXIMUM_CAPACITY) {
               initialCapacity = MAXIMUM_CAPACITY;
           } else if (initialCapacity < DEFAULT_INITIAL_CAPACITY) {
               initialCapacity = DEFAULT_INITIAL_CAPACITY;
           }

           if (loadFactor <= 0 || Float.isNaN(loadFactor))
               throw new IllegalArgumentException("Illegal load factor: " +
                                                  loadFactor);
           // Android-Note: We always use the default load factor of 0.75f.

           // This might appear wrong but it's just awkward design. We always call
           // inflateTable() when table == EMPTY_TABLE. That method will take "threshold"
           // to mean "capacity" and then replace it with the real threshold (i.e, multiplied with
           // the load factor).
           // In other words: while table is still empty (a freshly created HashMap is just
           // an empty table), inflateTable is what allocates the hash table.
           threshold = initialCapacity;
           // Empty implementation (a hook for subclasses)
           init();
       }

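A newly created HashMap does not allocate memory for the table. The put method, analyzed below, shows exactly where and in which function the allocation happens. The elements stored in the hash table are HashMapEntry objects.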

    /** @hide */  // Android added.
    static class HashMapEntry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        HashMapEntry<K,V> next;
        int hash;
        ...
    }

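Besides the key and the value, each entry holds a reference to another HashMapEntry. A brief word on this next field: it points to the next entry that fell into the same bucket, so the entries of one bucket form a singly linked list. This is separate chaining, one of the standard ways to resolve hash collisions. In this version the chain is always a linked list; the red-black tree variant only appears in the Java 8 implementation mentioned at the end.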
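To make the role of next concrete, here is a minimal sketch of a single bucket's chain. ChainSketch and Node are hypothetical names used only for illustration; Node is a simplified stand-in for HashMapEntry, not the Android class itself.

```java
// Hypothetical sketch of separate chaining; not the Android source.
final class ChainSketch {
    // Simplified stand-in for HashMapEntry: cached hash, key, value, next pointer.
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walks the chain starting at head and returns the node for key, or null.
    static <K, V> Node<K, V> find(Node<K, V> head, int hash, K key) {
        for (Node<K, V> e = head; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                return e;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Two keys that landed in the same bucket: the newer node is linked in front.
        Node<String, Integer> first = new Node<>(7, "a", 1, null);
        Node<String, Integer> head = new Node<>(7, "b", 2, first);
        System.out.println(find(head, 7, "a").value);   // 1
        System.out.println(find(head, 7, "missing"));   // null
    }
}
```

The bucket array only stores the head of each chain; every other entry in the bucket is reachable solely through next.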

The put method

  public V put(K key, V value) {
      if (table == EMPTY_TABLE) {
          inflateTable(threshold);
      }
      if (key == null)
          return putForNullKey(value);
      int hash = sun.misc.Hashing.singleWordWangJenkinsHash(key);
      int i = indexFor(hash, table.length);
      for (HashMapEntry<K,V> e = table[i]; e != null; e = e.next) {
          Object k;
          if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
              V oldValue = e.value;
              e.value = value;
              e.recordAccess(this);
              return oldValue;
          }
      }

      modCount++;
      addEntry(hash, key, value, i);
      return null;
  }

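When the table is still empty, the first call to put triggers the initialization of the hash table through inflateTable.
When the key is null, putForNullKey stores the value and returns the value previously associated with the null key, or null if there was none.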
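The article does not list inflateTable, so the following is only a sketch of what lazy allocation could look like. LazyTableSketch and ensureTable are hypothetical names, and the power-of-two rounding is an assumption modeled on the upstream JDK 7 implementation; the threshold formula follows the one in the resize listing of this article.

```java
// Hypothetical sketch of lazy table allocation; not the Android source.
final class LazyTableSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final Object[] EMPTY_TABLE = {};

    Object[] table = EMPTY_TABLE;
    int threshold;                       // holds the requested capacity until the first put
    final float loadFactor = 0.75f;

    LazyTableSketch(int initialCapacity) {
        threshold = initialCapacity;     // no array is allocated here
    }

    // Smallest power of two >= n (assumed rounding rule).
    static int roundUpToPowerOf2(int n) {
        if (n >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        return n > 1 ? Integer.highestOneBit((n - 1) << 1) : 1;
    }

    // Allocates the array and replaces "threshold as capacity" with the real threshold.
    void inflateTable(int toSize) {
        int capacity = roundUpToPowerOf2(toSize);
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Object[capacity];
    }

    // What put does first: allocate only if the table is still the shared empty array.
    void ensureTable() {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
    }
}
```

With a requested capacity of 20, this sketch allocates 32 slots on the first ensureTable call and sets the threshold to 24.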

The hash value is obtained through the singleWordWangJenkinsHash method.

    ...
    public static int singleWordWangJenkinsHash(Object k) {
        int h = k.hashCode();
        h += (h <<  15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h <<   3);
        h ^= (h >>>  6);
        h += (h <<   2) + (h << 14);
        return h ^ (h >>> 16);
    }

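In essence it takes the key's hashCode() and mixes its bits further to produce the hash value. This hash value matters a great deal, because it determines which bucket the entry goes into.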
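To see what the mixing does in practice, the sketch below applies the same steps as the listing above to raw int values and prints the resulting bucket for a 16-slot table. HashSpreadSketch and spread are hypothetical names; the chosen sample values are arbitrary.

```java
// Hypothetical demo class; spread repeats the mixing steps from the listing above.
final class HashSpreadSketch {
    // Takes the raw hashCode as an int so it can be tried on any value.
    static int spread(int h) {
        h += (h << 15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h << 3);
        h ^= (h >>> 6);
        h += (h << 2) + (h << 14);
        return h ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table length.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        // These raw hash codes differ only above the low 4 bits, so without
        // mixing they would all fall into bucket 0 of a 16-slot table.
        int[] raw = {0x10, 0x20, 0x30, 0x10000, 0x20000};
        for (int h : raw) {
            System.out.printf("raw=%08x unmixed=%d mixed=%d%n",
                    h, indexFor(h, length), indexFor(spread(h), length));
        }
    }
}
```

Running it lets you compare the unmixed and mixed columns for your own key distribution.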

The indexFor method then computes the index of the entry in the hash table:
```
static int indexFor(int h, int length) {
  // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
  return h & (length-1);
}
```
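The index is again computed with an efficient bitwise AND. Because length is always a power of two, `h & (length - 1)` keeps exactly the low bits of h and gives the same result as h modulo length, without the cost of a division.
HashMap lookups take O(1) time on average, and that depends largely on how evenly the hash values are distributed. To lower the collision rate, the last step of singleWordWangJenkinsHash, h ^ (h >>> 16), XORs the high 16 bits of h into the low 16 bits. Since indexFor only looks at the low bits, this lets the high bits influence the index as well, so keys spread more evenly over the buckets and collide less often.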
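The equivalence between the mask and the modulo operation is easy to check. The sketch below (IndexForSketch is a hypothetical demo class) compares indexFor with Math.floorMod for a power-of-two length, and shows what goes wrong when the length is not a power of two.

```java
// Hypothetical demo class comparing the AND mask with a real modulo.
final class IndexForSketch {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int h : new int[] {5, 21, -3, Integer.MIN_VALUE, Integer.MAX_VALUE}) {
            System.out.println(h + " -> mask " + indexFor(h, length)
                    + ", floorMod " + Math.floorMod(h, length));
        }
        // With a length that is not a power of two the mask skips buckets:
        // 10 - 1 = 0b1001, so only indices 0, 1, 8 and 9 can ever be produced.
        System.out.println(indexFor(7, 10));
    }
}
```

Note that the mask also handles negative hash values correctly, which the plain % operator would not.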

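The computed index selects a bucket in the hash table. Since HashMap resolves collisions by chaining, when a collision occurs (that is, different keys map to the same index) put follows the next references described above down the chain. If it finds an entry with an equal key, it replaces the value and returns the old one; if not, it adds a new entry to this bucket.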
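The replace-or-add behavior can be sketched in a few lines. PutSketch is a hypothetical, heavily simplified map: it uses a fixed table of 8 buckets, skips bit mixing and resizing, and does not support null keys. Linking the new node at the head of the bucket is an assumption based on upstream JDK 7, since createEntry is not listed in this article.

```java
// Hypothetical, simplified chained map; not the Android source.
final class PutSketch<K, V> {
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[8]; // fixed size, no resize
    int size;

    // Returns the previous value for key, or null if the key was not present.
    V put(K key, V value) {
        int hash = key.hashCode();              // non-null keys only, no bit mixing
        int i = hash & (table.length - 1);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                V old = e.value;                // equal key found: replace in place
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]); // assumed head insertion
        size++;
        return null;
    }

    public static void main(String[] args) {
        PutSketch<String, Integer> m = new PutSketch<>();
        System.out.println(m.put("a", 1));  // null, the key is new
        System.out.println(m.put("a", 2));  // 1, the old value is returned
    }
}
```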

    void addEntry(int hash, K key, V value, int bucketIndex) {
       if ((size >= threshold) && (null != table[bucketIndex])) {
           resize(2 * table.length);
           hash = (null != key) ? sun.misc.Hashing.singleWordWangJenkinsHash(key) : 0;
           bucketIndex = indexFor(hash, table.length);
       }

       createEntry(hash, key, value, bucketIndex);
    }
    // Grow the hash table
    void resize(int newCapacity) {
       HashMapEntry[] oldTable = table;
       int oldCapacity = oldTable.length;
       if (oldCapacity == MAXIMUM_CAPACITY) {
           threshold = Integer.MAX_VALUE;
           return;
       }

       HashMapEntry[] newTable = new HashMapEntry[newCapacity];
       transfer(newTable);
       table = newTable;
       threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }
    // Move the entries of the old table into the new table
    void transfer(HashMapEntry[] newTable) {
        int newCapacity = newTable.length;
        for (HashMapEntry<K,V> e : table) {
            // e is the head entry of one bucket in the old table
            while(null != e) {
                // Follow the next references to visit every entry in this bucket's chain
                HashMapEntry<K,V> next = e.next;
                // Compute the index in the new table
                int i = indexFor(e.hash, newCapacity);
                // If the new bucket already holds entries,
                // link them behind this entry through next
                e.next = newTable[i];
                // Make this entry the head of the new bucket
                newTable[i] = e;
                e = next;
            }
        }
    }

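What addEntry does:
- When the number of entries has reached the threshold and the target bucket is already occupied, the table is grown to twice its current size.
- In resize, if the table is already at the maximum capacity, the threshold is set to Integer.MAX_VALUE and the table is not grown any further.
- Otherwise a new table is allocated, and transfer moves the entries of the old table into it.
- transfer iterates over the old table. For every non-empty bucket it recomputes each entry's index in the new table with indexFor and moves the entry there, following next until the chain is exhausted. transfer does more than just enlarge the table: it can also spread previously colliding entries over different buckets, which speeds up later lookups. The key is the AND operation in indexFor, which the next section analyzes.
- After the transfer, the threshold is updated.
- Once resize has finished, bucketIndex is recomputed and createEntry inserts the new entry.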
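The transfer step can be tried in isolation. The sketch below uses a hypothetical TransferSketch class whose Node carries only a hash; the loop body is the same as in the transfer listing above. It starts with one bucket of an 8-slot table holding four colliding entries and moves them into a 16-slot table.

```java
// Hypothetical demo of the transfer loop; Node keeps only the hash for brevity.
final class TransferSketch {
    static final class Node {
        final int hash;
        Node next;

        Node(int hash, Node next) {
            this.hash = hash;
            this.next = next;
        }
    }

    // Moves every node of oldTable into a new table of the given capacity.
    static Node[] transfer(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;
                int i = e.hash & (newCapacity - 1);
                e.next = newTable[i];   // existing chain goes behind this node
                newTable[i] = e;        // this node becomes the new head
                e = next;
            }
        }
        return newTable;
    }

    // Renders a chain as "h1->h2->..." for inspection.
    static String chain(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append("->");
            }
            sb.append(e.hash);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node[] old = new Node[8];
        // 3, 11, 19 and 27 all map to bucket 3 when the length is 8.
        old[3] = new Node(3, new Node(11, new Node(19, new Node(27, null))));
        Node[] grown = transfer(old, 16);
        System.out.println(chain(grown[3]));   // 19->3
        System.out.println(chain(grown[11]));  // 27->11
    }
}
```

The four entries split into two chains of two, and because each moved entry becomes the new head, the relative order inside each chain is reversed.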

The magic of indexFor

  // The formula
  h & (length-1);

  // Suppose one key's hash is     0000 0000 0000 0000 0000 0001 1111 1111
  // and another key's hash is     0000 0000 0000 0000 0000 0000 1111 1111
  // length is 256, i.e. 2^8                                0001 0000 0000
  // length - 1 is                                          0000 1111 1111
  // After the AND, both indexes are equal: index1 = index2 = 255
  // When resize() doubles the table,
  // length is 512, i.e. 2^9                                0010 0000 0000
  // length - 1 is                                          0001 1111 1111
  // After the AND,
  // index1 is                                              0001 1111 1111
  // index2 is                                              0000 1111 1111
  // and the two are no longer equal

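As the annotated calculation shows, two keys that collided before the resize can end up in different buckets afterwards, and all it takes is the single AND in indexFor. This works whenever the two hash values differ in the bit that the doubled length newly exposes; keys whose hashes agree in that bit stay together in one bucket.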
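The same calculation can be reproduced in code. IndexForExample is a hypothetical demo class; the third hash value, 0x2FF, is added here to show a pair that doubling the table does not separate.

```java
// Hypothetical demo reproducing the worked example above.
final class IndexForExample {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h1 = 0x1FF; // ...0001 1111 1111
        int h2 = 0x0FF; // ...0000 1111 1111
        int h3 = 0x2FF; // ...0010 1111 1111, extra value for illustration

        // length 256: both keys collide in bucket 255
        System.out.println(indexFor(h1, 256) + " " + indexFor(h2, 256)); // 255 255
        // length 512: bit 8 now counts, so the keys separate
        System.out.println(indexFor(h1, 512) + " " + indexFor(h2, 512)); // 511 255
        // h3 and h2 agree in bit 8, so they still share bucket 255
        System.out.println(indexFor(h3, 512) + " " + indexFor(h2, 512)); // 255 255
    }
}
```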

The get method

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }
        int hash = (key == null) ? 0 : sun.misc.Hashing.singleWordWangJenkinsHash(key);
        for (HashMapEntry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

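get is simple: it looks the entry up directly by key. It first computes the hash, uses indexFor to find the position in the array, and then walks that bucket. Without collisions the first entry is the one it wants and the loop ends immediately; with collisions it has to search the linked list through next.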
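One consequence of the last line of get is that a null result is ambiguous: it is returned both when the key is absent and when the key is mapped to null. The sketch below illustrates this with the standard java.util.HashMap on a desktop JVM, whose Map contract is the same even though its internals may differ from the Android 7.1 listing; GetSemantics and lookup are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of how to interpret the result of Map.get.
final class GetSemantics {
    // Distinguishes "absent" from "present but mapped to null" using containsKey.
    static <K, V> String lookup(Map<K, V> map, K key) {
        V value = map.get(key);
        if (value != null) {
            return "value " + value;
        }
        return map.containsKey(key) ? "null value" : "absent";
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("present", 1);
        map.put("mappedToNull", null);
        map.put(null, 42);                 // HashMap permits one null key

        System.out.println(lookup(map, "present"));       // value 1
        System.out.println(lookup(map, "mappedToNull"));  // null value
        System.out.println(lookup(map, "missing"));       // absent
        System.out.println(lookup(map, null));            // value 42
    }
}
```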

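The cost of collisions

This section may be redundant, but it should help readers who are not yet familiar with the topic.
HashMap owes its efficiency to using the hash code to scatter entries across different positions in the table. Without collisions, get() finds the entry directly in O(1) time. With collisions, it has to walk the next chain and compare entries one by one, so in the worst case, when all n keys share one bucket, a lookup degrades to O(n).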
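A rough way to picture the worst case is to count how many nodes a lookup visits when every key ends up in the same bucket. CollisionCostSketch is a hypothetical demo; it models a single chain with head insertion and counts the visited nodes.

```java
// Hypothetical demo counting chain steps when all keys share one bucket.
final class CollisionCostSketch {
    static final class Node {
        final int key;
        final Node next;

        Node(int key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Inserts keys 0..n-1 at the head of one chain, as if they all collided.
    static Node buildChain(int n) {
        Node head = null;
        for (int k = 0; k < n; k++) {
            head = new Node(k, head);
        }
        return head;
    }

    // Number of nodes visited until key is found, or the chain length if absent.
    static int stepsToFind(Node head, int key) {
        int steps = 0;
        for (Node e = head; e != null; e = e.next) {
            steps++;
            if (e.key == key) {
                return steps;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        Node head = buildChain(1000);
        System.out.println(stepsToFind(head, 999)); // 1, the newest key sits at the head
        System.out.println(stepsToFind(head, 0));   // 1000, the oldest key is at the tail
    }
}
```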

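Aside

In Java 8, the HashMap source handles collisions by switching between a linked list and a red-black tree, which speeds up get(Object) when many keys collide: a bucket with few colliding entries stays a linked list, and a bucket with many is converted to a red-black tree.
In short, HashMap is a well-known example of trading space for time and one of the most commonly used data structures, but its low memory efficiency is a serious weakness. For this reason Android provides ArrayMap, which can replace it in some situations. In a later chapter I will analyze ArrayMap and explain when to use ArrayMap and when to use HashMap.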
