Talking about HashMap through its source code

glmapper, published 2017-12-03

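Why is HashMap such a regular in interviews? I think there are a few reasons:
* It tests whether you can read source code
* It tests whether you understand its internal data structure
* It tests whether you understand how it stores and looks up entries
* It tests whether you have thought about using it where thread safety matters (HashMap is not thread-safe)
A while ago a colleague interviewed at Ant Financial and was asked exactly this. Often the question grows out of the relationship among HashMap, Hashtable and ConcurrentHashMap; sometimes the interviewer asks about HashMap's internals directly. Either way the goal is the same: to find out whether you understand the principles, implementation and use cases of these common collections. We use them a lot in everyday development, and so does everyone else, but few people use them well (I use them a lot too, and not particularly well). So I'm taking this opportunity (shamelessly riding the trend) to walk through HashMap once more. This article is based on jdk1.7.0_80; JDK 1.8 changes things somewhat, which I'll cover in detail later.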

Inheritance hierarchy

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

HashMap implements the Map, Cloneable and Serializable interfaces and extends the abstract class AbstractMap. Hashtable instead extends the Dictionary class, while likewise implementing Map, Cloneable and Serializable.
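These relationships can be checked with a few reflection calls on whatever JDK you run. A small sketch; the class HierarchyCheck and its helper isA are hypothetical names made up for illustration.

```java
import java.io.Serializable;
import java.util.AbstractMap;
import java.util.Dictionary;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class HierarchyCheck {
    // True if `type` is assignable to every listed supertype.
    static boolean isA(Class<?> type, Class<?>... supertypes) {
        for (Class<?> s : supertypes) {
            if (!s.isAssignableFrom(type)) {
                return false;
            }
        }
        return true;
    }

    // Checks the superclass and interface relationships described above.
    static boolean hierarchyMatches() {
        return AbstractMap.class.equals(HashMap.class.getSuperclass())
                && Dictionary.class.equals(Hashtable.class.getSuperclass())
                && isA(HashMap.class, Map.class, Cloneable.class, Serializable.class)
                && isA(Hashtable.class, Map.class, Cloneable.class, Serializable.class);
    }

    public static void main(String[] args) {
        System.out.println(HashMap.class.getSuperclass());   // class java.util.AbstractMap
        System.out.println(Hashtable.class.getSuperclass()); // class java.util.Dictionary
        System.out.println(hierarchyMatches());              // true
    }
}
```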

Main fields

DEFAULT_INITIAL_CAPACITY: the default initial capacity, 16 (Hashtable's is 11). A constant.

 /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
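Because the table length must be a power of two, a requested capacity such as 10 has to be rounded up before the table is allocated. In the 1.7.0_80 source this appears to be done by a private helper named roundUpToPowerOf2, called from inflateTable; the sketch below is my reconstruction of that logic rather than a verbatim copy, and the wrapper class PowerOfTwo is hypothetical.

```java
class PowerOfTwo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= number, clamped to the range [1, MAXIMUM_CAPACITY].
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        // For number > 1, the highest one-bit of (number - 1) << 1 is exactly
        // the smallest power of two that is >= number.
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {0, 1, 10, 16, 17, 100}) {
            System.out.println(requested + " -> " + roundUpToPowerOf2(requested));
        }
        // 0 -> 1, 1 -> 1, 10 -> 16, 16 -> 16, 17 -> 32, 100 -> 128
    }
}
```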

MAXIMUM_CAPACITY: the maximum capacity. A constant.

/**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

DEFAULT_LOAD_FACTOR: the default load factor (0.75). A constant.

/**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

EMPTY_TABLE: the shared default empty table.

/**
     * An empty table instance to share when the table is not inflated.
     */
    static final Entry<?,?>[] EMPTY_TABLE = {};

table: the table, resized as necessary; its length must always be a power of two. This is the core storage structure of HashMap.

/**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

size: the number of key-value mappings stored in the HashMap (the total across all the linked lists in the buckets, or trees in JDK 8).

/**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

threshold: the resize threshold. Once the HashMap's size reaches threshold, a resize is performed; threshold = capacity * loadFactor.

/**
     * The next size value at which to resize (capacity * load factor).
     * @serial
     */
    // If table == EMPTY_TABLE then this is the initial capacity at which the
    // table will be created when inflated.
    int threshold;
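A worked example of threshold = capacity * loadFactor: with the defaults, 16 * 0.75 = 12. The Math.min(..., MAXIMUM_CAPACITY + 1) cap mirrors what I believe inflateTable does in JDK 7; the class name ThresholdDemo is hypothetical.

```java
class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // threshold = capacity * loadFactor, capped near MAXIMUM_CAPACITY + 1.
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        // At the default load factor, doubling the capacity doubles the threshold.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println(capacity + " buckets -> threshold " + threshold(capacity, 0.75f));
        }
        // prints thresholds 12, 24, 48, 96
    }
}
```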

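loadFactor: the load factor, which measures how full the HashMap is allowed to get. Its default value is 0.75f. The current load of a HashMap is computed as size/capacity, not as the number of occupied buckets divided by capacity. (Buckets are introduced later.)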

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;
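To make the size/capacity definition concrete, the toy below compares the two ratios for four hashes in a 4-bucket table. It is a hypothetical sketch (class LoadFactorDemo, plain h & (capacity - 1) indexing with no supplemental hash). Three of the hashes collide in bucket 0, so by HashMap's measure the table is at load 1.0 even though half of its buckets are still empty.

```java
class LoadFactorDemo {
    // Toy bucket index for a power-of-two capacity (no supplemental hash, for clarity).
    static int index(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    // Load as HashMap measures it: number of entries divided by capacity.
    static double entryLoad(int[] hashes, int capacity) {
        return (double) hashes.length / capacity;
    }

    // Fraction of buckets holding at least one entry (NOT what HashMap uses).
    static double occupiedBucketRatio(int[] hashes, int capacity) {
        boolean[] used = new boolean[capacity];
        int occupied = 0;
        for (int h : hashes) {
            int i = index(h, capacity);
            if (!used[i]) {
                used[i] = true;
                occupied++;
            }
        }
        return (double) occupied / capacity;
    }

    public static void main(String[] args) {
        int[] hashes = {0, 4, 8, 1}; // 0, 4 and 8 all map to bucket 0 when capacity is 4
        System.out.println(entryLoad(hashes, 4));           // 1.0
        System.out.println(occupiedBucketRatio(hashes, 4)); // 0.5
    }
}
```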

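modCount: the number of times this HashMap has been structurally modified. Structural modifications are those that change the number of mappings in the HashMap or otherwise modify its internal structure (e.g., a rehash). This field is what makes iterators over the HashMap's collection views fail-fast (see ConcurrentModificationException).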

/**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;
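The effect of modCount is easy to observe with the real java.util.HashMap: a structural change made behind an iterator's back makes the next call to next() throw ConcurrentModificationException, while removing through the iterator itself is allowed. The Javadoc only promises fail-fast behavior on a best-effort basis; in a single thread on OpenJDK the check below behaves deterministically. FailFastDemo and its method names are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    // Adds a new key while a for-each loop is iterating over keySet().
    // Returns true if the iterator detected the change and threw.
    static boolean putDuringIterationThrows() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        boolean added = false;
        try {
            for (String ignored : map.keySet()) {
                if (!added) {
                    map.put("d", 4); // new key: a structural change, so modCount++
                    added = true;
                }
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Removes odd values through the iterator, which keeps its expected modCount in sync.
    // Returns the number of entries left.
    static int removeOddValuesViaIterator() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 != 0) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(putDuringIterationThrows());   // true
        System.out.println(removeOddValuesViaIterator()); // 1
    }
}
```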

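hashSeed: a randomizing value associated with this instance that is mixed into the keys' hash codes to make hash collisions harder to find. If it is 0, alternative hashing is disabled.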

/**
     * A randomizing value associated with this instance that is applied to
     * hash code of keys to make hash collisions harder to find. If 0 then
     * alternative hashing is disabled.
     */
    transient int hashSeed = 0;

Structure

static class Entry<K,V> implements Map.Entry<K,V>

HashMap stores each key-value pair in a static nested class named Entry, which implements the Map.Entry interface. Here is the code:

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key; // the key
        V value;     // the value
        Entry<K,V> next; // next Entry in the bucket's linked list; null if this Entry is the tail
        int hash;    // hash value of the key

        /**
         * Creates a new entry
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
        /**
        * Returns the key
        */
        public final K getKey() {
            return key;
        }
        /**
        * Returns the value
        */
        public final V getValue() {
            return value;
        }
        /**
        * Sets the value and returns the old one; returning the previous value is required by the Map.Entry.setValue contract
        */
        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }
        /**
        * Overrides equals: two entries are equal when both their keys and their values are equal
        */
        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }
        /**
        * Overrides hashCode: hash of the key XOR hash of the value
        */
        public final int hashCode() {
            return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
    }
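The setValue convention is not specific to HashMap: the Map.Entry.setValue contract says it replaces the entry's value, writing through to the map, and returns the old value, which is also why put returns the previous value. The sketch below (hypothetical class EntryContractDemo) checks this, together with the equals and hashCode rules shown above, against entries of a real HashMap.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class EntryContractDemo {
    // Returns {value returned by setValue, value the map now holds for "k"}.
    static int[] setValueWritesThrough() {
        Map<String, Integer> map = new HashMap<>();
        map.put("k", 1);
        Map.Entry<String, Integer> e = map.entrySet().iterator().next();
        int old = e.setValue(2); // the previous value comes back, as Map.Entry requires
        return new int[] {old, map.get("k")};
    }

    // Entry equality depends only on key and value, not on the implementing class.
    static boolean equalsAcrossImplementations() {
        Map<String, Integer> map = new HashMap<>();
        map.put("k", 2);
        Map.Entry<String, Integer> e = map.entrySet().iterator().next();
        return e.equals(new AbstractMap.SimpleEntry<>("k", 2));
    }

    // hashCode is hash(key) XOR hash(value), matching Entry.hashCode above.
    static boolean hashCodeIsKeyXorValue() {
        Map<String, Integer> map = new HashMap<>();
        map.put("k", 2);
        Map.Entry<String, Integer> e = map.entrySet().iterator().next();
        return e.hashCode() == (Objects.hashCode("k") ^ Objects.hashCode(2));
    }

    public static void main(String[] args) {
        int[] r = setValueWritesThrough();
        System.out.println(r[0] + " " + r[1]);             // 1 2
        System.out.println(equalsAcrossImplementations()); // true
        System.out.println(hashCodeIsKeyXorValue());       // true
    }
}
```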

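HashMap is a collection for storing key-value pairs; each pair is also called an Entry. The entries are spread across an array, which is the backbone of HashMap (the table field above; each of its slots is a bucket). Here is a diagram: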

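When a HashMap is initialized, every slot in the table is null. When an element is inserted (the details are analyzed below), its index is computed from the key. If that index is already taken, the new entry is added to the bucket's linked list by head insertion.

Head insertion
I tested this by inserting 170 entries (the exact number is arbitrary; the more entries, the more collisions you are likely to see):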

import java.util.HashMap;

public class HashMapTest {
	public static void main(String[] args) {
		HashMap<String, Object> map = new HashMap<>();
		for (int i = 0; i < 170; i++) {
			map.put("key" + i, i);
		}
		System.out.println(map);
	}
}

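Inspecting the next fields at a breakpoint confirms the conclusions above:
1. When indexes collide, the entries are stored in a linked list;
2. New entries are inserted at the head of the list. The usual explanation is that recently inserted data tends to be used more often, so putting it first makes lookups faster.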

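Main methods

In everyday development, what we do most with a HashMap is create one, put entries in, and then get them out. The corresponding methods are:

* the constructors
* put
* get

Constructors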

 /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        // Reject a negative initial capacity
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        // Clamp a capacity above the maximum to MAXIMUM_CAPACITY
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        // Reject a load factor that is <= 0 or NaN
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        // Set the load factor
        this.loadFactor = loadFactor;
        // Store the requested capacity in threshold; the table itself is
        // allocated later by inflateTable (see put below)
        threshold = initialCapacity;
        // Empty hook for subclasses in java.util; LinkedHashMap overrides it
        init();
    }
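The argument checks in this constructor can be exercised directly against java.util.HashMap. A small sketch; the class ConstructorChecks and its helper rejects are hypothetical names.

```java
import java.util.HashMap;

class ConstructorChecks {
    // Returns true if HashMap(initialCapacity, loadFactor) throws IllegalArgumentException.
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, String>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(-1, 0.75f));     // true: negative capacity
        System.out.println(rejects(16, 0f));        // true: load factor must be > 0
        System.out.println(rejects(16, Float.NaN)); // true: NaN load factor
        System.out.println(rejects(0, 0.75f));      // false: zero capacity is allowed
    }
}
```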

The other two constructors simply delegate to the one above:

    // Only the initial capacity is specified
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    // Use HashMap's default capacity and load factor
    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

There is one more:

public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        inflateTable(threshold);

        putAllForCreate(m);
    }
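
This constructs a new HashMap with the same mappings as the specified Map. The new HashMap has the default load factor (0.75) and an initial capacity sufficient to hold the mappings in the specified Map.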
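Worked through for a source map of 100 entries: 100 / 0.75 is about 133.3, so the constructor requests a capacity of 134, which inflateTable rounds up to the next power of two, 256; the resulting threshold of 192 comfortably holds the 100 copied entries. The sketch below (hypothetical class CopyCapacity) reproduces that arithmetic; the power-of-two rounding is a plain loop, not the JDK's own helper.

```java
class CopyCapacity {
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // The capacity HashMap(Map) requests via this(...), per the JDK 7 source above.
    static int requestedCapacity(int sourceSize) {
        return Math.max((int) (sourceSize / DEFAULT_LOAD_FACTOR) + 1, DEFAULT_INITIAL_CAPACITY);
    }

    // Rounds the request up to a power of two (simple loop, capped at MAXIMUM_CAPACITY).
    static int tableCapacity(int sourceSize) {
        int requested = requestedCapacity(sourceSize);
        int capacity = 1;
        while (capacity < requested && capacity < MAXIMUM_CAPACITY) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        int n = 100;
        int capacity = tableCapacity(n);
        System.out.println(requestedCapacity(n)); // 134
        System.out.println(capacity);             // 256
        System.out.println((int) (capacity * DEFAULT_LOAD_FACTOR) >= n); // true
    }
}
```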

The put method
As we all know, HashMap allows null keys, and this is one of the points interviewers ask about most often. Let's first look at why null can be used as a key.

public V put(K key, V value) {
        // The table has not been allocated yet
        if (table == EMPTY_TABLE) {
            // "inflate" means expand: allocate the table, using threshold as the requested capacity
            inflateTable(threshold);
        }
        // A null key: pay attention here, this is why null can be stored as a key
        if (key == null)
            // putForNullKey stores the k-v pair in table[0], i.e. the first bucket
            return putForNullKey(value);
        // Compute the hash of the key
        int hash = hash(key);
        // Compute the bucket index from the hash and the table length
        int i = indexFor(hash, table.length);
        // Walk the bucket's chain; if the key already exists, replace its value
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                // As with Entry.setValue above, the old value is returned
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
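The null-key behavior and the return value of put can both be checked with the real classes; Hashtable, by contrast, does not allow null keys and throws NullPointerException. A sketch with a hypothetical class name, NullKeyDemo:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Returns {result of first put, result of second put, get(null) afterwards}.
    static Integer[] putNullKeyTwice() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put(null, 1);  // no previous mapping, so null
        Integer second = map.put(null, 2); // replaces 1 and returns it
        return new Integer[] {first, second, map.get(null)};
    }

    // Hashtable rejects a null key: put throws NullPointerException.
    static boolean hashtableRejectsNullKey() {
        Map<String, Integer> table = new Hashtable<>();
        try {
            table.put(null, 1);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Integer[] r = putNullKeyTwice();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // null 1 2
        System.out.println(hashtableRejectsNullKey());      // true
    }
}
```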

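The hash method retrieves the object's hash code and applies a supplemental hash function to the result, which defends against poor-quality hash functions. This is critical because HashMap uses power-of-two-length hash tables, which otherwise suffer collisions for hashCodes that do not differ in their lower bits. Note: a null key always maps to hash 0, and therefore to index 0.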

    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
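To see why the extra mixing matters, take two hash codes whose low four bits are all zero. The sketch below (hypothetical class SupplementalHash) applies only the bit-mixing part of the JDK 7 hash() above, leaving out the hashSeed branch, together with indexFor, which as far as I know is h & (length - 1) in JDK 7. Without mixing, both codes fall into bucket 0 of a 16-slot table; after mixing they land in buckets 1 and 2.

```java
class SupplementalHash {
    // The bit-mixing steps of the JDK 7 hash() above, applied to a raw hashCode.
    // Simplification: the hashSeed / alternative String hashing branch is omitted.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket index for a power-of-two table length: keeps only the low bits of h.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000; // low 4 bits are zero in both
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));                 // 0 0
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // 1 2
    }
}
```

Because the length is a power of two, h & (length - 1) gives the same result as Math.floorMod(h, length), but with a single AND instruction.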

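How a collision plays out, step by step:

1. Start with an empty HashMap table.
2. Insert an element. Its hash yields index 3; slot 3 is empty, so the element goes straight in.
3. Insert another element. Its hash again yields index 3, so there is a collision: the new element takes over the slot, and its next points to the element that was already there.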
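With a fixed 16-slot table, two keys are bound to land on the same index sooner or later. The sketch below (hypothetical class CollisionFinder, reusing the JDK 7 bit mixing without hashSeed) feeds in keys named like those in the earlier experiment until one hits an occupied slot; by the pigeonhole principle this happens within 17 keys. A real HashMap would resize before reaching 17 entries; the table is kept fixed here only to isolate the collision.

```java
class CollisionFinder {
    // Bit mixing from the JDK 7 hash() (hashSeed branch omitted).
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int bucket(String key, int capacity) {
        return spread(key.hashCode()) & (capacity - 1);
    }

    // Tries "key0", "key1", ... until one lands in an occupied bucket and
    // returns {key already there, new key}. Terminates within capacity + 1 keys.
    static String[] firstCollision(int capacity) {
        String[] occupant = new String[capacity];
        for (int i = 0; ; i++) {
            String key = "key" + i;
            int b = bucket(key, capacity);
            if (occupant[b] != null) {
                return new String[] {occupant[b], key};
            }
            occupant[b] = key;
        }
    }

    public static void main(String[] args) {
        String[] pair = firstCollision(16);
        System.out.println(pair[0] + " and " + pair[1] + " share bucket " + bucket(pair[0], 16));
    }
}
```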
The get method
Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.

 public V get(Object key) {
        // As when storing a null key, a null key is looked up in table[0]
        if (key == null)
            return getForNullKey();
        // Look up the entry
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

The getEntry method

 final Entry<K,V> getEntry(Object key) {
        // size == 0 means the map is empty (every slot is still null), so return null right away
        if (size == 0) {
            return null;
        }
        // Compute the hash of the key
        int hash = (key == null) ? 0 : hash(key);
        // Use the hash to find the bucket, then walk its chain
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            // Return on a match; otherwise move on to the next node in the list
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
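Two consequences of getEntry's check (matching hash first, then == or equals on the key) can be seen with the real HashMap. The strings "Aa" and "BB" have the same hashCode (2112), so they share a bucket, yet get still returns the right value for each. And because get returns null both for a missing key and for a key mapped to null, containsKey is the way to tell those cases apart. LookupDemo is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

class LookupDemo {
    // Same hashCode, different keys: equals() in the chain walk keeps them apart.
    static boolean sameHashDifferentValues() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return "Aa".hashCode() == "BB".hashCode()
                && map.get("Aa") == 1
                && map.get("BB") == 2;
    }

    // get returns null for both cases; containsKey distinguishes them.
    static boolean nullValueVsMissingKey() {
        Map<String, Integer> map = new HashMap<>();
        map.put("present", null);
        return map.get("present") == null
                && map.get("absent") == null
                && map.containsKey("present")
                && !map.containsKey("absent");
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());           // 2112
        System.out.println(sameHashDifferentValues()); // true
        System.out.println(nullValueVsMissingKey());   // true
    }
}
```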

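Resizing

As mentioned earlier, threshold is the resize threshold, computed as threshold = capacity * loadFactor. In JDK 7, addEntry triggers a resize once the current size has reached threshold and the bucket the new entry maps to is already occupied. Here is the source: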

void addEntry(int hash, K key, V value, int bucketIndex) {
        // Resize if size has reached threshold and the target bucket is already occupied
        if ((size >= threshold) && (null != table[bucketIndex])) {
            // The new capacity is twice the old one
            resize(2 * table.length);
            // Recompute the hash
            hash = (null != key) ? hash(key) : 0;
            // Recompute the index against the new table length
            bucketIndex = indexFor(hash, table.length);
        }
        // Perform the actual insertion
        createEntry(hash, key, value, bucketIndex);
    }

void createEntry(int hash, K key, V value, int bucketIndex) {
        // Take the current head entry of the bucket
        Entry<K,V> e = table[bucketIndex];
        // Put the new entry in table[bucketIndex] with its next pointing to the old head,
        // i.e. head insertion, as the earlier figures described
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }
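Because the capacity is a power of two and doubles on each resize, an entry's new index is either its old index or old index + old capacity, decided by the single hash bit whose value equals the old capacity. As I read the JDK 7 source, resize simply recomputes indexFor for every entry (the hash changes only if the hashSeed is re-initialized); JDK 8 uses this bit test directly. The sketch below (hypothetical class ResizeSplit) checks the property.

```java
class ResizeSplit {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // After doubling a power-of-two table, an entry stays at its old index or
    // moves to oldIndex + oldCapacity, depending on the bit (hash & oldCapacity).
    static boolean splitsCleanly(int hash, int oldCapacity) {
        int oldIndex = indexFor(hash, oldCapacity);
        int newIndex = indexFor(hash, 2 * oldCapacity);
        int expected = (hash & oldCapacity) == 0 ? oldIndex : oldIndex + oldCapacity;
        return newIndex == expected;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(5, 16) + " -> " + indexFor(5, 32));   // 5 -> 5
        System.out.println(indexFor(21, 16) + " -> " + indexFor(21, 32)); // 5 -> 21
    }
}
```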

Note that resizing does not wait until the HashMap is full; see the breakpoint below: