JDK source code analysis: ConcurrentHashMap

By 王世暉 (Wang Shihui), published 2016-06-01

Basic principles

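Hashtable uses synchronized to lock the entire hash table. That lock is too coarse, so Hashtable performs poorly under contention. ConcurrentHashMap lets several modifications run concurrently through lock striping: it uses multiple locks, each guarding a different part of the hash table. Internally these parts are called segments (Segment). Each segment is effectively a small hash table with its own lock, so modifications that land in different segments can proceed in parallel. This article covers the Segment-based implementation used before JDK 8. JDK 8 rewrote ConcurrentHashMap without segments.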

How a segment is located

ConcurrentHashMap uses the high-order bits of a key's hash to assign each key-value pair to a segment.
Two fields and one method are used to locate the segment:

    /**
     * Mask value for indexing into segments. The upper bits of a
     * key's hash code are used to choose the segment.
     */
    final int segmentMask;

    /**
     * Shift value for indexing within segments.
     */
    final int segmentShift;

    /**
     * Returns the segment that should be used for key with given hash
     * @param hash the hash code for the key
     * @return the segment
     */
    final Segment<K,V> segmentFor(int hash) {
        return segments[(hash >>> segmentShift) & segmentMask];
    }

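segmentFor takes the key's hash and returns the Segment that the key belongs to. It shifts the hash right by segmentShift so that only the high bits remain, then masks the result with segmentMask to get an index into the segment array.
The segment array is declared as follows: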

    /**
     * The segments, each of which is a specialized hash table
     */
    final Segment<K,V>[] segments;

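The large hash table is split into smaller hash tables, and each small table is one Segment. Each Segment behaves like a synchronized Hashtable with its own lock, and different segments use different locks. This reduces lock granularity, so threads working on different segments do not contend with each other.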
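The shift-and-mask arithmetic and the per-segment locking can be made concrete with a minimal lock-striped map. This sketch is hypothetical and is not the JDK class. Each stripe is a plain HashMap guarded by its own ReentrantLock. The shift and mask are derived from a concurrencyLevel the same way the pre-JDK 8 constructor rounds it up to a power of two. spread() is an assumed multiplicative hash that stands in for the JDK's own hash() function. Unlike the real Segment, reads here also take the lock, because HashMap is not safe to read during a concurrent write.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock-striped map, loosely modeled on the pre-JDK 8 Segment design:
// each stripe is a plain HashMap guarded by its own ReentrantLock.
class StripedMap<K, V> {
    private static final class Stripe<K, V> extends ReentrantLock {
        final Map<K, V> map = new HashMap<>();
    }

    final int segmentShift;
    final int segmentMask;
    private final Stripe<K, V>[] stripes;

    @SuppressWarnings("unchecked")
    StripedMap(int concurrencyLevel) {
        // Round concurrencyLevel up to a power of two (ssize), counting the doublings (sshift).
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        segmentShift = 32 - sshift; // keep only the top sshift bits of the hash
        segmentMask = ssize - 1;
        stripes = (Stripe<K, V>[]) new Stripe[ssize];
        for (int i = 0; i < ssize; i++) {
            stripes[i] = new Stripe<>();
        }
    }

    // Assumption: a multiplicative spread instead of the JDK's hash(),
    // so that small hashCodes still differ in their high bits.
    static int spread(int h) {
        return h * 0x9E3779B9;
    }

    // Same arithmetic as segmentFor: the high bits of the hash select the stripe.
    int segmentIndex(int hash) {
        return (hash >>> segmentShift) & segmentMask;
    }

    V put(K key, V value) {
        Stripe<K, V> s = stripes[segmentIndex(spread(key.hashCode()))];
        s.lock(); // only this stripe is locked; the others stay available
        try {
            return s.map.put(key, value);
        } finally {
            s.unlock();
        }
    }

    V get(K key) {
        Stripe<K, V> s = stripes[segmentIndex(spread(key.hashCode()))];
        s.lock(); // unlike the JDK code, reads lock too, because HashMap is not thread-safe
        try {
            return s.map.get(key);
        } finally {
            s.unlock();
        }
    }

    int stripeCount() {
        return stripes.length;
    }
}
```

With concurrencyLevel = 16, the sketch gets segmentShift = 28 and segmentMask = 15. A hash such as 0xA0000000 therefore lands in stripe 10, because only its top four bits matter.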

What a Segment does

Below is the original source comment for Segment, followed by a short translation.

    /**
     * Segments are specialized versions of hash tables.  This
     * subclasses from ReentrantLock opportunistically, just to
     * simplify some locking and avoid separate construction.
     */
    static final class Segment<K,V> extends ReentrantLock implements Serializable {
        /*
         * Segments maintain a table of entry lists that are ALWAYS
         * kept in a consistent state, so can be read without locking.
         * Next fields of nodes are immutable (final).  All list
         * additions are performed at the front of each bin. This
         * makes it easy to check changes, and also fast to traverse.
         * When nodes would otherwise be changed, new nodes are
         * created to replace them. This works well for hash tables
         * since the bin lists tend to be short. (The average length
         * is less than two for the default load factor threshold.)
         *
         * Read operations can thus proceed without locking, but rely
         * on selected uses of volatiles to ensure that completed
         * write operations performed by other threads are
         * noticed. For most purposes, the "count" field, tracking the
         * number of elements, serves as that volatile variable
         * ensuring visibility.  This is convenient because this field
         * needs to be read in many read operations anyway:
         *
         *   - All (unsynchronized) read operations must first read the
         *     "count" field, and should not look at table entries if
         *     it is 0.
         *
         *   - All (synchronized) write operations should write to
         *     the "count" field after structurally changing any bin.
         *     The operations must not take any action that could even
         *     momentarily cause a concurrent read operation to see
         *     inconsistent data. This is made easier by the nature of
         *     the read operations in Map. For example, no operation
         *     can reveal that the table has grown but the threshold
         *     has not yet been updated, so there are no atomicity
         *     requirements for this with respect to reads.
         *
         * As a guide, all critical volatile reads and writes to the
         * count field are marked in code comments.
         */
        ...
    }

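A Segment is a specialized hash table. It extends ReentrantLock simply to make locking convenient.
A segment maintains a table of entry lists that is always kept in a consistent state, so reads do not need a lock.
The next field points to the following entry and is declared final. Entries therefore follow the immutable pattern: once constructed, they can be shared between threads safely. Because next cannot change after it is assigned, an entry can only be added at the head of a list. The new entry becomes the new head, and its next points to the old head. For the same reason, removing an entry means copying every node in front of it and linking the copies to the node after the removed one.
Reads take no lock. They rely on volatile fields to see writes that other threads have completed.
The count field holds the number of entries in the segment. It is volatile so that updates to it are visible to other threads.
Every (unsynchronized) read must read count first, and must not look at the entry lists if count is 0.
Every (synchronized) write that structurally changes an entry list must write count last, so that concurrent reads see consistent data.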
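The effect of a final next field can be shown with a tiny immutable list. This is a hypothetical sketch that uses String keys only, and it is not HashEntry itself. Adding always creates a new head, and a reader that still holds the old head keeps seeing exactly the list it started with.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable bin list: next is final, so a node is never relinked.
class ImmutableBin {
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // The only way to add is to create a new head whose next is the old head.
    static Node addFirst(Node head, String key) {
        return new Node(key, head);
    }

    // Walks the list from the given head; an old head still yields the old contents.
    static List<String> keys(Node head) {
        List<String> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next)
            out.add(n.key);
        return out;
    }
}
```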

        /**
         * The number of elements in this segment's region.
         */
        transient volatile int count;

        /**
         * Number of updates that alter the size of the table. This is
         * used during bulk-read methods to make sure they see a
         * consistent snapshot: If modCounts change during a traversal
         * of segments computing size or checking containsValue, then
         * we might have an inconsistent view of state so (usually)
         * must retry.
         */
        transient int modCount;

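count is the number of elements in the segment.
modCount counts the operations that change the table's size. It lets bulk reads obtain a consistent snapshot. Two operations traverse the segments: size, which sums the element counts of all segments, and containsValue, which checks whether the ConcurrentHashMap contains a given value. If any modCount changes during such a traversal, the operation may have seen an inconsistent state and must retry.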

The HashEntry data structure

    /**
     * Because the value field is volatile, not final, it is legal wrt
     * the Java Memory Model for an unsynchronized reader to see null
     * instead of initial value when read via a data race.  Although a
     * reordering leading to this is not likely to ever actually
     * occur, the Segment.readValueUnderLock method is used as a
     * backup in case a null (pre-initialized) value is ever seen in
     * an unsynchronized access method.
     */
    static final class HashEntry<K,V> {
        final K key;
        final int hash;
        volatile V value;
        final HashEntry<K,V> next;
        ...
    }

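Apart from value, the other three fields (key, hash and next) are final and cannot change after the constructor assigns them.
value is volatile but not final. Under the Java Memory Model, an unsynchronized reader that races with the writer may therefore see null instead of the value set in the constructor. This is unlikely to happen in practice. If a read ever does return null, the implementation falls back to Segment.readValueUnderLock and reads the value again while holding the lock.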

Adding data: put

    public V put(K key, V value) {
        if (value == null)
            throw new NullPointerException();
        int hash = hash(key.hashCode());
        return segmentFor(hash).put(key, hash, value, false);
    }

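put first checks whether value is null and throws NullPointerException if it is. Unlike HashMap, ConcurrentHashMap therefore cannot store null values. A null key fails too, because key.hashCode() throws NullPointerException.
After the check, put locates the segment from the key's hash and calls that segment's put method. In other words, ConcurrentHashMap.put simply delegates to Segment.put.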
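The null checks and the onlyIfAbsent flag can be observed through the public API. The sketch below uses java.util.concurrent.ConcurrentHashMap from whatever JDK runs it. The internals changed in JDK 8, but null keys and values are still rejected. putIfAbsent is the public counterpart of calling Segment.put with onlyIfAbsent = true.

```java
import java.util.concurrent.ConcurrentHashMap;

// Demonstrates the null checks and putIfAbsent through the public API.
class NullCheckDemo {
    // Returns true if running r throws NullPointerException.
    static boolean throwsNpe(Runnable r) {
        try {
            r.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        System.out.println(throwsNpe(() -> map.put("k", null))); // true: null value rejected
        System.out.println(throwsNpe(() -> map.put(null, "v"))); // true: null key rejected
        map.put("k", "v1");
        System.out.println(map.putIfAbsent("k", "v2")); // v1: existing value kept
        System.out.println(map.put("k", "v3"));         // v1: replaced, old value returned
        System.out.println(map.get("k"));               // v3
    }
}
```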

        V put(K key, int hash, V value, boolean onlyIfAbsent) {
            lock();
            try {
                int c = count;
                if (c++ > threshold) // ensure capacity
                    rehash();
                HashEntry<K,V>[] tab = table;
                int index = hash & (tab.length - 1);
                HashEntry<K,V> first = tab[index];
                HashEntry<K,V> e = first;
                while (e != null && (e.hash != hash || !key.equals(e.key)))
                    e = e.next;

                V oldValue;
                if (e != null) {
                    oldValue = e.value;
                    if (!onlyIfAbsent)
                        e.value = value;
                }
                else {
                    oldValue = null;
                    ++modCount;
                    tab[index] = new HashEntry<K,V>(key, hash, first, value);
                    count = c; // write-volatile
                }
                return oldValue;
            } finally {
                unlock();
            }
        }

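Segment.put must hold the lock. Note the idiomatic use of the lock: acquire it, do the work inside a try block, and release it in the finally block.
Adding an entry may push the element count past threshold. If it would, the table is rehashed first.
Next, put computes which entry list the key belongs in and fetches the head of that list: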

    int index = hash & (tab.length - 1);
    HashEntry<K,V> first = tab[index];
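Masking works as an index because segment tables always have a power-of-two length. A small sketch (the class name BinIndex is made up for illustration) shows that hash & (length - 1) equals the non-negative remainder. Java's % operator, by contrast, would return a negative index for a negative hash.

```java
// Hypothetical helper: for a power-of-two length, masking keeps the low bits of the hash,
// which equals Math.floorMod(hash, length) and is never negative.
class BinIndex {
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(37, 16)); // 5, since 37 = 2 * 16 + 5
        System.out.println(indexFor(-1, 16)); // 15, whereas -1 % 16 would be -1
    }
}
```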

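put then walks the list looking for an entry with the same key. There are two cases.
If such an entry exists, its value is replaced (unless onlyIfAbsent is true). The list structure does not change.
If no such entry exists, a new entry is placed at the head of the list. The node count grows by one and the list structure changes, so both modCount and count must be updated, with the volatile write to count coming last: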

    ++modCount;
    tab[index] = new HashEntry<K,V>(key, hash, first, value);
    count = c; // write-volatile
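The put path can be condensed into a simplified segment. MiniSegment is hypothetical: it has a fixed-size table and no rehash. It follows the same steps as Segment.put above: lock, find the key in its bin, then either replace the value in place or insert a new head, increment modCount, and write the volatile count last. The null check that ConcurrentHashMap.put performs is moved into this method for the sketch.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified segment: fixed power-of-two table, no rehash.
class MiniSegment<K, V> extends ReentrantLock {
    static final class Entry<K, V> {
        final K key;
        final int hash;
        volatile V value;
        final Entry<K, V> next;

        Entry(K key, int hash, Entry<K, V> next, V value) {
            this.key = key;
            this.hash = hash;
            this.next = next;
            this.value = value;
        }
    }

    transient volatile int count; // number of entries; written last on every structural change
    transient int modCount;       // number of structural changes
    final Entry<K, V>[] table;

    @SuppressWarnings("unchecked")
    MiniSegment(int capacity) { // capacity is assumed to be a power of two
        table = (Entry<K, V>[]) new Entry[capacity];
    }

    V put(K key, int hash, V value, boolean onlyIfAbsent) {
        if (value == null) throw new NullPointerException(); // map-level check, moved here
        lock();
        try {
            int c = count + 1;
            int index = hash & (table.length - 1);
            Entry<K, V> first = table[index];
            Entry<K, V> e = first;
            while (e != null && (e.hash != hash || !key.equals(e.key)))
                e = e.next;
            if (e != null) {                 // key present: replace in place, no structural change
                V oldValue = e.value;
                if (!onlyIfAbsent) e.value = value;
                return oldValue;
            }
            ++modCount;                      // new entry: structural change
            table[index] = new Entry<>(key, hash, first, value); // new head points to old head
            count = c;                       // volatile write publishes the new entry
            return null;
        } finally {
            unlock();
        }
    }
}
```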

Reading data: get

    public V get(Object key) {
        int hash = hash(key.hashCode());
        return segmentFor(hash).get(key, hash);
    }

get likewise delegates directly to Segment.get:

        V get(Object key, int hash) {
            if (count != 0) { // read-volatile
                HashEntry<K,V> e = getFirst(hash);
                while (e != null) {
                    if (e.hash == hash && key.equals(e.key)) {
                        V v = e.value;
                        if (v != null)
                            return v;
                        return readValueUnderLock(e); // recheck
                    }
                    e = e.next;
                }
            }
            return null;
        }

Reads take no lock, but they read count first. If count is 0, get returns null immediately.
Otherwise, getFirst fetches the head of the relevant list:

        HashEntry<K,V> getFirst(int hash) {
            HashEntry<K,V>[] tab = table;
            return tab[hash & (tab.length - 1)];
        }

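With the list head in hand, get walks the list. When it finds the entry for the key, it reads the entry's value and returns it if it is not null. As analyzed above, however, value is a volatile, non-final field. The Java Memory Model permits a reader to see null instead of the initialized value, for example because of instruction reordering. A stored value can never really be null, since put rejects null values up front. A null here therefore signals this rare transient state, and get reads the value again while holding the lock: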

        /**
         * Reads value field of an entry under lock. Called if value
         * field ever appears to be null. This is possible only if a
         * compiler happens to reorder a HashEntry initialization with
         * its table assignment, which is legal under memory model
         * but is not known to ever occur.
         */
        V readValueUnderLock(HashEntry<K,V> e) {
            lock();
            try {
                return e.value;
            } finally {
                unlock();
            }
        }
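The lock-free read and its locked fallback can be sketched the same way. ReadPath is hypothetical. Its add method is a simplified writer that publishes changes by writing count last. lockedReads is a demo-only counter that records when the readValueUnderLock path was taken. Passing a null value to add only simulates the rarely observable "not yet initialized" state, because the real put rejects null.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical read path modeled on Segment.get and readValueUnderLock.
class ReadPath<K, V> extends ReentrantLock {
    static final class Entry<K, V> {
        final K key;
        final int hash;
        volatile V value;
        final Entry<K, V> next;

        Entry(K key, int hash, Entry<K, V> next, V value) {
            this.key = key;
            this.hash = hash;
            this.next = next;
            this.value = value;
        }
    }

    volatile int count;
    final Entry<K, V>[] table;
    int lockedReads; // demo only: how often the locked fallback ran

    @SuppressWarnings("unchecked")
    ReadPath(int capacity) { // capacity is assumed to be a power of two
        table = (Entry<K, V>[]) new Entry[capacity];
    }

    // Simplified writer: mutate under the lock, write the volatile count last.
    void add(K key, int hash, V value) {
        lock();
        try {
            int i = hash & (table.length - 1);
            table[i] = new Entry<>(key, hash, table[i], value);
            count = count + 1;
        } finally {
            unlock();
        }
    }

    V get(Object key, int hash) {
        if (count != 0) { // volatile read first: makes completed writes visible
            Entry<K, V> e = table[hash & (table.length - 1)];
            while (e != null) {
                if (e.hash == hash && key.equals(e.key)) {
                    V v = e.value;
                    return v != null ? v : readValueUnderLock(e); // null: recheck under lock
                }
                e = e.next;
            }
        }
        return null;
    }

    V readValueUnderLock(Entry<K, V> e) {
        lock();
        try {
            lockedReads++;
            return e.value;
        } finally {
            unlock();
        }
    }
}
```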

Removing data: remove

    public V remove(Object key) {
        int hash = hash(key.hashCode());
        return segmentFor(hash).remove(key, hash, null);
    }

remove also delegates directly to Segment.remove:

        /**
         * Remove; match on key only if value null, else match both.
         */
        V remove(Object key, int hash, Object value) {
            lock();
            try {
                int c = count - 1;
                HashEntry<K,V>[] tab = table;
                int index = hash & (tab.length - 1);
                HashEntry<K,V> first = tab[index];
                HashEntry<K,V> e = first;
                while (e != null && (e.hash != hash || !key.equals(e.key)))
                    e = e.next;

                V oldValue = null;
                if (e != null) {
                    V v = e.value;
                    if (value == null || value.equals(v)) {
                        oldValue = v;
                        // All entries following removed node can stay
                        // in list, but all preceding ones need to be
                        // cloned.
                        ++modCount;
                        HashEntry<K,V> newFirst = e.next;
                        for (HashEntry<K,V> p = first; p != e; p = p.next)
                            newFirst = new HashEntry<K,V>(p.key, p.hash,
                                                          newFirst, p.value);
                        tab[index] = newFirst;
                        count = c; // write-volatile
                    }
                }
                return oldValue;
            } finally {
                unlock();
            }
        }

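Removal must hold the lock. The work is done in the try block, and the lock is released in the finally block.
remove computes which entry list holds the entry to be removed and fetches the head of that list: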

    HashEntry<K,V>[] tab = table;
    int index = hash & (tab.length - 1);
    HashEntry<K,V> first = tab[index];
    HashEntry<K,V> e = first;

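It then walks the list. If no entry matches the key, it returns null.
If an entry matches, every entry in front of it is copied, the copies are linked to the node after the removed one, and the result becomes the new list head: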

    for (HashEntry<K,V> p = first; p != e; p = p.next)
        newFirst = new HashEntry<K,V>(p.key, p.hash, newFirst,p.value);
    tab[index] = newFirst;

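Note the order in which the preceding entries are copied. The copy of first is built first. Next comes the copy of first's successor (call it second), whose next points to the copy of first, and so on. The copied entries therefore end up in the reverse of their original order, while the entries after the removed node are reused unchanged.
Removal changes both the structure and the node count of the list, so modCount is incremented and count is written back.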
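The copy-on-remove step, including the reversed order of the copied prefix, can be reproduced with a stripped-down list. This sketch is hypothetical: entries hold only String keys, with no hash or value. It mirrors the cloning loop from Segment.remove.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stripped-down bin list: only an immutable key and an immutable next.
class CopyOnRemove {
    static final class Entry {
        final String key;
        final Entry next;

        Entry(String key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    // Returns the head of a list without the first entry whose key matches.
    static Entry remove(Entry first, String key) {
        Entry e = first;
        while (e != null && !e.key.equals(key))
            e = e.next;
        if (e == null)
            return first;                          // not found: list unchanged
        Entry newFirst = e.next;                   // entries after e are reused as-is
        for (Entry p = first; p != e; p = p.next)
            newFirst = new Entry(p.key, newFirst); // each copy becomes the new head
        return newFirst;
    }

    static List<String> keys(Entry head) {
        List<String> out = new ArrayList<>();
        for (Entry n = head; n != null; n = n.next)
            out.add(n.key);
        return out;
    }
}
```

Removing c from the list a, b, c, d yields b, a, d. The copies of a and b come out reversed, d is shared with the old list, and the old list itself stays intact for readers that still hold its head.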

Cross-segment operations: size, which returns the number of key-value pairs in the ConcurrentHashMap

    public int size() {
        final Segment<K,V>[] segments = this.segments;
        long sum = 0;
        long check = 0;
        int[] mc = new int[segments.length];
        // Try a few times to get accurate count. On failure due to
        // continuous async changes in table, resort to locking.
        for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
            check = 0;
            sum = 0;
            int mcsum = 0;
            for (int i = 0; i < segments.length; ++i) {
                sum += segments[i].count;
                mcsum += mc[i] = segments[i].modCount;
            }
            if (mcsum != 0) {
                for (int i = 0; i < segments.length; ++i) {
                    check += segments[i].count;
                    if (mc[i] != segments[i].modCount) {
                        check = -1; // force retry
                        break;
                    }
                }
            }
            if (check == sum)
                break;
        }
        if (check != sum) { // Resort to locking all segments
            sum = 0;
            for (int i = 0; i < segments.length; ++i)
                segments[i].lock();
            for (int i = 0; i < segments.length; ++i)
                sum += segments[i].count;
            for (int i = 0; i < segments.length; ++i)
                segments[i].unlock();
        }
        if (sum > Integer.MAX_VALUE)
            return Integer.MAX_VALUE;
        else
            return (int)sum;
    }

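size first tries to sum the sizes of all segments without locking. This can fail, because while the traversal is running, another thread may make structural changes to segments that have already been visited. The lock-free attempt is made at most RETRIES_BEFORE_LOCK times. If it still fails, size acquires the locks of all segments and sums their sizes again while holding them. The lock-free attempt relies on each Segment's modCount. The traversal records every segment's modCount, and afterwards checks whether any of them has changed. A change means another thread has concurrently modified a segment's structure, so the count must be recomputed.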
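The optimistic-then-locked strategy of size can be exercised with a simplified model. This sketch is hypothetical. Seg holds only count and modCount, and RETRIES_BEFORE_LOCK is assumed to be 2. The betweenPasses hook stands in for another thread modifying a segment during the traversal, and lockedFallbacks counts how often the all-locks path was needed. Unlike the JDK code, the sketch always performs the verification pass instead of skipping it when every modCount is 0.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of the retry-then-lock size() strategy.
class SizeSketch {
    static final int RETRIES_BEFORE_LOCK = 2; // assumed value

    static final class Seg extends ReentrantLock {
        volatile int count;
        int modCount;

        void add() {
            lock();
            try {
                ++modCount;
                count = count + 1; // volatile write last
            } finally {
                unlock();
            }
        }
    }

    static int lockedFallbacks = 0;

    // betweenPasses is a test hook that may modify segments between the two unlocked passes.
    static int size(Seg[] segs, Runnable betweenPasses) {
        int[] mc = new int[segs.length];
        for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
            long sum = 0;
            for (int i = 0; i < segs.length; ++i) {
                sum += segs[i].count;       // volatile read of count before modCount
                mc[i] = segs[i].modCount;
            }
            betweenPasses.run();
            boolean stable = true;
            long check = 0;
            for (int i = 0; i < segs.length && stable; ++i) {
                check += segs[i].count;
                stable = mc[i] == segs[i].modCount; // any change forces a retry
            }
            if (stable && check == sum) return (int) Math.min(sum, Integer.MAX_VALUE);
        }
        lockedFallbacks++;
        for (Seg s : segs) s.lock();        // fallback: freeze every segment
        try {
            long sum = 0;
            for (Seg s : segs) sum += s.count;
            return (int) Math.min(sum, Integer.MAX_VALUE);
        } finally {
            for (Seg s : segs) s.unlock();
        }
    }
}
```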
