HashMap Source Code: On Map Iteration Performance (Part 1)

Posted by 李帆1998 on 2021-06-08

Original URL: https://www.cnblogs.com/lifan1998/p/14864021.html

Introduction

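While solving a practice problem today I ran into a strange issue. We know that resizing a Java HashMap has a cost, so to reduce the number and cost of resizes you can give a HashMap an initial capacity, like this: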

HashMap<String, Integer> map0 = new HashMap<String, Integer>(100000);

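In practice, however, performance did not improve; it dropped significantly! The only thing the code did with the HashMap was iterate over it, so the iteration had to be the problem. After some testing I arrived at this result: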

HashMap iterator traversal performance depends on the initial capacity

Iterator test

Here is the test code:

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class MapForEachTest {

    public static void main(String[] args) {
        HashMap<String, Integer> map0 = new HashMap<String, Integer>(100000);

        initDataAndPrint(map0);

        HashMap<String, Integer> map1 = new HashMap<String, Integer>();

        initDataAndPrint(map1);

    }



    private static void initDataAndPrint(HashMap map) {

        initData(map);

        long start = System.currentTimeMillis();

        for (int i = 0; i < 100; i++) {
            forEach(map);
        }
        long end = System.currentTimeMillis();
        System.out.println("");
        System.out.println("HashMap Size: " + map.size() +  " elapsed: " + (end - start) + " ms");
    }

    private static void forEach(HashMap map) {
        for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();){
            Map.Entry<String, Integer> item = it.next();
            System.out.print(item.getKey());
            // do something
        }

    }

    private static void initData(HashMap map) {
        map.put("a", 0);
        map.put("b", 1);
        map.put("c", 2);
        map.put("d", 3);
        map.put("e", 4);
        map.put("f", 5);
    }

}

Here are the results:

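We initialized the first map with a capacity of 100,000 and left the size of the second map unspecified (16 in practice). Both store the same data, yet iterating each of them 100 times with an iterator gives very different timings: 36 ms versus 4 ms. The real gap is even larger, because those 4 ms are just the cost of 600 System.out.print calls. Let's comment out the print and try again: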

for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();){
    Map.Entry<String, Integer> item = it.next();
    // System.out.print(item.getKey());
    // do something
}

The output is as follows:

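The second map now takes almost 0 ms, while the first still takes 28 ms, even though the loop body does nothing at all. Since this confirms the cost depends on the initial capacity, the next step is to find out why by reading the source of the Map iterator.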

Digging into the iterator source

Let's look at the source behind Map.entrySet().iterator():

public final Iterator<Map.Entry<K,V>> iterator() {
    return new EntryIterator();
}

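EntryIterator is an inner class of HashMap, and all of its logic lives in the abstract inner class HashIterator that it extends. The source is short, so here it is in full with comments: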


abstract class HashIterator {
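    // the next node to return
    Node<K,V> next; // next entry to return
    // the node most recently returned
    Node<K,V> current;     // current entry
    // the modCount value this iterator expects. Each call to iterator() creates a new iterator, so a map can have many;
    // if the map is structurally modified other than through this iterator's own remove(), its next access throws (fail-fast)
    int expectedModCount;  // for fast-fail
    
    // position in the table array, one past the bucket holding next; HashMap stores its entries in the bucket array table
    int index;             // current slot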

    HashIterator() {
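        // initialize expectedModCount
        expectedModCount = modCount;
        // local reference to the table array (a reference, not a copy of the data)
        Node<K,V>[] t = table;
        current = next = null;
        index = 0;
        // if the map is not empty, scan the array for the first occupied bucket and assign it to next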
        if (t != null && size > 0) { // advance to first entry
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Node<K,V> nextNode() {
        // local alias for table
        Node<K,V>[] t;
        // e is the node to return this time (the old next); it is returned after next has been advanced
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
            
        // current = e, next = e.next (usually null); if the chain is exhausted, scan table forward for the next non-empty bucket
        if ((next = (current = e).next) == null && (t = table) != null) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }

    /**
     * Removes the current element by delegating to HashMap.removeNode; nothing special, so not covered here
     **/
    public final void remove() {
        Node<K,V> p = current;
        if (p == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        current = null;
        K key = p.key;
        removeNode(hash(key), key, null, false, false);
        // resync so that this iterator does not fail fast because of its own removal
        expectedModCount = modCount;
    }
}

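The code makes it clear: whenever the chain in the current bucket is exhausted, the iterator scans the table array forward until it finds the next non-empty bucket, so one full traversal reads every slot of table, costing O(table.length + size). A very large initial capacity means a large threshold and therefore a large table.length, so traversal becomes expensive even when the map holds only a handful of entries.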
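To make that cost concrete, here is a minimal model of the scan performed by HashIterator. It is not the JDK class: `SlotScanModel`, `Node`, `buildTable` and `countSlotsScanned` are hypothetical names, the bucket index uses the same `(h ^ (h >>> 16)) & (n - 1)` rule as HashMap, and treeified bins are ignored. It counts how many array slots one full traversal reads.

```java
import java.util.Arrays;
import java.util.List;

public class SlotScanModel {
    // Simplified stand-in for HashMap.Node: a singly linked bucket chain.
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Same spreading as HashMap.hash(): mix the high bits into the low bits.
    static int hash(Object key) {
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    // Build a bucket array of the given power-of-two length, chaining collisions.
    static Node[] buildTable(List<String> keys, int length) {
        Node[] table = new Node[length];
        for (String k : keys) {
            int i = hash(k) & (length - 1);
            table[i] = new Node(k, table[i]); // prepend; chain order does not matter here
        }
        return table;
    }

    // Walk the table the way HashIterator does and count every slot read.
    static int countSlotsScanned(Node[] t) {
        int index = 0, slotsRead = 0;
        Node next = null;
        // constructor: advance to the first non-empty bucket
        while (index < t.length && next == null) { next = t[index++]; slotsRead++; }
        while (next != null) {
            Node e = next;   // the node nextNode() would return
            next = e.next;
            // chain exhausted: scan forward for the next non-empty bucket
            while (next == null && index < t.length) { next = t[index++]; slotsRead++; }
        }
        return slotsRead;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f");
        System.out.println(countSlotsScanned(buildTable(keys, 16)));     // 16
        System.out.println(countSlotsScanned(buildTable(keys, 131072))); // 131072
    }
}
```

With the same six keys, the work grows with the array length, not with the number of entries.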

The size of the table array is set in resize():

Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
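Where does newCap come from for `new HashMap<>(100000)`? In JDK 8 the constructor stores `tableSizeFor(initialCapacity)` in `threshold`, and the first `resize()` uses that value as the new capacity. The sketch below copies the bit tricks of JDK 8's `tableSizeFor` (newer JDKs compute the same value differently); the class name `TableSizeDemo` is hypothetical.

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Round cap up to the next power of two (same logic as JDK 8 HashMap.tableSizeFor).
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;  // smear the highest set bit into every lower position
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(100000)); // 131072
        System.out.println(tableSizeFor(16));     // 16, same as the default capacity
        System.out.println(tableSizeFor(17));     // 32
    }
}
```

So the first map in the test iterates over 131072 slots to find six entries, while the default map scans only 16.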

Other traversal methods

Note that our code uses Map.entrySet().iterator(); keySet().iterator() and values().iterator() work exactly the same way. The source:

final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

final class ValueIterator extends HashIterator
    implements Iterator<V> {
    public final V next() { return nextNode().value; }
}

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}

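They differ only in what next() returns from the node, so I won't analyze the other two separately; their performance is the same.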
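Because KeyIterator, ValueIterator and EntryIterator all go through the same nextNode(), the three views of a HashMap visit the buckets in the same order. A quick check using only public java.util APIs (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ViewOrderCheck {
    // True if keySet(), values() and entrySet() yield their elements in the same order.
    static boolean viewsAgree(Map<String, Integer> map) {
        List<String> keys = new ArrayList<>();
        for (String k : map.keySet()) keys.add(k);             // KeyIterator
        List<Integer> values = new ArrayList<>();
        for (Integer v : map.values()) values.add(v);          // ValueIterator
        List<String> entryKeys = new ArrayList<>();
        List<Integer> entryValues = new ArrayList<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) {  // EntryIterator
            entryKeys.add(e.getKey());
            entryValues.add(e.getValue());
        }
        return keys.equals(entryKeys) && values.equals(entryValues);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>(100000);
        String[] keys = {"a", "b", "c", "d", "e", "f"};
        for (int i = 0; i < keys.length; i++) map.put(keys[i], i);
        System.out.println(viewsAgree(map)); // true
    }
}
```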

In practice there are several other ways to traverse a collection:

Plain for loop with an index
Enhanced for loop
Map.forEach
Stream.forEach

An indexed for loop does not apply to a Map, so it is not discussed here.

Enhanced for loop

The enhanced for loop is actually implemented with an iterator. Let's compare the two:

Source:

private static void forEach(HashMap map) {
    for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();){
        Map.Entry<String, Integer> item = it.next();
        System.out.print(item.getKey());
        // do something
    }
}


private static void forEach0(HashMap<String, Integer> map) {
    for (Map.Entry entry : map.entrySet()) {
        System.out.print(entry.getKey());
    }
}

The compiled bytecode:

// access flags 0xA
  private static forEach(Ljava/util/HashMap;)V
   L0
    LINENUMBER 41 L0
    ALOAD 0
    INVOKEVIRTUAL java/util/HashMap.entrySet ()Ljava/util/Set;
    INVOKEINTERFACE java/util/Set.iterator ()Ljava/util/Iterator; (itf)
    ASTORE 1
   L1
   FRAME APPEND [java/util/Iterator]
    ALOAD 1
    INVOKEINTERFACE java/util/Iterator.hasNext ()Z (itf)
    IFEQ L2
   L3
    LINENUMBER 42 L3
    ALOAD 1
    INVOKEINTERFACE java/util/Iterator.next ()Ljava/lang/Object; (itf)
    CHECKCAST java/util/Map$Entry
    ASTORE 2
   L4
    LINENUMBER 43 L4
    GETSTATIC java/lang/System.out : Ljava/io/PrintStream;
    ALOAD 2
    INVOKEINTERFACE java/util/Map$Entry.getKey ()Ljava/lang/Object; (itf)
    CHECKCAST java/lang/String
    INVOKEVIRTUAL java/io/PrintStream.print (Ljava/lang/String;)V
   L5
    LINENUMBER 45 L5
    GOTO L1
   L2
    LINENUMBER 46 L2
   FRAME CHOP 1
    RETURN
   L6
    LOCALVARIABLE item Ljava/util/Map$Entry; L4 L5 2
    // signature Ljava/util/Map$Entry<Ljava/lang/String;Ljava/lang/Integer;>;
    // declaration: item extends java.util.Map$Entry<java.lang.String, java.lang.Integer>
    LOCALVARIABLE it Ljava/util/Iterator; L1 L2 1
    // signature Ljava/util/Iterator<Ljava/util/Map$Entry<Ljava/lang/String;Ljava/lang/Integer;>;>;
    // declaration: it extends java.util.Iterator<java.util.Map$Entry<java.lang.String, java.lang.Integer>>
    LOCALVARIABLE map Ljava/util/HashMap; L0 L6 0
    MAXSTACK = 2
    MAXLOCALS = 3

  // access flags 0xA
  // signature (Ljava/util/HashMap<Ljava/lang/String;Ljava/lang/Integer;>;)V
  // declaration: void forEach0(java.util.HashMap<java.lang.String, java.lang.Integer>)
  private static forEach0(Ljava/util/HashMap;)V
   L0
    LINENUMBER 50 L0
    ALOAD 0
    INVOKEVIRTUAL java/util/HashMap.entrySet ()Ljava/util/Set;
    INVOKEINTERFACE java/util/Set.iterator ()Ljava/util/Iterator; (itf)
    ASTORE 1
   L1
   FRAME APPEND [java/util/Iterator]
    ALOAD 1
    INVOKEINTERFACE java/util/Iterator.hasNext ()Z (itf)
    IFEQ L2
    ALOAD 1
    INVOKEINTERFACE java/util/Iterator.next ()Ljava/lang/Object; (itf)
    CHECKCAST java/util/Map$Entry
    ASTORE 2
   L3
    LINENUMBER 51 L3
    GETSTATIC java/lang/System.out : Ljava/io/PrintStream;
    ALOAD 2
    INVOKEINTERFACE java/util/Map$Entry.getKey ()Ljava/lang/Object; (itf)
    INVOKEVIRTUAL java/io/PrintStream.print (Ljava/lang/Object;)V
   L4
    LINENUMBER 52 L4
    GOTO L1
   L2
    LINENUMBER 53 L2
   FRAME CHOP 1
    RETURN
   L5
    LOCALVARIABLE entry Ljava/util/Map$Entry; L3 L4 2
    LOCALVARIABLE map Ljava/util/HashMap; L0 L5 0
    // signature Ljava/util/HashMap<Ljava/lang/String;Ljava/lang/Integer;>;
    // declaration: map extends java.util.HashMap<java.lang.String, java.lang.Integer>
    MAXSTACK = 2
    MAXLOCALS = 3

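You hardly need to look closely: apart from the local variables, the bytecode of the two methods is almost identical (the only other difference, print(String) versus print(Object), comes from forEach0 using the raw type Map.Entry). So the enhanced for loop performs the same as the explicit iterator, and the measured results are the same too. I won't show them; if you're curious, copy the code from the beginning and end of this article and try it yourself.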

On second thought, here they are anyway:

Map.forEach

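First, a word on why I don't run all the methods in one program and print their timings side by side: CPU caching and some JVM optimizations skew the comparison, as explained with the full test results in the appendix.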
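One way to reduce that noise is to warm up each variant before timing it and to use System.nanoTime() instead of currentTimeMillis(). The sketch below is only a rough harness under that assumption (a proper microbenchmark tool such as JMH handles JIT and dead-code effects far better); the `time` helper, the `sink` field and the round counts are illustrative choices, not the author's setup.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class WarmupTiming {
    static volatile long sink; // results are written here so the JIT cannot drop the loop body

    // Run task `warmup` times untimed, then `rounds` times timed; return elapsed nanoseconds.
    static long time(Runnable task, int warmup, int rounds) {
        for (int i = 0; i < warmup; i++) task.run();
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) task.run();
        return System.nanoTime() - start;
    }

    // Iterator traversal that sums the values instead of printing them.
    static void iterate(Map<String, Integer> map) {
        long sum = 0;
        for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext(); ) {
            sum += it.next().getValue();
        }
        sink = sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> big = new HashMap<>(100000);
        Map<String, Integer> small = new HashMap<>();
        String[] keys = {"a", "b", "c", "d", "e", "f"};
        for (int i = 0; i < keys.length; i++) { big.put(keys[i], i); small.put(keys[i], i); }

        System.out.println("capacity 100000: " + time(() -> iterate(big), 1000, 100) + " ns");
        System.out.println("default capacity: " + time(() -> iterate(small), 1000, 100) + " ns");
    }
}
```

Absolute numbers will vary by machine and JVM; only the relative gap between the two maps is meaningful.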

Let's go straight to the source:

@Override
public void forEach(BiConsumer<? super K, ? super V> action) {
    Node<K,V>[] tab;
    if (action == null)
        throw new NullPointerException();
    if (size > 0 && (tab = table) != null) {
        int mc = modCount;
        for (int i = 0; i < tab.length; ++i) {
            for (Node<K,V> e = tab[i]; e != null; e = e.next)
                action.accept(e.key, e.value);
        }
        if (modCount != mc)
            throw new ConcurrentModificationException();
    }
}

The source is short, so no comments this time. It tells us the following:

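The method is also fail-fast: structurally modifying the map during traversal (for example removing an element) causes a ConcurrentModificationException, checked once after the loop finishes
It has to walk the entire table array
BiConsumer is annotated with @FunctionalInterface, so a lambda can be passed in

The third point has nothing to do with performance; I mention it only in passing.

From the above we can be sure that its performance also depends on the size of the table array.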
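The fail-fast check in HashMap.forEach differs slightly from the iterator's: modCount is compared only once, after the whole table has been walked. The sketch below (class and method names are hypothetical) removes an entry from inside the action and observes the exception at the end. This follows the HashMap.forEach source quoted above; other Map implementations may behave differently.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class ForEachFailFast {
    // Returns how many callbacks ran before the exception surfaced, or -1 if none was thrown.
    static int visitWhileRemoving(Map<String, Integer> map) {
        int[] visited = {0};
        try {
            map.forEach((k, v) -> {
                if (visited[0]++ == 0) map.remove(k); // structural change on the first callback only
            });
        } catch (ConcurrentModificationException e) {
            return visited[0];
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        String[] keys = {"a", "b", "c", "d", "e", "f"};
        for (int i = 0; i < keys.length; i++) map.put(keys[i], i);
        System.out.println(visitWhileRemoving(map)); // 6: every entry was visited before the check fired
        System.out.println(map.size());              // 5
    }
}
```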

Yet in actual testing it turned out noticeably slower than the iterator:

I'll save the detailed reasons for my next article.

Stream.forEach

What Stream.forEach and Map.forEach have in common is that both take lambda expressions, but their implementations share no code.

In case you're getting tired, here are the test results first:

It takes even a little longer than Map.forEach.

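Next, a look at the source of Stream.forEach on a sequential stream. It isn't complicated either, but if you're tired, feel free to skip ahead to the summary.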

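Stream.forEach is carried out by a spliterator. HashMap's spliterators are defined inside the HashMap class as static nested classes; the one for entries is called EntrySpliterator.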

Here is the method that a sequential stream executes:

public void forEachRemaining(Consumer<? super Map.Entry<K,V>> action) {
    int i, hi, mc;
    if (action == null)
        throw new NullPointerException();
    HashMap<K,V> m = map;
    Node<K,V>[] tab = m.table;
    if ((hi = fence) < 0) {
        mc = expectedModCount = m.modCount;
        hi = fence = (tab == null) ? 0 : tab.length;
    }
    else
        mc = expectedModCount;
    if (tab != null && tab.length >= hi &&
        (i = index) >= 0 && (i < (index = hi) || current != null)) {
        Node<K,V> p = current;
        current = null;
        do {
            if (p == null)
                p = tab[i++];
            else {
                action.accept(p);
                p = p.next;
            }
        } while (p != null || i < hi);
        if (m.modCount != mc)
            throw new ConcurrentModificationException();
    }
}

This source likewise makes it easy to see that traversal scans the whole array sequentially.
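For a sequential stream with no intermediate operations, forEach hands the action straight to the source spliterator's forEachRemaining. The sketch below drives the spliterator directly (Collection.spliterator() and Spliterator.forEachRemaining are standard APIs) and checks that it yields entries in the same order as the iterator, since both start at slot 0 of the same table. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Spliterator;

public class SpliteratorOrder {
    // Collect keys through the entry spliterator, the source a sequential entrySet().stream() reads from.
    static List<String> keysViaSpliterator(Map<String, Integer> map) {
        List<String> out = new ArrayList<>();
        Spliterator<Map.Entry<String, Integer>> sp = map.entrySet().spliterator();
        sp.forEachRemaining(e -> out.add(e.getKey()));
        return out;
    }

    // Collect keys through the ordinary iterator for comparison.
    static List<String> keysViaIterator(Map<String, Integer> map) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) out.add(e.getKey());
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>(100000);
        String[] keys = {"a", "b", "c", "d", "e", "f"};
        for (int i = 0; i < keys.length; i++) map.put(keys[i], i);
        System.out.println(keysViaSpliterator(map).equals(keysViaIterator(map))); // true
    }
}
```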

Summary

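With that, all four traversal methods have been tested, and we can draw two simple conclusions:

Map traversal performance depends on the size of the internal table array, i.e. on the commonly used initial capacity parameter, whichever traversal method you use
Performance (high to low): iterator == enhanced for loop > Map.forEach > Stream.forEach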

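I won't quote "N times faster" figures here; without the dataset size they are meaningless. When no initial capacity is specified, all four methods take about 3 ms, and those 3 ms are spent on output: the traversal itself costs essentially 0, which is why it takes under 1 ms once the output is removed. So for small datasets it doesn't matter which one you use, and much of the time the real cost is the business logic inside the loop. This article is not meant to make you rewrite forEach as iterators. In most scenarios you don't need to worry about the performance of iteration itself, and the readability gained from Stream and lambdas matters more.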

So treat this article as background knowledge. Besides the traversal performance issue above, you should also take away the following points:

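HashMap stores its entries in the table array
table is initialized by resize(); new HashMap() does not allocate it
Traversal walks table from index 0 upward, so the order follows bucket positions, which is why a HashMap is unordered
keySet().iterator(), values().iterator() and entrySet().iterator() have no essential difference: they all use the same iterator logic
Of all the traversal methods, only the explicit iterator can remove elements; the enhanced for loop also uses an iterator underneath, but the syntactic sugar hides its remove method
Each call to iterator() creates a new iterator; once one of them removes an element, the others fail fast on their next access
Map.forEach and Stream.forEach look alike, but their implementations are different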
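The point about remove can be checked directly. The sketch below (class and helper names are hypothetical) removes the entries with even values through Iterator.remove(), which succeeds, and then calls map.remove() inside an enhanced for loop, which fails fast on the following call to next().

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class RemoveDuringIteration {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        String[] keys = {"a", "b", "c", "d", "e", "f"};
        for (int i = 0; i < keys.length; i++) map.put(keys[i], i);
        return map;
    }

    // Safe: remove through the iterator itself, which resyncs expectedModCount.
    static int removeEvenValues(Map<String, Integer> map) {
        for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext(); ) {
            if (it.next().getValue() % 2 == 0) it.remove();
        }
        return map.size();
    }

    // Unsafe: the enhanced for hides the iterator, so only map.remove() is reachable.
    static boolean enhancedForRemoveFails(Map<String, Integer> map) {
        try {
            for (String key : map.keySet()) {
                map.remove(key); // bumps modCount behind the hidden iterator's back
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(removeEvenValues(sample()));       // 3
        System.out.println(enhancedForRemoveFails(sample())); // true
    }
}
```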

Appendix: source of the four traversal methods

private static void forEach(HashMap map) {
    for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();){
        Map.Entry<String, Integer> item = it.next();
        // System.out.print(item.getKey());
        // do something
    }
}


private static void forEach0(HashMap<String, Integer> map) {
    for (Map.Entry entry : map.entrySet()) {
        System.out.print(entry.getKey());
    }
}

private static void forEach1(HashMap<String, Integer> map) {
    map.forEach((key, value) -> {
        System.out.print(key);
    });

}

private static void forEach2(HashMap<String, Integer> map) {
    map.entrySet().stream().forEach(e -> {
        System.out.print(e.getKey());
    });

}

Appendix: complete test class, test results, and a strange problem

The code is ugly; please bear with it.


import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class MapForEachTest {

    public static void main(String[] args) {
        HashMap<String, Integer> map0 = new HashMap<String, Integer>(100000);
        HashMap<String, Integer> map1 = new HashMap<String, Integer>();
        initData(map0);
        initData(map1);

        
        testIterator(map0);
        testIterator(map1);
        testFor(map0);
        testFor(map1);
        testMapForeach(map0);
        testMapForeach(map1);
        testMapStreamForeach(map0);
        testMapStreamForeach(map1);

    }



    private static void testIterator(HashMap map) {

        long start = System.currentTimeMillis();

        for (int i = 0; i < 100; i++) {
            forEach(map);
        }
        long end = System.currentTimeMillis();
        System.out.println("");
        System.out.println("HashMap Size: " + map.size() +  " iterator elapsed: " + (end - start) + " ms");
    }

    private static void testFor(HashMap map) {

        long start = System.currentTimeMillis();

        for (int i = 0; i < 100; i++) {
            forEach0(map);
        }
        long end = System.currentTimeMillis();
        System.out.println("");
        System.out.println("HashMap Size: " + map.size() +  " enhanced for elapsed: " + (end - start) + " ms");
    }

    private static void testMapForeach(HashMap map) {

        long start = System.currentTimeMillis();

        for (int i = 0; i < 100; i++) {
            forEach1(map);
        }
        long end = System.currentTimeMillis();
        System.out.println("");
        System.out.println("HashMap Size: " + map.size() +  " MapForeach elapsed: " + (end - start) + " ms");
    }


    private static void testMapStreamForeach(HashMap map) {

        long start = System.currentTimeMillis();

        for (int i = 0; i < 100; i++) {
            forEach2(map);
        }
        long end = System.currentTimeMillis();
        System.out.println("");
        System.out.println("HashMap Size: " + map.size() +  " MapStreamForeach elapsed: " + (end - start) + " ms");
    }

    private static void forEach(HashMap map) {
        for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();){
            Map.Entry<String, Integer> item = it.next();
            System.out.print(item.getKey());
            // do something
        }
    }


    private static void forEach0(HashMap<String, Integer> map) {
        for (Map.Entry entry : map.entrySet()) {
            System.out.print(entry.getKey());
        }
    }

    private static void forEach1(HashMap<String, Integer> map) {
        map.forEach((key, value) -> {
            System.out.print(key);
        });

    }

    private static void forEach2(HashMap<String, Integer> map) {
        map.entrySet().stream().forEach(e -> {
            System.out.print(e.getKey());
        });

    }

    private static void initData(HashMap map) {
        map.put("a", 0);
        map.put("b", 1);
        map.put("c", 2);
        map.put("d", 3);
        map.put("e", 4);
        map.put("f", 5);
    }

}

Test results:

If you read the article above carefully, you'll notice something odd about these results:

MapStreamForeach seems to take less time

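I can tell you this is not caused by the data. From my tests, the immediate cause is that Map.forEach ran first: if you swap the order of MapForeach and MapStreamForeach, whichever runs second takes less time. As for the root cause, explore it yourself if you're interested, or wait for a later article.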
