A Powerful Tool for Java Performance Testing: Getting Started with JMH in Practice | 得物技術

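Performance testing is an indispensable part of software development, but writing a benchmark that correctly measures the performance of a small part of a large application is very hard. When a benchmark runs a component in isolation, the JVM or the underlying hardware may apply many optimizations that would not be possible when the same component runs as part of a larger application. A badly implemented microbenchmark can therefore make you believe a component performs better than it really does. Writing a correct Java microbenchmark usually means preventing the JVM and the hardware from applying optimizations during the benchmark run that they could not apply in a real production system. This is what JMH (the Java Microbenchmark Harness) helps you do. This article walks through the main aspects of JMH.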

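1. JMH Overview

JMH is a Java library for microbenchmarking that lets developers measure the performance of hot spots in their code precisely. It is developed by the OpenJDK team and is the de facto standard for performance testing in the Java ecosystem.

Key features of JMH:

High precision: supports measurements at nanosecond granularity.

Ease of use: benchmarks are configured through annotations, with no complex test environment to set up.

Multiple modes: supports several benchmark modes, such as throughput and average time.

Multiple dimensions: can measure how code performs under different conditions.

JMH compared with other performance testing tools

Compared with other performance testing tools for the JVM, JMH offers finer-grained control (over warmup, forking, threads, and compiler behavior) and higher measurement precision.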

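2. Quick Start

Generating a Maven project from the archetype

The simplest way to start using JMH is to generate a new JMH project from the Maven archetype. Maven generates a new Java project containing one sample Java class and a pom.xml file. The pom.xml contains the Maven dependencies needed to compile and build the JMH microbenchmark sample class.

The following Maven command generates the JMH project template: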

mvn archetype:generate \
          -DinteractiveMode=false \
          -DarchetypeGroupId=org.openjdk.jmh \
          -DarchetypeArtifactId=jmh-java-benchmark-archetype \
          -DgroupId=com.dewu \
          -DartifactId=first-benchmark \
          -Dversion=1.0

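This command creates a new directory named first-benchmark (the artifactId given in the Maven command). Inside it, Maven generates a standard source layout (src/main/java). The Java source root contains a package named com.dewu, and inside that package is a JMH benchmark class named MyBenchmark.

Adding JMH to an existing project

For an existing Maven project, add the following dependencies: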

<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.33</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.33</version>
    <scope>test</scope>
</dependency>

Then write your first JMH benchmark class. Here is an example:

package com.dewu;

import org.openjdk.jmh.annotations.Benchmark;

public class MyBenchmark {
    @Benchmark
    public void testMethod() {
        // This is a demo/sample template for building your JMH benchmarks. Edit as needed.
        // Put your benchmark code here.
    }
}

You can put the code you want to measure in the body of testMethod(). For example:

package com.dewu;

import org.openjdk.jmh.annotations.Benchmark;

public class MyBenchmark {
    @Benchmark
    public void testMethod() {
        // This is a demo/sample template for building your JMH benchmarks. Edit as needed.
        // Put your benchmark code here.
        int a = 1;
        int b = 2;
        int sum = a + b;
    }
}

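Note: this particular example is a poor benchmark. The JVM can see that the sum variable is never used and may therefore eliminate the addition altogether. A later section of this article shows how to write the benchmark correctly with JMH so that dead code elimination does not remove the measured code.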

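3. Core Concepts and Annotations

Benchmark methods

Mark each method that should be run as a benchmark with the @Benchmark annotation.

Benchmark mode

The benchmark mode is set with the @BenchmarkMode annotation. The available modes are:

Throughput: the number of operations completed per unit of time.

AverageTime: the average time taken to complete one operation.

SampleTime: execution time based on sampling, which provides distribution statistics.

SingleShotTime: the time of a single invocation, used to measure cold-start performance.

All: runs all of the modes above.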
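
Throughput and AverageTime describe the same measurement from opposite directions, so for a single-threaded benchmark a score in one mode can be roughly converted into the other. The sketch below is plain Java with no JMH dependency; the class and method names are illustrative and are not part of the JMH API.

```java
import java.util.concurrent.TimeUnit;

// Illustrative helper (not a JMH API): converts between a throughput score
// and an average-time score for a single-threaded benchmark.
final class ModeConversion {

    // Converts operations per second into nanoseconds per operation.
    static double toNanosPerOp(double opsPerSecond) {
        return TimeUnit.SECONDS.toNanos(1) / opsPerSecond;
    }

    // Converts nanoseconds per operation into operations per second.
    static double toOpsPerSecond(double nanosPerOp) {
        return TimeUnit.SECONDS.toNanos(1) / nanosPerOp;
    }

    public static void main(String[] args) {
        // 2.0 ns/op corresponds to 500 million operations per second.
        System.out.println(toOpsPerSecond(2.0));
        // 1,000,000 ops/s corresponds to 1,000 ns/op.
        System.out.println(toNanosPerOp(1_000_000.0));
    }
}
```

With several threads the relationship is less direct, because throughput is summed over the threads while average time is not, so the conversion should be treated as a sanity check rather than an exact identity.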

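State

Declared with the @State annotation, which defines the scope and lifecycle of the state objects a benchmark uses.

Scope.Thread: each thread gets its own instance.

Scope.Benchmark: all threads share a single instance.

Scope.Group: all threads within the same thread group share one instance.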

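Warmup:

Configured with the @Warmup annotation. Warmup is the preparation phase that runs before the actual measurement; it gives the JVM time to "warm up" (load classes and JIT-compile the hot code) so that compilation does not distort the measured results.

Measurement:

Configured with the @Measurement annotation, which sets the number of measurement iterations and the duration of each iteration.

Output time unit:

The @OutputTimeUnit annotation sets the time unit used when reporting results.

Threads:

The @Threads annotation sets the number of threads that run the benchmark method.

Parameters:

The @Param annotation, placed on a field of a state class, supplies parameter values to a benchmark so that a single benchmark can be run against several parameter sets.

Fork:

The @Fork annotation makes the benchmark run in separate JVM processes so that benchmarks do not influence each other. The examples in this article set it to 1 to keep runs short; more forks give a better picture of run-to-run variance.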
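
The warmup, measurement, parameter, and fork settings together determine roughly how long a run takes. The sketch below is plain Java with illustrative names (it is not a JMH API) and gives only a lower bound, since it ignores JVM startup and setup costs. As a cross-check against this article: the AccuracyBenchmark example in section 4 has three benchmark methods, one fork, and five 1-second warmup plus five 1-second measurement iterations, and its report shows a total time of 00:00:33.

```java
// Illustrative helper (not a JMH API): rough lower bound, in seconds,
// for the wall-clock time of a JMH run with time-based iterations.
final class RunTimeEstimate {

    static long estimateSeconds(int benchmarkMethods, int paramCombinations, int forks,
                                int warmupIterations, int warmupSeconds,
                                int measurementIterations, int measurementSeconds) {
        // Time spent inside one forked JVM for one benchmark and one parameter set.
        long perFork = (long) warmupIterations * warmupSeconds
                + (long) measurementIterations * measurementSeconds;
        // Every benchmark method runs once per parameter combination and per fork.
        return (long) benchmarkMethods * paramCombinations * forks * perFork;
    }

    public static void main(String[] args) {
        // Three methods, no @Param, 1 fork, 5 x 1 s warmup, 5 x 1 s measurement.
        System.out.println(estimateSeconds(3, 1, 1, 5, 1, 5, 1)); // 30
        // One method with three @Param values and the same iteration settings.
        System.out.println(estimateSeconds(1, 3, 1, 5, 1, 5, 1)); // 30
    }
}
```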

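Auxiliary counters:

The @AuxCounters annotation marks a state class whose counters are reported as additional (secondary) results alongside the main score.

Compiler control:

The @CompilerControl annotation gives hints to the JVM's JIT compiler about how to treat a method, for example whether it may be inlined.

Blackhole:

A JMH facility that "swallows" the values produced by a benchmark method, so that the JVM cannot remove the computation as dead code.

Result analysis:

JMH produces a detailed report that includes, depending on the mode, the average time per operation, the throughput, and the error margin.

API and annotations:

JMH provides a rich API and annotation set for configuring and running benchmarks.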

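4. How JMH Works

How the JVM affects performance tests

The JVM's just-in-time (JIT) compiler optimizes code while it runs, which can affect the results of a performance test. JMH controls the test environment to keep the results accurate.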
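
To make the problem concrete, here is a hand-rolled timing loop of the kind JMH is meant to replace. It is plain Java, not JMH, and all names in it are illustrative. On a typical HotSpot JVM the first rounds tend to take longer than later ones, because the code starts out interpreted and is compiled while it runs; the exact numbers vary by machine, so none are shown here.

```java
// A hand-rolled timing loop (NOT JMH), shown only to illustrate why
// naive measurements are unreliable.
final class NaiveTiming {

    // Results are written to a volatile field so the JIT cannot discard them.
    static volatile long sink;

    // The work being timed: sum of squares of 0..n-1.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Times `rounds` consecutive batches of calls; returns nanoseconds per batch.
    static long[] timeRounds(int rounds, int callsPerRound) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerRound; c++) {
                sink = work(1_000);
            }
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 10_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + nanos[r] + " ns");
        }
    }
}
```

A loop like this has no separation between warmup and measurement, no process isolation, and no error estimate, which are the gaps that the JMH mechanisms described next are designed to close.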

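How JMH produces accurate results

JMH reduces the influence of JVM optimizations on results through warmup, multiple iterations, and running in forked processes. The following example shows how warmup, repeated iterations, and protection against JVM optimizations fit together in a JMH benchmark.

First make sure the JMH dependencies have been added to your project. Then create a benchmark class. It measures two methods, one doing a simple arithmetic operation and one doing a heavier computation, so that their execution times can be compared.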

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.concurrent.TimeUnit;
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class AccuracyBenchmark {
    @Benchmark
    public void measure() {
        // Empty method: a baseline showing what a fully optimized-away benchmark looks like
    }
    @Benchmark
    public void measureSimpleMath(Blackhole blackhole) {
        // Simple arithmetic, standing in for a lightweight operation
        blackhole.consume(add(1, 2));
    }
    @Benchmark
    public void measureComplexMath(Blackhole blackhole) {
        // A heavier computation, standing in for a heavyweight operation
        blackhole.consume(calculate(123, 456, 789));
    }
    private int add(int a, int b) {
        return a + b;
    }
    private int calculate(int a, int b, int c) {
        int result = a;
        for (int i = 0; i < b; i++) {
            result += c;
        }
        return result;
    }
    // Main method used to launch the benchmark
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(AccuracyBenchmark.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}

Running the code above produces output similar to the following:

# Run complete. Total time: 00:00:33

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                             Mode  Cnt  Score   Error  Units
AccuracyBenchmark.measure             avgt    5  0.293 ± 0.104  ns/op
AccuracyBenchmark.measureComplexMath  avgt    5  2.118 ± 0.622  ns/op
AccuracyBenchmark.measureSimpleMath   avgt    5  2.222 ± 0.539  ns/op

Process finished with exit code 0

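In this example, the following JMH features help keep the results accurate:

Warmup: the warmup iterations give the JVM's JIT compiler enough time to optimize the code, so that the measurement reflects steady-state behavior.
Measurement: repeating the measurement over several iterations reduces random error and allows an error margin to be computed.
Fork: running each benchmark in a separate JVM process avoids interference between benchmarks and ensures that each one starts from the same initial conditions.
Blackhole: a JMH-provided utility that consumes the values produced by a benchmark method, preventing the JVM from optimizing the measured code away (for example through dead code elimination).

To run the benchmark, run the main method; JMH prints the average execution time of each method. Note that in the output above measureSimpleMath and measureComplexMath score almost the same (about 2.2 ns/op and 2.1 ns/op), even though calculate() contains a 456-iteration loop. This suggests that the JIT compiler optimized most of the loop away because its inputs are compile-time constants, which is the constant folding pitfall discussed later in this article. To see a real difference between the two operations, the inputs would have to come from a state object.

Keep in mind that this is a simple benchmark; real-world use may call for more elaborate scenarios and more configuration. JMH output should also be read as a trend rather than as absolute values, because performance measurements are affected by many factors, including the state of the JVM and the load on the system.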

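5. Advanced JMH Features

Multithreaded benchmarks and synchronization

JMH supports multithreaded benchmarks and provides a synchronization mechanism to keep them accurate.

To run a benchmark with several threads, specify the thread count with the @Threads annotation (or the threads option of OptionsBuilder). @Fork controls the number of JVM processes, not the number of threads. To make all threads start and finish each measurement iteration together, use the runner's syncIterations option; it is set through OptionsBuilder rather than through @Benchmark, which has no such parameter.

The following example shows a multithreaded JMH benchmark: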

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@State(Scope.Thread) // each thread has its own state instance
@BenchmarkMode(Mode.Throughput) // measure throughput
@OutputTimeUnit(TimeUnit.SECONDS) // report results per second
@Warmup(iterations = 5) // 5 warmup iterations
public class MultiThreadBenchmark {

    @Benchmark
    @Threads(2) // run this benchmark with 2 threads
    public void multiThreadTest() {
        // Put the code to be measured under concurrency here
    }

    // Main method used to launch the benchmark
    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(MultiThreadBenchmark.class.getSimpleName())
                .syncIterations(true) // synchronize iterations across threads
                .build();
        new Runner(opt).run();
    }
}

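In this example, @Threads(2) makes JMH create 2 threads that each execute multiThreadTest concurrently, and syncIterations(true) makes all threads run each measurement iteration in step.

For finer control over what each thread does, use the @Group and @GroupThreads annotations to define a thread group: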

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.Throughput) // measure throughput
@OutputTimeUnit(TimeUnit.SECONDS) // report results per second
@Warmup(iterations = 5) // 5 warmup iterations
@State(Scope.Group) // one instance shared by all threads of a group
public class ThreadGroupBenchmark {
    private int counter;

    @Setup(Level.Trial)
    public void setUp() {
        counter = 0;
    }

    @Benchmark
    @GroupThreads(2)
    @Group("testGroup")
    public void increment( ThreadGroupState state) {
        state.increment();
    }

    @Benchmark
    @Group("testGroup")
    public void decrement(ThreadGroupState state) {
        state.decrement();
    }

    @State(Scope.Thread)
    public static class ThreadGroupState {
        private int value;

        public void increment() {
            value++;
        }

        public void decrement() {
            value--;
        }
    }
    // Main method used to launch the benchmark
    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(ThreadGroupBenchmark.class.getSimpleName())
                .syncIterations(true) // synchronize iterations across threads
                .build();
        new Runner(opt).run();
    }
}

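In this example we define a thread group named testGroup. @GroupThreads(2) assigns 2 threads to increment, while decrement runs with the default of 1 thread, so each group consists of 3 threads that execute the two methods concurrently. Note that ThreadGroupState is annotated with @State(Scope.Thread), so every thread gets its own copy of value; the state that is actually shared within a group is the ThreadGroupBenchmark instance itself (Scope.Group), whose counter field this example never touches. To measure contention on shared data, the benchmark methods would have to operate on the group-scoped instance.

These two examples show how to set up and run multithreaded benchmarks in JMH, which lets you evaluate how concurrent code performs under multiple threads.

Parameterized benchmarks

Parameterized benchmarks are implemented with the @Param annotation, which supplies different values to a benchmark. This is particularly useful for measuring how performance depends on the value of a parameter. Here is an example: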

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class ParametrizedBenchmark {
    @Param({"1", "10", "100"})
    private int numberOfElements;

    private int[] array;

    @Setup
    public void setup() {
        array = new int[numberOfElements];
        for (int i = 0; i < numberOfElements; i++) {
            array[i] = i;
        }
    }

    @Benchmark
    public int sumArray() {
        int sum = 0;
        for (int value : array) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(ParametrizedBenchmark.class.getSimpleName())
                .forks(1)
                .warmupIterations(5)
                .measurementIterations(5)
                .build();


        new Runner(opt).run();
    }
}

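In this example, @Param defines the parameter numberOfElements, which takes three values: 1, 10, and 100. The setup method initializes the array and sumArray computes the sum of its elements. JMH runs the benchmark once for each parameter value and reports a separate result for each.

This lets a single benchmark method cover several input cases, giving a more complete picture of how the code performs.

Controlling JIT compilation

@CompilerControl is an advanced JMH feature that lets the benchmark author influence how the JIT compiler treats specific methods, for example by forbidding or forcing inlining. This is useful when you want to measure the effect of a particular compiler decision, or keep the compiler from changing the shape of the code being measured.

The following example uses @CompilerControl to control inlining: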

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class CompilerControlExample {

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public static int doNotInlineMe(int x) {
        // This method will not be inlined, even though it is trivial
        return x + 42;
    }

    @Benchmark
    @Threads(1) // run single-threaded
    public void testWithCompilerControl(Blackhole bh) {
        int result = doNotInlineMe(1);
        bh.consume(result);
    }

    @Benchmark
    @Threads(1) // run single-threaded
    public void testWithoutCompilerControl(Blackhole bh) {
        int result = inlineMe(1);
        bh.consume(result);
    }

    // A simple method that the JVM is free to inline
    public static int inlineMe(int x) {
        return x + 42;
    }

    // Main method used to launch the benchmark
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(CompilerControlExample.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}

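In this example, doNotInlineMe is annotated with @CompilerControl(CompilerControl.Mode.DONT_INLINE), which instructs the JIT compiler not to inline the method even though it is trivial. The call therefore remains a real method call in the compiled code. testWithCompilerControl calls doNotInlineMe and passes the result to a Blackhole, the utility that keeps the compiler from discarding the measured code.

inlineMe, by contrast, is a simple method with no @CompilerControl annotation that the JVM is free to inline; testWithoutCompilerControl calls it.

Comparing the results of the two benchmark methods shows how much inlining affects the measurement. Here is the JMH output for this example: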

# Run complete. Total time: 00:00:21

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                          Mode  Cnt  Score   Error  Units
CompilerControlExample.testWithCompilerControl     avgt    5  2.916 ± 0.333  ns/op
CompilerControlExample.testWithoutCompilerControl  avgt    5  1.857 ± 0.094  ns/op

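6. Writing Effective Benchmarks

Avoiding common benchmarking pitfalls

Now that we know how to write benchmarks with JMH, it is time to discuss how to write them correctly. There are several pitfalls that are easy to fall into; the following sections cover some of them.

A common pitfall is that the JVM may apply optimizations to your code while it runs inside the benchmark that it could not apply when the same code runs inside your real application. Such optimizations make the code look faster than it actually is.

Loop optimizations

It is tempting to put the benchmarked code inside a loop in the benchmark method, so that it is repeated many times per invocation (to reduce the overhead of calling the benchmark method). But the JVM is very good at optimizing loops, so the end result may differ from what you expect. In general, avoid loops in benchmark methods. If a loop is unavoidable, use the @OperationsPerInvocation annotation to tell JMH how many operations each invocation of the method performs, so that the reported score is normalized per operation. For example: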

@Benchmark
@OperationsPerInvocation(1000)
public void measureLoop() {
    for (int i = 0; i < 1000; i++) {
        // ...
    }
}

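Dead code elimination

One of the JVM optimizations to guard against when running benchmarks is dead code elimination. If the JVM detects that the result of a computation is never used, it may consider the computation dead code and remove it. Consider this benchmark: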

import org.openjdk.jmh.annotations.Benchmark;

public class MyBenchmark {

    @Benchmark
    public void testMethod() {
        int a = 1;
        int b = 2;
        int sum = a + b;
    }
}

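The JVM can detect that sum, to which a + b is assigned, is never used, and can therefore remove the computation of sum entirely. In the end, no code is left in the benchmark, so its results are highly misleading: the benchmark does not measure the time needed to add two variables and assign the result to a third. It does not measure any logic at all.

Avoiding dead code elimination

To avoid dead code elimination, the measured code must not look like dead code to the JVM. There are two ways to achieve this:

Return the result of the computation from the benchmark method.
Pass the computed value to a Blackhole provided by JMH.

Both approaches are shown below.

Returning a value from the benchmark method

Returning the computed value from a JMH benchmark method looks like this: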

import org.openjdk.jmh.annotations.Benchmark;

public class MyBenchmark {

    @Benchmark
    public int testMethod() {
        int a = 1;
        int b = 2;
        int sum = a + b;
        return sum;
    }

}

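Note that testMethod() now returns the sum variable. The JVM can no longer simply eliminate the computation, because the return value might be used by the caller (JMH itself consumes it). If your benchmark method computes several values that could each be eliminated as dead code, you can combine them into a single return value (for example, an object containing both values).

Passing a value to a Blackhole

The alternative to returning a combined value is to pass the computed values to a Blackhole provided by JMH: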

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class MyBenchmark {

    @Benchmark
    public void testMethod(Blackhole blackhole) {
        int a = 1;
        int b = 2;
        int sum = a + b;
        blackhole.consume(sum);
    }
}

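testMethod() now takes a Blackhole object as a parameter; JMH supplies it when it invokes the benchmark method.

Note also that the computed sum is now passed to the consume() method of the Blackhole instance, so the JVM must treat sum as used. If your benchmark method produces several results, pass each of them to the Blackhole.

Constant folding

Constant folding is another common JVM optimization. A computation based on constants always yields exactly the same result, no matter how many times it is executed. The JVM may detect this and replace the computation with its result.

For example, look at this benchmark: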

import org.openjdk.jmh.annotations.Benchmark;

public class MyBenchmark {

    @Benchmark
    public int testMethod() {
        int a = 1;
        int b = 2;
        int sum = a + b;
        return sum;
    }

}

The JVM can detect that sum is the sum of the two constants 1 and 2, and may therefore replace the code above with the following:

import org.openjdk.jmh.annotations.Benchmark;

public class MyBenchmark {

    @Benchmark
    public int testMethod() {
        int sum = 3;
        return sum;
    }

}

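Or it may simply return 3 directly.

Avoiding constant folding

To avoid constant folding, do not hard-code constants into the benchmark method. The inputs of the computation should instead come from a state object, which makes it hard for the JVM to see that the computation is based on constant values. For example: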

import org.openjdk.jmh.annotations.*;

public class MyBenchmark {

    @State(Scope.Thread)
    public static class MyState {
        public int a = 1;
        public int b = 2;
    }


    @Benchmark 
    public int testMethod(MyState state) {
        int sum = state.a + state.b;
        return sum;
    }
}

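If your benchmark method computes several values, you can pass them to a Blackhole instead of returning them, which also avoids dead code elimination. For example: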

    @Benchmark
    public void testMethod(MyState state, Blackhole blackhole) {
        int sum1 = state.a + state.b;
        int sum2 = state.a + state.a + state.b + state.b;
        blackhole.consume(sum1);
        blackhole.consume(sum2);
    }

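7. Analyzing JMH Results

Reading the JMH report

After each run JMH prints a detailed report that includes, depending on the mode, the average time, throughput, single-shot time, and statistical error. When analyzing the results, pay attention to the following:

Throughput: the number of operations completed per unit of time.
Average Time: the average time taken to complete one operation.
Sample Time: execution time based on sampling, usually reported with percentiles.
Single Shot Time: the time taken by a single invocation, used to measure cold-start performance.
Score Error: the error margin of the score, which reflects how much the measurements vary; the smaller it is, the more stable the result.

Here is a simple JMH benchmark; we will run it and then analyze the output: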

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
public class AnalysisBenchmark {

    @Benchmark
    public void measureMethod() {
        // Method under test
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(AnalysisBenchmark.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}

Running the code above produces output similar to the following:

# Run complete. Total time: 00:01:41

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                        Mode  Cnt  Score   Error  Units
AnalysisBenchmark.measureMethod  avgt    5  0.268 ± 0.070  ns/op

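The results tell us the following:

Score: 0.268 ns/op means that one operation takes 0.268 nanoseconds on average.
Error: ± 0.070 is the error margin of the score, expressed in the same unit as the score (ns/op), not as a percentage. JMH reports it as the half-width of the 99.9% confidence interval of the mean; here it is about 26% of the score. The smaller the error relative to the score, the more reliable the result.
Cnt: 5 is the number of measurement iterations that the score and error were computed from.

From this we can draw the following conclusions:

A small error relative to the score (for example, around 1% of it) indicates a stable, reliable result.
A large error suggests increasing the number of measurement iterations, warmup iterations, or forks.
Comparing the results of different benchmark methods shows the performance difference between implementations.

When analyzing JMH results, consider all of the reported data together, including the error margin and, where available, percentiles and confidence intervals, before drawing conclusions.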
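
To show where a number in the Error column comes from, the sketch below recomputes a mean and the half-width of its 99.9% confidence interval from five per-iteration scores. It is plain Java, not JMH code: the class name, the sample values, and the hard-coded Student's t critical value (8.610, the two-sided 99.9% value for 4 degrees of freedom from a standard t-table) are assumptions made for illustration, and the samples are hypothetical rather than taken from the run above.

```java
// Illustrative re-computation (not JMH code) of a mean score and its
// 99.9% confidence half-width for exactly five samples.
final class ScoreError {

    // Two-sided 99.9% Student's t critical value for 4 degrees of freedom.
    // Other sample counts need a different value from a t-table.
    static final double T_999_DF4 = 8.610;

    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double sampleStdDev(double[] samples) {
        double m = mean(samples);
        double squares = 0.0;
        for (double s : samples) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / (samples.length - 1));
    }

    // Half-width of the 99.9% confidence interval of the mean.
    static double errorFor5Samples(double[] samples) {
        if (samples.length != 5) {
            throw new IllegalArgumentException("expects exactly 5 samples");
        }
        return T_999_DF4 * sampleStdDev(samples) / Math.sqrt(samples.length);
    }

    public static void main(String[] args) {
        // Hypothetical per-iteration scores in ns/op.
        double[] scores = {0.25, 0.26, 0.27, 0.28, 0.29};
        System.out.println("score = " + mean(scores) + " ns/op");
        System.out.println("error = " + errorFor5Samples(scores) + " ns/op");
    }
}
```

Because the critical value is so large for only five samples, even modest iteration-to-iteration variation produces a wide error margin, which is one reason more iterations or forks tighten the result.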

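8. Case Study

A practical example: measuring string concatenation with JMH

String concatenation is a common operation in Java and can be done in several ways, such as the + operator on String, StringBuilder, or StringBuffer. These approaches can differ in performance, especially in loops or when many concatenations are performed. JMH can be used to measure them.

The following example benchmarks different string concatenation approaches with JMH: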

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
public class StringConcatenationBenchmark {

    @Benchmark
    public String concatUsingPlus() {
        String string = "";
        for (int i = 0; i < 100; i++) {
            string += "String";
        }
        return string;
    }

    @Benchmark
    public String concatUsingStringBuilder() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("String ").append(i);
        }
        return sb.toString();
    }

    @Benchmark
    public String concatUsingStringBuffer() {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 100; i++) {
            sb.append("String ").append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
            .include(StringConcatenationBenchmark.class.getSimpleName())
            .build();
        new Runner(opt).run();
    }
}

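In this example we define three benchmark methods that measure string concatenation with the + operator, StringBuilder, and StringBuffer respectively. Each method performs 100 concatenations in a loop. Note that concatUsingPlus appends only the literal "String", whereas the other two methods append "String " followed by the loop index, so the three methods do not build identical strings and the comparison is indicative rather than exact.

After the run, JMH reports the throughput of each method, that is, how many times per second the whole method (all 100 concatenations) can be executed. From the results we can tell which approach performs better under these conditions.

Analyzing the results

Suppose JMH prints the following results: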

Benchmark                                               Mode  Cnt       Score        Error  Units
StringConcatenationBenchmark.concatUsingPlus           thrpt    5  127801.402 ±   7365.452  ops/s
StringConcatenationBenchmark.concatUsingStringBuffer   thrpt    5  385107.338 ±  66488.847  ops/s
StringConcatenationBenchmark.concatUsingStringBuilder  thrpt    5  411992.746 ± 155229.314  ops/s

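Conclusion

1. concatUsingPlus: the + operator reaches a throughput of 127801.402 ± 7365.452 ops/s.
2. concatUsingStringBuilder: StringBuilder has the highest score, 411992.746 ± 155229.314 ops/s.
3. concatUsingStringBuffer: StringBuffer reaches 385107.338 ± 66488.847 ops/s.

These results show that StringBuilder and StringBuffer are both roughly three times as fast as the + operator for concatenation in a loop. String objects are immutable, so every + concatenation creates a new String object, whereas StringBuilder and StringBuffer are mutable and append in place. StringBuffer is synchronized, so it is generally expected to be slower than the unsynchronized StringBuilder; in this run, however, the error margins of the two overlap, so the measured difference between them is not conclusive.

These results can help developers choose a suitable string concatenation approach in practice.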
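
Before ranking two results, it helps to check whether their score ± error ranges overlap. The sketch below applies that check to the three results in the table above. It is plain Java with illustrative names, and an overlap check is only a rough heuristic, not a formal significance test.

```java
// Illustrative helper (not a JMH API): checks whether two "score ± error"
// ranges overlap.
final class IntervalOverlap {

    // True when [scoreA - errorA, scoreA + errorA] and
    // [scoreB - errorB, scoreB + errorB] share at least one point.
    static boolean overlaps(double scoreA, double errorA, double scoreB, double errorB) {
        return scoreA - errorA <= scoreB + errorB
                && scoreB - errorB <= scoreA + errorA;
    }

    public static void main(String[] args) {
        // Scores and errors in ops/s, copied from the result table above.
        double plus = 127801.402, plusErr = 7365.452;
        double buffer = 385107.338, bufferErr = 66488.847;
        double builder = 411992.746, builderErr = 155229.314;

        // StringBuffer vs StringBuilder: the ranges overlap.
        System.out.println(overlaps(buffer, bufferErr, builder, builderErr)); // true
        // + operator vs StringBuffer: the ranges are clearly separated.
        System.out.println(overlaps(plus, plusErr, buffer, bufferErr)); // false
    }
}
```

When two ranges overlap, as they do for StringBuffer and StringBuilder here, more iterations or forks are usually needed before the two can be ranked with confidence.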

Note: these results were obtained with JMH 1.33 on JDK 1.8.0_202 (Java HotSpot(TM) 64-Bit Server VM, 25.202-b08), Windows 10 64-bit, an Intel(R) Xeon(R) Platinum 8378C CPU @ 2.80GHz, and 64 GB of RAM.

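9. Summary

This article introduced the basic usage and core concepts of JMH (the Java Microbenchmark Harness), discussed how to write effective benchmarks and avoid common pitfalls, and walked through a complete JMH workflow using a string concatenation case study. We hope it gives you a deeper understanding of JMH and helps you apply it to optimize code performance in practice.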


Author: 魯班
Reproduction without permission from 得物技術 is strictly prohibited; violators will be held legally liable.