JVM Internals (7): Java Bytecode Fundamentals, Part 2

Posted by Richaaaard on 2016-12-23

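Introduction

Version: Java SE 7

Why learn Java bytecode?

Whether you are a Java developer, an architect, a CxO or simply an everyday smartphone user, Java bytecode is right in front of you: it is the foundation of the Java Virtual Machine.

Directors, managers and non-technical people can relax: all they need to know is that the development team is busy with the next release while Java bytecode runs quietly on the JVM platform.

Simply put, Java bytecode is the intermediate representation of Java code (that is, what is stored in class files), and it is executed inside the JVM. So why should you care about it? Because no Java program can run without it: the compiler turns Java source into bytecode, and bytecode, not source code, is what the JVM actually executes.

Technically, the JVM interprets bytecode and, at runtime, JIT-compiles frequently executed code into native code. Without bytecode behind the scenes, the JVM would have nothing to compile and map onto native code.

Many IT professionals may not have time to learn assembly language or machine code; Java bytecode can be thought of as a similar kind of low-level code. And when something goes wrong, understanding how the JVM works at this level helps a great deal in solving the problem.

In this article you will learn how to read and write JVM bytecode, gain a better understanding of how the runtime works, and acquire the ability to dissect some key libraries.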

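This article covers the following topics:

  • How to obtain a bytecode listing
  • How to read bytecode
  • How the compiler maps language constructs: local variables, method calls, conditional logic
  • An introduction to ASM
  • How bytecode works in other JVM languages (such as Groovy and Kotlin)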

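Contents

  • Why learn Java bytecode?
  • Part 1: Introduction to Java bytecode
    • Basics
    • Basic properties
    • The JVM stack model
      • What's inside a method body?
      • The local stack in detail
      • Local variables in detail
      • Flow control
      • Arithmetic operations and conversions
      • new & <init> & <clinit>
      • Method invocation and parameter passing
  • Part 2: ASM
    • ASM and tooling
  • Part 3: Javassist
  • Summary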

ASM

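ObjectWeb ASM is the de facto standard for analyzing and manipulating Java bytecode. ASM exposes the internal components of a Java class through its visitor-oriented API. The API itself is not large: with only a handful of classes you can do almost everything you need. ASM can be used both to modify existing binary bytecode and to generate new bytecode. For example, ASM can be used by new programming languages (Groovy, Kotlin, Scala) to compile their high-level syntax into bytecode the JVM can execute.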
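
To illustrate what "visitor-oriented" means, the sketch below models the idea in plain Java without depending on ASM. The interface MiniClassVisitor and the classes Recorder and RemoveMethod are hypothetical, heavily simplified stand-ins: in real ASM the roles are played by ClassReader (which replays a class file as a sequence of visit calls), ClassVisitor subclasses (adapters that forward, drop or add events) and ClassWriter (which turns the events back into bytes).

```java
import java.util.ArrayList;
import java.util.List;

public class VisitorSketch {

    /** Hypothetical, heavily simplified stand-in for ASM's ClassVisitor: receives class "events". */
    interface MiniClassVisitor {
        void visit(String name, String superName);
        void visitMethod(String name, String descriptor);
        void visitEnd();
    }

    /** Terminal visitor that records events, loosely playing the role of a ClassWriter. */
    static class Recorder implements MiniClassVisitor {
        final List<String> events = new ArrayList<>();
        public void visit(String name, String superName) { events.add("class " + name + " extends " + superName); }
        public void visitMethod(String name, String descriptor) { events.add("method " + name + descriptor); }
        public void visitEnd() { events.add("end"); }
    }

    /** Adapter that forwards every event except methods with a given name (a tiny "transformation"). */
    static class RemoveMethod implements MiniClassVisitor {
        private final MiniClassVisitor next;
        private final String dropped;

        RemoveMethod(MiniClassVisitor next, String dropped) {
            this.next = next;
            this.dropped = dropped;
        }

        public void visit(String name, String superName) { next.visit(name, superName); }
        public void visitMethod(String name, String descriptor) {
            if (!name.equals(dropped)) {
                next.visitMethod(name, descriptor); // only forward methods we keep
            }
        }
        public void visitEnd() { next.visitEnd(); }
    }

    /** Plays the role of a ClassReader: replays a fixed class structure as visit calls. */
    static void replayHelloWorld(MiniClassVisitor v) {
        v.visit("HelloWorld", "java/lang/Object");
        v.visitMethod("<init>", "()V");
        v.visitMethod("main", "([Ljava/lang/String;)V");
        v.visitEnd();
    }

    public static void main(String[] args) {
        Recorder out = new Recorder();
        replayHelloWorld(new RemoveMethod(out, "<init>"));
        System.out.println(out.events);
    }
}
```

Transformations compose by wrapping one visitor in another, which is what lets ASM's core API rewrite a class in a single streaming pass instead of building a full object model of it first.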

“We didn’t even consider using anything else instead of ASM, because other projects at JetBrains use ASM successfully for a long time.”

– ANDREY BRESLAV, KOTLIN


My first touch with bytecode first hand was when I started helping in the Groovy project and by then we settled to ASM. ASM can do what is needed, is small and doesn’t try to be too smart to get into your way. ASM tries to be memory and performance effective. For example you don’t have to create huge piles of objects to create your bytecode. It was one of the first with support for invokedynamic btw. Of course it has its pro and con sides, but all in all I am happy with it, simply because I can get the job done using it.

– JOCHEN THEODOROU, GROOVY


I mostly know about ASM, just because it’s the one used by Groovy :) However, knowing that it’s backed by people like Rémi Forax, who is a major contributor in the JVM world is very important and guarantees that it follows the latest improvements.

– CÉDRIC CHAMPEAU, GROOVY

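To give a proper introduction, we will use the ASM library to generate a "Hello World" class, starting from the plain version below; the same approach extends to printing the phrase an arbitrary number of times in a loop.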

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}

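To generate the bytecode for this example, you typically create a ClassWriter, visit the class structure (fields, methods and so on) and, when finished, obtain the final bytes from the writer (ClassWriter.toByteArray()).

First, create a ClassWriter instance: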

ClassWriter cw = new ClassWriter(
        ClassWriter.COMPUTE_MAXS |
        ClassWriter.COMPUTE_FRAMES);

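A ClassWriter is instantiated with flags that describe how it should behave. COMPUTE_MAXS tells ASM to compute the maximum operand stack size and the maximum number of local variables of each method automatically. COMPUTE_FRAMES tells ASM to compute the stack map frames of each method automatically (and implies COMPUTE_MAXS).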
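
COMPUTE_MAXS saves you from counting stack slots by hand. As a rough sketch of what that bookkeeping involves, the code below walks a straight-line instruction sequence, tracks the operand-stack depth and keeps the peak. It is a simplification under stated assumptions: each instruction is reduced to its net stack effect in slots, and jumps are ignored, whereas ASM's actual computation follows the method's control flow. The deltas are those of the HelloWorld main method and constructor generated later in this section; the class and method names are hypothetical.

```java
public class MaxStackSketch {

    /**
     * Straight-line approximation of the max-stack bookkeeping: apply each
     * instruction's net effect on stack depth (in slots) and remember the peak.
     * Jumps are ignored, so this only works for code without branches.
     */
    static int maxStack(int[] stackDeltas) {
        int depth = 0;
        int max = 0;
        for (int delta : stackDeltas) {
            depth += delta;
            if (depth < 0) {
                throw new IllegalStateException("pop from an empty stack");
            }
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        // main(): getstatic System.out (+1), ldc "Hello, World!" (+1),
        //         invokevirtual println(String) pops receiver and argument (-2), return (0)
        int[] mainBody = {+1, +1, -2, 0};
        // <init>(): aload_0 (+1), invokespecial Object.<init>() (-1), return (0)
        int[] ctorBody = {+1, -1, 0};
        System.out.println("main max stack = " + maxStack(mainBody));   // 2
        System.out.println("<init> max stack = " + maxStack(ctorBody)); // 1
    }
}
```

The results, 2 and 1, agree with the stack=2 and stack=1 values that javap -v reports for the javac-compiled HelloWorld.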

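To define the class, call visit() on the ClassWriter. Its arguments are, in order: the class file version (Opcodes.V1_6, i.e. Java 6), the access flags, the internal name of the class, its generic signature (null, since the class is not generic), the internal name of the superclass, and the implemented interfaces (null for none):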

cw.visit(
    Opcodes.V1_6,
    Opcodes.ACC_PUBLIC,
    "HelloWorld",
    null,
    "java/lang/Object",
    null);
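
To make concrete what visit() records, the sketch below writes, by hand and without ASM, the bytes of a minimal class file that a JVM will load: magic number, version (50, the value Opcodes.V1_6 stands for), a four-entry constant pool, access flags, this/super class indices, and empty interface, field, method and attribute tables. It is a simplified illustration of the class file layout in the JVM specification, not how ClassWriter is implemented; the class name GeneratedEmpty and the helper names are hypothetical. Since the class has no <init> method, it loads fine but cannot be instantiated.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Modifier;

public class MinimalClassFile {

    // Class access flag values from the JVM specification.
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_SUPER = 0x0020;

    /** Builds a class file with no interfaces, fields, methods or attributes (not even <init>). */
    static byte[] emptyClass(String internalName, int majorVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);               // magic
            out.writeShort(0);                      // minor_version
            out.writeShort(majorVersion);           // major_version: 50 is what Opcodes.V1_6 stands for
            out.writeShort(5);                      // constant_pool_count = 4 entries + 1
            out.writeByte(1);                       // #1 CONSTANT_Utf8: the class's internal name
            out.writeUTF(internalName);             //    writeUTF emits u2 length + modified UTF-8
            out.writeByte(7);                       // #2 CONSTANT_Class pointing at #1
            out.writeShort(1);
            out.writeByte(1);                       // #3 CONSTANT_Utf8: the superclass's internal name
            out.writeUTF("java/lang/Object");
            out.writeByte(7);                       // #4 CONSTANT_Class pointing at #3
            out.writeShort(3);
            out.writeShort(ACC_PUBLIC | ACC_SUPER); // access_flags
            out.writeShort(2);                      // this_class  = #2
            out.writeShort(4);                      // super_class = #4
            out.writeShort(0);                      // interfaces_count
            out.writeShort(0);                      // fields_count
            out.writeShort(0);                      // methods_count
            out.writeShort(0);                      // attributes_count
        } catch (IOException e) {
            throw new UncheckedIOException(e);      // not expected with an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Minimal loader that exposes the protected ClassLoader.defineClass. */
    static final class ByteLoader extends ClassLoader {
        Class<?> define(String binaryName, byte[] classFile) {
            return defineClass(binaryName, classFile, 0, classFile.length);
        }
    }

    public static void main(String[] args) {
        byte[] classFile = emptyClass("GeneratedEmpty", 50);
        Class<?> c = new ByteLoader().define("GeneratedEmpty", classFile);
        System.out.println(classFile.length + " bytes -> " + c.getName()
                + " extends " + c.getSuperclass().getName()
                + ", public = " + Modifier.isPublic(c.getModifiers()));
    }
}
```

Everything ClassWriter adds on top of this skeleton (methods, their Code attributes, max stack and locals, stack map frames) goes into the tables this sketch leaves empty.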

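Next, we generate the default constructor and the main method. Nothing bad happens if you skip the default constructor (the class still loads and main still runs, but the class cannot be instantiated), yet it is better to generate one.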

MethodVisitor constructor =
    cw.visitMethod(
        Opcodes.ACC_PUBLIC,
        "<init>",   // constructors are methods named <init>
        "()V",
        null,
        null);

constructor.visitCode();

// super()
constructor.visitVarInsn(Opcodes.ALOAD, 0);
constructor.visitMethodInsn(Opcodes.INVOKESPECIAL,
    "java/lang/Object", "<init>", "()V", false);
constructor.visitInsn(Opcodes.RETURN);

// the arguments are ignored and recomputed because of COMPUTE_MAXS
constructor.visitMaxs(0, 0);
constructor.visitEnd();

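First, visitMethod() creates the constructor. Next, visitCode() starts the constructor body, and its instructions follow. Then visitMaxs() is called so that ASM can recompute the stack sizes: because we passed the COMPUTE_MAXS flag to the ClassWriter constructor, ASM computes these values for us and ignores the arguments, so any values (here 0, 0) will do. Finally, visitEnd() completes the generation of the method's bytecode.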
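
visitMaxs(maxStack, maxLocals) takes two numbers. The second must at least cover the local-variable slots that hold the method's arguments on entry: in instance methods and constructors slot 0 holds this, then each parameter takes one slot, except long and double, which take two. A minimal sketch of that count, using java.lang.invoke.MethodType only to describe parameter types (the helper name parameterSlots is hypothetical):

```java
import java.lang.invoke.MethodType;

public class MaxLocalsSketch {

    /**
     * Local-variable slots occupied on method entry: slot 0 for `this`
     * (instance methods and constructors), then 1 slot per parameter,
     * 2 for long and double.
     */
    static int parameterSlots(MethodType type, boolean isStatic) {
        int slots = isStatic ? 0 : 1;
        for (Class<?> p : type.parameterList()) {
            slots += (p == long.class || p == double.class) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // HelloWorld's constructor: instance method "()V", so only `this` -> 1
        System.out.println(parameterSlots(MethodType.methodType(void.class), false));
        // main: static "([Ljava/lang/String;)V", one reference parameter -> 1
        System.out.println(parameterSlots(MethodType.methodType(void.class, String[].class), true));
        // hypothetical static (int, long, double) method: 1 + 2 + 2 -> 5
        System.out.println(parameterSlots(
                MethodType.methodType(void.class, int.class, long.class, double.class), true));
    }
}
```

Both HelloWorld methods come out at 1 slot, which matches the locals=1 that javap -v reports; a method that declares its own local variables needs further slots beyond these.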

The ASM code for the main method is as follows:

MethodVisitor mv = cw.visitMethod(
    Opcodes.ACC_PUBLIC + Opcodes.ACC_STATIC,
    "main", "([Ljava/lang/String;)V", null, null);

mv.visitCode();

// System.out.println("Hello, World!")
mv.visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System",
    "out", "Ljava/io/PrintStream;");
mv.visitLdcInsn("Hello, World!");
mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream",
    "println", "(Ljava/lang/String;)V", false);
mv.visitInsn(Opcodes.RETURN);

mv.visitMaxs(0, 0);
mv.visitEnd();

// finish the class and obtain the class file bytes
cw.visitEnd();
byte[] classBytes = cw.toByteArray();

By calling visitMethod() again, we create a new method definition from its access modifiers, name and descriptor. We then use visitCode(), visitMaxs() and visitEnd() just as we did for the constructor.
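
The strings passed to visitMethod(), visitFieldInsn() and visitMethodInsn() use the JVM's internal names (java/lang/Object, with slashes) and type descriptors (for example ([Ljava/lang/String;)V for main). A small sketch of how such descriptors are built from Class objects; the helper names are hypothetical, and the JDK's own MethodType.toMethodDescriptorString() serves as a cross-check:

```java
import java.lang.invoke.MethodType;

public class DescriptorSketch {

    /** Internal name: the binary class name with '.' replaced by '/'. */
    static String internalName(Class<?> c) {
        return c.getName().replace('.', '/');
    }

    /** Type descriptor, e.g. I, J, [D, Ljava/lang/String; */
    static String descriptor(Class<?> c) {
        if (c.isArray()) return "[" + descriptor(c.getComponentType());
        if (c == void.class) return "V";
        if (c == boolean.class) return "Z";
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == short.class) return "S";
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == float.class) return "F";
        if (c == double.class) return "D";
        return "L" + internalName(c) + ";";   // any other reference type
    }

    /** Method descriptor: (parameter descriptors)return descriptor */
    static String methodDescriptor(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) {
            sb.append(descriptor(p));
        }
        return sb.append(')').append(descriptor(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(methodDescriptor(void.class, String[].class)); // ([Ljava/lang/String;)V
        System.out.println(descriptor(java.io.PrintStream.class));        // Ljava/io/PrintStream;
        System.out.println(internalName(Object.class));                   // java/lang/Object
        // Cross-check with the JDK formatter (available since Java 7)
        System.out.println(MethodType.methodType(void.class, String.class)
                .toMethodDescriptorString());                              // (Ljava/lang/String;)V
    }
}
```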

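As you can see, the code is full of constants, "flags" and "indicators", and the result is not easy to understand by eye. Moreover, to write such code you have to keep the exact bytecode instruction sequence in mind in order to produce correct bytecode. This is what makes writing this kind of code complicated, and it is why everyone ends up with their own way of using ASM.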


Our approach is using Kotlin’s ability to enhance existing Java APIs: we created some helper functions (many of them extension functions) that make ASM APIs look very much like a bytecode manipulation DSL.

– ANDREY BRESLAV, KOTLIN


I built some meta api into the compiler. For example it lets you do a swap, regardless of the involved types. It was not in the links above, but I assume you know, that double and long consume two slots, while anything else does only one. The swap instruction handles only the 1-slot version. So if you have to swap an int and a long, a long and an int or a long and a long, you get a different set of instructions. I also added a helper API for local variables, to avoid to have to manage the index. If you want more nice looking code… Cedric wrote a Groovy DSL to generate bytecode. It is still the bytecode more or less, but less method around to make it less clear.

– JOCHEN THEODOROU, GROOVY
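
Jochen's point about swapping can be made concrete. The SWAP instruction only exchanges two single-slot values, so a code generator has to pick a different DUP/POP combination depending on whether each operand takes one slot or two (long, double). The sketch below shows one such selection together with a small slot-level simulator that checks the sequences; the names swapInsns and run are hypothetical, and this is not the Groovy compiler's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SwapSketch {

    /**
     * Instructions that swap the two topmost operand-stack values.
     * Sizes are in slots: 1 for int, float or a reference, 2 for long and double.
     */
    static List<String> swapInsns(int belowSlots, int topSlots) {
        if (belowSlots == 1 && topSlots == 1) return Arrays.asList("SWAP");
        if (belowSlots == 1 && topSlots == 2) return Arrays.asList("DUP2_X1", "POP2");
        if (belowSlots == 2 && topSlots == 1) return Arrays.asList("DUP_X2", "POP");
        if (belowSlots == 2 && topSlots == 2) return Arrays.asList("DUP2_X2", "POP2");
        throw new IllegalArgumentException("slot sizes must be 1 or 2");
    }

    /** Slot-level simulation of those instructions; the last list element is the stack top. */
    static void run(List<String> stack, List<String> insns) {
        for (String op : insns) {
            int n = stack.size();
            switch (op) {
                case "SWAP":    // exchange the two top slots
                    stack.add(n - 2, stack.remove(n - 1));
                    break;
                case "POP":     // drop one slot
                    stack.remove(n - 1);
                    break;
                case "POP2":    // drop two slots
                    stack.remove(n - 1);
                    stack.remove(n - 2);
                    break;
                case "DUP_X2":  // copy the top slot beneath the top three
                    stack.add(n - 3, stack.get(n - 1));
                    break;
                case "DUP2_X1": // copy the top two slots beneath the top three
                    stack.addAll(n - 3, new ArrayList<>(stack.subList(n - 2, n)));
                    break;
                case "DUP2_X2": // copy the top two slots beneath the top four
                    stack.addAll(n - 4, new ArrayList<>(stack.subList(n - 2, n)));
                    break;
                default:
                    throw new IllegalArgumentException(op);
            }
        }
    }

    public static void main(String[] args) {
        // an int below a long; the long is modelled as two slot halves
        List<String> stack = new ArrayList<>(Arrays.asList("I", "L.hi", "L.lo"));
        run(stack, swapInsns(1, 2));
        System.out.println(stack); // [L.hi, L.lo, I]
    }
}
```

For an int below a long, DUP2_X1 copies the long beneath the int, and POP2 then removes the now-redundant long from the top.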


ASM is a nice low-level API, but I think we miss an up-to-date higher level API, for example for generating proxies and so on. In Groovy we want to limit the number of dependencies we add to the project, so it would be cool if ASM provided this out-of-the-box, but the general idea behind ASM is more to stick with a low level API.

– CÉDRIC CHAMPEAU, GROOVY

ASM and Tooling

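Tools help a great deal when learning and working with bytecode. The best way to learn ASM is to write a Java source file equivalent to the class you want to generate, then use the ASMifier mode of the Eclipse Bytecode Outline plugin (or the ASMifier tool) to view the equivalent ASM code. If you want to implement a class transformer, write two Java source files (before and after the transformation) and use the plugin's compare view in ASMifier mode to compare the equivalent ASM code.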
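
As an example of this source-first workflow, the hypothetical HelloLoop class below is a looped variant of HelloWorld that prints the greeting a configurable number of times. Compiling it and running ASMifier on HelloLoop.class shows which ASM calls a loop needs: javac typically compiles such a counted for loop into a conditional jump (if_icmpge), the loop body, an iinc and a goto back to the condition, which ASMifier renders as visitLabel, visitJumpInsn and visitIincInsn calls.

```java
public class HelloLoop {

    /** Prints the greeting `times` times (prints nothing when times <= 0). */
    static void greet(int times) {
        for (int i = 0; i < times; i++) {
            System.out.println("Hello, World!");
        }
    }

    public static void main(String[] args) {
        // repeat count from the first argument, defaulting to 3
        greet(args.length > 0 ? Integer.parseInt(args[0]) : 3);
    }
}
```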

Eclipse Bytecode Outline plugin

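For IntelliJ IDEA users, the ASM Bytecode Outline plugin is also available from the plugin repository and is very easy to use. Right-click a source file and choose Show Bytecode outline; this opens a view generated by the ASMifier tool.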

ASM outline plugin in IntelliJ IDEA

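You can also run ASMifier directly, without an IDE plugin, since it is part of the ASM library (on Windows, separate the classpath entries with ; instead of :):

$ java -classpath "asm.jar:asm-util.jar" \
       org.objectweb.asm.util.ASMifier \
       HelloWorld.class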

We use ASM bytecode outline for IntelliJ IDEA and our own similar plugin that displays bytecodes generated by our compiler.

– ANDREY BRESLAV, KOTLIN


Actually, I wrote the “bytecode viewer” plugin for IntelliJ IDEA, and I’m using it quite often :) On the Groovy side, I also use the AST browser view, which provides a bytecode view too, although it seriously needs improvements.

– CÉDRIC CHAMPEAU, GROOVY


My tools are mostly org.objectweb.asm.util.Textifier and org.objectweb.asm.util.CheckClassAdapter. Some time ago I also wrote a tool helping me to visualize the bytecode and the stack information. It allows me to go through the bytecode and see what happens on the stack. And while bytecode used to be a pita to read for me in the beginning, I have seen so much of it, that I don’t even use that tool anymore, because I am usually faster just looking at the text produced by Textifier.

That is not supposed to tell you I am good at generating bytecode… no no.. I wouldn’t be able to read it so good if I had not the questionable pleasure of looking at it countless times, because there again was a pop of an empty stack or something like that. It is more that the problems I have to look for tend to repeat themselves and I have a whit of what to look for even before I fire up Textifier.

– JOCHEN THEODOROU, GROOVY

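Fun stories from bytecode experts

We asked Andrey, Jochen and Cédric to share some of their experiences with Java bytecode. Although the words "bytecode" and "fun" may not seem to belong together, these enthusiasts still shared a few stories: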


Hmm… bytecode and fun? What a strange combination of words in the same sentence ;)

Well.. one time maybe a little… I told you about the API I use to do a swap. In the beginning it was not working properly of course. That was partially due to me misunderstanding one of those DUP instructions, but mainly it was because I had a simple bug in my code in which I execute the 1-2 swap instead of the 2-1 swap (meaning swapping 1 and 2 slot operands). So I was looking at the code, totally confused, thinking this should work, looking at my code… then thinking I made it wrong with those dups and replacing the code with my new understanding…

All the while the code was not really all that wrong, only the swap cases were swapped. Anyway… after about a full day of getting a headache from too much looking at the bytecode I finally found my mistake and looked at the code to find it looks almost the same as before… and then it dawned on me, that it was only that simple mistake, that could have been corrected in a minute and which took me a full day. Not really funny, but there I laughed a bit at myself actually.

– JOCHEN THEODOROU, GROOVY


Actually, the funniest thing was when I wrote the “bytecode DSL” for Groovy, which allows you to write bytecode directly in the body of a method, using a DSL which is very close to what the ASM outline provides, and a nicer “groovy flavoured” DSL too. Although I started this project as a proof-of-concept and a personal experiment, I received a lot of feedback and interest about it.

Today I think it’s a very simple way to have people test bytecode directly, for example for students. It makes writing bytecode a lot easier than using ASM directly. However, I also received a lot of complaints, people saying I opened the Pandora box and that it would produce unreadable code in production :D (and I would definitely not recommend using it in production). Yet, it’s been more than one year the project is out, and I haven’t heard of anyone using it, so probably bytecode is really not that fun!

– CÉDRIC CHAMPEAU, GROOVY


Many fun things come in connection with Android: Dalvik is very picky about your bytecode conformance to the JVM spec. And HotSpot doesn’t care a bit about many of these things. We were running smoothly on HotSpot for a long time, without knowing that we had so many things done wrong. Now we use Dalvik’s verifier to check every class file we generate, to make sure nobody forgot to put ACC_SUPER on a class, proper offsets to a local variable table, and things like that.

We also came across a few interesting things in HotSpot, for example, if you call an absent method on an array object (like array.set()), you don’t get a NoSuchMethodError, or anything like that. What you get (what we got on a HotSpot we had a year ago, anyway) is… a native crash. Segmentation fault, if I am not mistaken. Our theory is that the vtable for arrays is so optimized that it is not even there, and lookup crashes because of that.

– ANDREY BRESLAV, KOTLIN

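Conclusion

The JVM is a masterpiece of engineering, and as with any remarkable machine, it pays to understand and appreciate the technology underneath. Java bytecode is the JVM's instruction set: it is what lets the JVM interpret and compile programs written in Java, Scala, Groovy, Kotlin and many other languages, and so deliver more applications to users.

Most of the time Java bytecode runs quietly in the background of the JVM, so the average programmer rarely thinks about it. But it is the set of instructions the JVM actually executes, which makes it essential for tooling and program analysis in certain domains: applications can modify bytecode to adapt their behavior to their domain. Any developer who wants to build profilers, mocking frameworks, AOP tools or similar tooling needs a thorough understanding of Java bytecode.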

References

The Java® Language Specification - Java SE 7 Edition

The Java® Virtual Machine Specification - Java SE 7 Edition, Chapter 6. The Java Virtual Machine Instruction Set

2015.01 A Java Programmer’s Guide to Byte Code

2012.11 Mastering Java Bytecode at the Core of the JVM

2011.01 Java Bytecode Fundamentals

2001.07 Java bytecode: Understanding bytecode makes you a better programmer

Wikipedia: Java bytecode

Wikipedia: Java bytecode instruction listings
