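Integer from the perspective of the JDK source code

Posted by 超人汪小建 on 2019-03-04

Original URL: https://flycode.co/archives/284910

Overview

Java's Integer class mainly wraps the primitive type int and provides methods for working with int values, such as conversions between int and String as well as conversions to other numeric types. It also offers a number of bit-level operations.

Inheritance structure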

--java.lang.Object
  --java.lang.Number
    --java.lang.Integer

Main fields

Part one

public static final int   MIN_VALUE = 0x80000000;
public static final int   MAX_VALUE = 0x7fffffff;
public static final int SIZE = 32;
public static final int BYTES = SIZE / Byte.SIZE;
public static final Class<Integer>  TYPE = (Class<Integer>) Class.getPrimitiveClass("int");

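The static field MIN_VALUE is the smallest value an int can hold, -2^31 (0x80000000); the final modifier makes it immutable.
MAX_VALUE is the counterpart: the largest int value, 2^31 - 1 (0x7fffffff).
SIZE is the number of bits used to represent an int in two's complement binary form. Its value is 32, and it is also static and final.
BYTES is the number of bytes used to represent an int in two's complement binary form. It is SIZE divided by Byte.SIZE, which gives 4.
TYPE is the Class object for the primitive type int; its toString value is int.
Class.getPrimitiveClass is a native method. It corresponds to the function Java_java_lang_Class_getPrimitiveClass in Class.c, so at the JVM level JVM_FindPrimitiveClass looks up a jclass from the string "int", which appears at the Java level as a Class<Integer>.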

JNIEXPORT jclass JNICALL
Java_java_lang_Class_getPrimitiveClass(JNIEnv *env,
                                       jclass cls,
                                       jstring name)
{
    const char *utfName;
    jclass result;

    if (name == NULL) {
        JNU_ThrowNullPointerException(env, 0);
        return NULL;
    }

    utfName = (*env)->GetStringUTFChars(env, name, 0);
    if (utfName == 0)
        return NULL;

    result = JVM_FindPrimitiveClass(env, utfName);

    (*env)->ReleaseStringUTFChars(env, name, utfName);

    return result;
}

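When toString is called on TYPE, the logic is as follows. TYPE is neither an interface nor a regular class, so the prefix is empty and the value is determined entirely by getName, which obtains the name from the JVM through the native method getName0.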

public String toString() {
        return (isInterface() ? "interface " : (isPrimitive() ? "" : "class "))
            + getName();
    }

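getName0 obtains the name from an array: from the Java-level Class, the JVM derives the index of the corresponding type in this array. Here the index is 10, so the name is "int".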

const char* type2name_tab[T_CONFLICT+1] = {
  NULL, NULL, NULL, NULL,
  "boolean",
  "char",
  "float",
  "double",
  "byte",
  "short",
  "int",
  "long",
  "object",
  "array",
  "void",
  "*address*",
  "*narrowoop*",
  "*conflict*"
};
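The constants described in this part can be checked from ordinary Java code. The following is a minimal sketch; the class name IntegerConstantsDemo is made up for illustration, and everything else is the public java.lang.Integer API.

```java
class IntegerConstantsDemo {
    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE);         // -2147483648, i.e. -2^31
        System.out.println(Integer.MAX_VALUE);         // 2147483647, i.e. 2^31 - 1
        System.out.println(Integer.SIZE);              // 32
        System.out.println(Integer.BYTES);             // 4
        System.out.println(Integer.TYPE);              // int
        System.out.println(Integer.TYPE == int.class); // true
        System.out.println(Integer.class);             // class java.lang.Integer
    }
}
```

The last two lines show the difference between the primitive Class object (no "class " prefix in toString) and the wrapper's own Class object.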

Part two

final static char [] DigitTens = {
        '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
        '1', '1', '1', '1', '1', '1', '1', '1', '1', '1',
        '2', '2', '2', '2', '2', '2', '2', '2', '2', '2',
        '3', '3', '3', '3', '3', '3', '3', '3', '3', '3',
        '4', '4', '4', '4', '4', '4', '4', '4', '4', '4',
        '5', '5', '5', '5', '5', '5', '5', '5', '5', '5',
        '6', '6', '6', '6', '6', '6', '6', '6', '6', '6',
        '7', '7', '7', '7', '7', '7', '7', '7', '7', '7',
        '8', '8', '8', '8', '8', '8', '8', '8', '8', '8',
        '9', '9', '9', '9', '9', '9', '9', '9', '9', '9',
        } ;

final static char [] DigitOnes = {
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        } ;

final static char[] digits = {
        '0' , '1' , '2' , '3' , '4' , '5' ,
        '6' , '7' , '8' , '9' , 'a' , 'b' ,
        'c' , 'd' , 'e' , 'f' , 'g' , 'h' ,
        'i' , 'j' , 'k' , 'l' , 'm' , 'n' ,
        'o' , 'p' , 'q' , 'r' , 's' , 't' ,
        'u' , 'v' , 'w' , 'x' , 'y' , 'z'
    };
final static int [] sizeTable = { 9, 99, 999, 9999, 99999, 999999, 9999999,
                                      99999999, 999999999, Integer.MAX_VALUE };

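DigitTens and DigitOnes are easier to understand together. They are used to look up the tens digit and the ones digit of a number between 0 and 99. For 48, for example, DigitTens[48] gives the tens digit 4 and DigitOnes[48] gives the ones digit 8.
The digits array holds every character that can appear in a number's string form. Because int conversions support radixes from 2 to 36, 36 characters are needed to cover all of them.
The sizeTable array is used to determine the length of the decimal string for an int. The related method is shown below; it finds the length with a few comparisons and avoids division and remainder operations. It expects a non-negative argument.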
```
static int stringSize(int x) {
    for (int i=0; ; i++)
        if (x <= sizeTable[i])
            return i+1;
}
```
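The lookup tables above are package-private in the JDK, so they cannot be called directly. The sketch below rebuilds equivalent tables to show how the lookups behave; the class name DigitTablesDemo and the constant names are illustrative and not part of the JDK.

```java
class DigitTablesDemo {
    // Rebuilt equivalents of DigitTens / DigitOnes: index n holds a digit of n.
    static final char[] DIGIT_TENS = new char[100];
    static final char[] DIGIT_ONES = new char[100];
    static final int[] SIZE_TABLE = {9, 99, 999, 9999, 99999, 999999, 9999999,
                                     99999999, 999999999, Integer.MAX_VALUE};

    static {
        for (int n = 0; n < 100; n++) {
            DIGIT_TENS[n] = (char) ('0' + n / 10);
            DIGIT_ONES[n] = (char) ('0' + n % 10);
        }
    }

    // Number of decimal characters of a non-negative x, found by comparison only.
    static int stringSize(int x) {
        for (int i = 0; ; i++) {
            if (x <= SIZE_TABLE[i]) {
                return i + 1;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(DIGIT_TENS[48]);                // 4
        System.out.println(DIGIT_ONES[48]);                // 8
        System.out.println(stringSize(0));                 // 1
        System.out.println(stringSize(99));                // 2
        System.out.println(stringSize(100));               // 3
        System.out.println(stringSize(Integer.MAX_VALUE)); // 10
    }
}
```

The last table entry is Integer.MAX_VALUE, so the loop always terminates for non-negative input.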

The IntegerCache inner class

private static class IntegerCache {
        static final int low = -128;
        static final int high;
        static final Integer cache[];

        static {
            int h = 127;
            String integerCacheHighPropValue =
                sun.misc.VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
            if (integerCacheHighPropValue != null) {
                try {
                    int i = parseInt(integerCacheHighPropValue);
                    i = Math.max(i, 127);
                    h = Math.min(i, Integer.MAX_VALUE - (-low) -1);
                } catch( NumberFormatException nfe) {
                }
            }
            high = h;

            cache = new Integer[(high - low) + 1];
            int j = low;
            for(int k = 0; k < cache.length; k++)
                cache[k] = new Integer(j++);
            assert IntegerCache.high >= 127;
        }

        private IntegerCache() {}
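    }

IntegerCache is an inner class of Integer. It holds an array of Integer objects for a range of int values, [-128, 127] by default. Unlike Byte, it does not cache every possible value: the int range is far too large for that, whereas Byte only spans -128 to 127, 256 values in total. So by default only 256 Integer objects are instantiated, and when a value falls within [-128, 127] the corresponding Integer is taken from the cache instead of being instantiated again. The cache is held in static final fields, which avoids repeated allocation and garbage collection. The upper bound of the cache can also be changed: starting the JVM with -Djava.lang.Integer.IntegerCache.high=xxx sets the largest cached value, so -Djava.lang.Integer.IntegerCache.high=500 caches [-128, 500].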
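The effect of the cache is visible through reference comparison. The sketch below is illustrative only (IntegerCacheDemo and sameObject are made-up names); the result for 128 depends on how the JVM was started, so it is described rather than asserted.

```java
class IntegerCacheDemo {
    // Compares references on purpose, to reveal whether valueOf reused an object.
    static boolean sameObject(int v) {
        Integer a = Integer.valueOf(v);
        Integer b = Integer.valueOf(v);
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(sameObject(127));   // true: always inside the cached range
        System.out.println(sameObject(-128));  // true: always inside the cached range
        // Outside the default range; expected to be false unless the cache's
        // upper bound was raised with -Djava.lang.Integer.IntegerCache.high.
        System.out.println(sameObject(128));
        // Value comparison is unaffected by caching.
        System.out.println(Integer.valueOf(128).equals(Integer.valueOf(128))); // true
    }
}
```

This is why Integer objects should be compared with equals rather than ==: reference equality only happens to work inside the cached range.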

Main methods

The parseInt method

public static int parseInt(String s, int radix)
                throws NumberFormatException
    {
        if (s == null) {
            throw new NumberFormatException("null");
        }

        if (radix < Character.MIN_RADIX) {
            throw new NumberFormatException("radix " + radix +
                                            " less than Character.MIN_RADIX");
        }

        if (radix > Character.MAX_RADIX) {
            throw new NumberFormatException("radix " + radix +
                                            " greater than Character.MAX_RADIX");
        }

        int result = 0;
        boolean negative = false;
        int i = 0, len = s.length();
        int limit = -Integer.MAX_VALUE;
        int multmin;
        int digit;

        if (len > 0) {
            char firstChar = s.charAt(0);
            if (firstChar < '0') { 
                if (firstChar == '-') {
                    negative = true;
                    limit = Integer.MIN_VALUE;
                } else if (firstChar != '+')
                    throw NumberFormatException.forInputString(s);

                if (len == 1) 
                    throw NumberFormatException.forInputString(s);
                i++;
            }
            multmin = limit / radix;
            while (i < len) {
                digit = Character.digit(s.charAt(i++),radix);
                if (digit < 0) {
                    throw NumberFormatException.forInputString(s);
                }
                if (result < multmin) {
                    throw NumberFormatException.forInputString(s);
                }
                result *= radix;
                if (result < limit + digit) {
                    throw NumberFormatException.forInputString(s);
                }
                result -= digit;
            }
        } else {
            throw NumberFormatException.forInputString(s);
        }
        return negative ? result : -result;
    }

    public static int parseInt(String s) throws NumberFormatException {
        return parseInt(s,10);
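    }

There are two parseInt methods, and the first is the one to study. Its first parameter is the string to convert and its second is the radix. As an example, Integer.parseInt("100", 10) reads 100 as decimal and returns 100, while Integer.parseInt("100", 2) reads 100 as binary and returns 4. Integer.parseInt("10000000000", 10) throws java.lang.NumberFormatException because the value does not fit in an int.

The method first checks that the string is not null and that the radix lies between Character.MIN_RADIX and Character.MAX_RADIX, that is, between 2 and 36. It then requires the string length to be greater than 0 and handles the first character, which may be a digit, a minus sign, or a plus sign. The core logic converts the string to a number. The usual way to convert base n to decimal is well known: if 357 is octal, the result is 3*8^2 + 5*8^1 + 7*8^0 = 239, and if 357 is decimal, the result is 3*10^2 + 5*10^1 + 7*10^0 = 357. The method does essentially this with a small change of approach, computing ((3*8+5)*8+7) = 239 and ((3*10+5)*10+7) = 357. The rule is: walk the characters from left to right, multiply the running result by the radix, add the next digit, and repeat until the last character. The other difference is that the method never adds; it accumulates the result as a negative number by subtracting each digit, which is equivalent. The reason lies in the range of int: the magnitude of Integer.MIN_VALUE cannot be represented as a positive int, so accumulating in positive space would overflow on "-2147483648". Working in negative space covers the whole range, and the sign is flipped at the end for non-negative input.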
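The negative-accumulation idea can be isolated in a decimal-only sketch. This is a simplified illustration modeled on the code above, not the JDK method itself; the class name NegativeAccumulationDemo and the method parseDecimal are hypothetical.

```java
class NegativeAccumulationDemo {
    // Decimal-only sketch: accumulates the result as a negative number.
    static int parseDecimal(String s) {
        if (s == null || s.isEmpty()) {
            throw new NumberFormatException("empty input");
        }
        boolean negative = false;
        int i = 0;
        int limit = -Integer.MAX_VALUE;       // most negative result allowed
        char first = s.charAt(0);
        if (first == '-' || first == '+') {
            if (s.length() == 1) {
                throw new NumberFormatException(s);
            }
            if (first == '-') {
                negative = true;
                limit = Integer.MIN_VALUE;    // one more value fits on the negative side
            }
            i = 1;
        }
        int multmin = limit / 10;             // below this, result * 10 would overflow
        int result = 0;
        while (i < s.length()) {
            int digit = Character.digit(s.charAt(i++), 10);
            if (digit < 0 || result < multmin) {
                throw new NumberFormatException(s);
            }
            result *= 10;
            if (result < limit + digit) {     // subtracting digit would pass the limit
                throw new NumberFormatException(s);
            }
            result -= digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        System.out.println(parseDecimal("357"));          // 357
        System.out.println(parseDecimal("-2147483648"));  // -2147483648
        System.out.println(Integer.parseInt("357", 8));   // 239
    }
}
```

With positive accumulation, "-2147483648" would need the intermediate value 2147483648, which does not fit in an int; in negative space it is reached directly.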

Constructors

public Integer(int value) {
        this.value = value;
    }

public Integer(String s) throws NumberFormatException {
        this.value = parseInt(s, 10);
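    }

There are two constructors, taking an int and a String respectively. The String constructor converts through parseInt, so its conversion logic is the same as the parseInt method above.

The getChars method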

static void getChars(int i, int index, char[] buf) {
        int q, r;
        int charPos = index;
        char sign = 0;

        if (i < 0) {
            sign = '-';
            i = -i;
        }

        while (i >= 65536) {
            q = i / 100;
            r = i - ((q << 6) + (q << 5) + (q << 2));
            i = q;
            buf [--charPos] = DigitOnes[r];
            buf [--charPos] = DigitTens[r];
        }

        for (;;) {
            q = (i * 52429) >>> (16+3);
            r = i - ((q << 3) + (q << 1));  
            buf [--charPos] = digits [r];
            i = q;
            if (i == 0) break;
        }
        if (sign != 0) {
            buf [--charPos] = sign;
        }
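    }

This method writes the decimal characters of an int value into a char array, for example placing 357 into the array as '3', '5', '7'. It fills the buffer backwards from index, least significant digit first. Several tricks are used. Values of 65536 and above are handled separately from smaller values. The while (i >= 65536) loop handles the large values two digits per iteration; ((q << 6) + (q << 5) + (q << 2)) equals q*100 (64 + 32 + 4 = 100), so r is the remainder of i divided by 100, and the DigitTens and DigitOnes arrays described earlier give its tens and ones digits. Once i is below 65536, the second loop produces one digit at a time. It is still the remainder idea, with more tricks: (i * 52429) >>> (16+3) equals i/10 for this range of i, ((q << 3) + (q << 1)) equals q*10, and the digits array then supplies the character. The small-value loop thus avoids division entirely, replacing it with a multiplication and a right shift, which suggests that division is considerably more expensive than multiplication and shifting. Likewise, wherever a multiplication can be expressed with shifts and additions the code does so, on the assumption that multiplication costs more than those operations. The large-value loop keeps the real division because i * 52429 would overflow for large i. Note also that i = -i cannot represent Integer.MIN_VALUE, which is why toString handles that value before calling getChars.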
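The arithmetic identities behind these tricks can be checked exhaustively for the range where getChars relies on them. The sketch below uses made-up names (DivisionTricksDemo and its helper methods); only the expressions themselves come from the code above.

```java
class DivisionTricksDemo {
    static int times100ByShifts(int q) {
        return (q << 6) + (q << 5) + (q << 2);   // 64q + 32q + 4q
    }

    static int times10ByShifts(int q) {
        return (q << 3) + (q << 1);              // 8q + 2q
    }

    static int divideBy10(int i) {
        // 52429 / 2^19 is slightly more than 1/10; >>> reads the product as unsigned.
        return (i * 52429) >>> (16 + 3);
    }

    // Checks the multiply-and-shift form against real division for 0 <= i < 65536.
    static boolean multiplyShiftHoldsBelow65536() {
        for (int i = 0; i < 65536; i++) {
            if (divideBy10(i) != i / 10) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(times100ByShifts(37));            // 3700
        System.out.println(times10ByShifts(37));             // 370
        System.out.println(divideBy10(65535));               // 6553
        System.out.println(multiplyShiftHoldsBelow65536());  // true
    }
}
```

The product 65535 * 52429 already exceeds Integer.MAX_VALUE, which is why the unsigned shift >>> is needed and why the trick is limited to small values.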

The toString method

public static String toString(int i) {
        if (i == Integer.MIN_VALUE)
            return "-2147483648";
        int size = (i < 0) ? stringSize(-i) + 1 : stringSize(i);
        char[] buf = new char[size];
        getChars(i, size, buf);
        return new String(buf, true);
    }
public String toString() {
        return toString(value);
    }
public static String toString(int i, int radix) {
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX)
            radix = 10;

        if (radix == 10) {
            return toString(i);
        }

        char buf[] = new char[33];
        boolean negative = (i < 0);
        int charPos = 32;

        if (!negative) {
            i = -i;
        }

        while (i <= -radix) {
            buf[charPos--] = digits[-(i % radix)];
            i = i / radix;
        }
        buf[charPos] = digits[-i];

        if (negative) {
            buf[--charPos] = '-';
        }

        return new String(buf, charPos, (33 - charPos));
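    }

There are three toString methods, two static and one non-static. The first is simple: it uses stringSize to get the number of characters, calls getChars to fill the char array, and returns a String. The second just calls the first. The third toString takes a radix and produces the string in that radix. Any radix outside the range 2 to 36 is treated as 10. Converting from decimal to another radix means repeatedly dividing by the radix and collecting the remainders, then reading the remainders in reverse order. That is what happens here: each remainder is mapped to a character through the digits array, and the arithmetic is again done on negative numbers.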
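A few calls illustrate the radix handling described above. This is a usage sketch against the public API; the class name RadixToStringDemo is made up.

```java
class RadixToStringDemo {
    public static void main(String[] args) {
        System.out.println(Integer.toString(255, 16));   // ff
        System.out.println(Integer.toString(-255, 2));   // -11111111
        System.out.println(Integer.toString(35, 36));    // z
        // A radix outside 2..36 falls back to decimal.
        System.out.println(Integer.toString(255, 99));   // 255
        // In Java the remainder takes the sign of the dividend, which is why
        // the JDK code negates it before indexing the digits array.
        System.out.println(-255 % 16);                   // -15
        // Longest possible result: a sign plus 32 binary digits.
        System.out.println(Integer.toString(Integer.MIN_VALUE, 2).length()); // 33
    }
}
```

The last line explains the 33-element buffer in the radix version of toString.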

The valueOf method

public static Integer valueOf(int i) {
        if (i >= IntegerCache.low && i <= IntegerCache.high)
            return IntegerCache.cache[i + (-IntegerCache.low)];
        return new Integer(i);
    }
public static Integer valueOf(String s) throws NumberFormatException {
        return Integer.valueOf(parseInt(s, 10));
    }
public static Integer valueOf(String s, int radix) throws NumberFormatException {
        return Integer.valueOf(parseInt(s,radix));
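    }

There are three valueOf methods, and the core logic is in the first. IntegerCache caches Integer objects for the values in [low, high], so a value inside that range is returned directly from the IntegerCache array, while a value outside it requires a new instance.

The decode method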

public static Integer decode(String nm) throws NumberFormatException {
        int radix = 10;
        int index = 0;
        boolean negative = false;
        Integer result;

        if (nm.length() == 0)
            throw new NumberFormatException("Zero length string");
        char firstChar = nm.charAt(0);
        if (firstChar == '-') {
            negative = true;
            index++;
        } else if (firstChar == '+')
            index++;
        if (nm.startsWith("0x", index) || nm.startsWith("0X", index)) {
            index += 2;
            radix = 16;
        }
        else if (nm.startsWith("#", index)) {
            index ++;
            radix = 16;
        }
        else if (nm.startsWith("0", index) && nm.length() > 1 + index) {
            index ++;
            radix = 8;
        }

        if (nm.startsWith("-", index) || nm.startsWith("+", index))
            throw new NumberFormatException("Sign character in wrong position");

        try {
            result = Integer.valueOf(nm.substring(index), radix);
            result = negative ? Integer.valueOf(-result.intValue()) : result;
        } catch (NumberFormatException e) {
            String constant = negative ? ("-" + nm.substring(index))
                                       : nm.substring(index);
            result = Integer.valueOf(constant, radix);
        }
        return result;
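    }

decode parses a string into an Integer. For example, Integer.decode("11") returns 11; Integer.decode("0x11") and Integer.decode("#11") both return 17, because strings starting with 0x (or 0X) or # are treated as hexadecimal; Integer.decode("011") returns 9, because a leading 0 followed by more digits is treated as octal.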
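The prefixes can be tried directly. The sketch below is a small usage example (DecodeDemo and the helper throwsNumberFormat are made-up names) that also covers the sign handling visible in the source above.

```java
class DecodeDemo {
    // Helper for illustration: reports whether decode rejects the input.
    static boolean throwsNumberFormat(String s) {
        try {
            Integer.decode(s);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.decode("11"));           // 11
        System.out.println(Integer.decode("0x11"));         // 17
        System.out.println(Integer.decode("#11"));          // 17
        System.out.println(Integer.decode("011"));          // 9
        System.out.println(Integer.decode("0"));            // 0 (a lone 0 stays decimal)
        System.out.println(Integer.decode("-0x11"));        // -17
        System.out.println(Integer.decode("-0x80000000"));  // -2147483648
        System.out.println(throwsNumberFormat("0x-11"));    // true: sign after the prefix
    }
}
```

The "-0x80000000" case is the reason for the catch block in decode: the digits alone overflow, so the sign is reattached and the string is parsed again.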

The xxxValue methods

public byte byteValue() {
        return (byte)value;
    }
public short shortValue() {
        return (short)value;
    }
public int intValue() {
        return value;
    }
public long longValue() {
        return (long)value;
    }
public float floatValue() {
        return (float)value;
    }
public double doubleValue() {
        return (double)value;
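    }

These are shortValue, intValue, longValue, byteValue, floatValue, and doubleValue. Each simply casts the value to the corresponding type, so the narrowing conversions to byte and short can lose information.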
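Because these methods are plain casts, they follow Java's primitive conversion rules. The following sketch (the class name XxxValueDemo is made up) shows one narrowing case and one precision-loss case.

```java
class XxxValueDemo {
    public static void main(String[] args) {
        Integer v = Integer.valueOf(300);
        System.out.println(v.byteValue());   // 44: only the low 8 bits of 300 survive
        System.out.println(v.shortValue());  // 300
        System.out.println(v.longValue());   // 300

        Integer big = Integer.valueOf(16777217);      // 2^24 + 1
        // float has a 24-bit significand, so the value is rounded.
        System.out.println((int) big.floatValue());   // 16777216
        // double represents every int exactly.
        System.out.println((int) big.doubleValue());  // 16777217
    }
}
```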

The hashCode method

public int hashCode() {
        return Integer.hashCode(value);
    }
public static int hashCode(int value) {
        return value;
    }

hashCode is simple: it returns the int value itself.

The equals method

public boolean equals(Object obj) {
        if (obj instanceof Integer) {
            return value == ((Integer)obj).intValue();
        }
        return false;
    }

equals first checks that the argument is an Integer and then compares the values.
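The type check in equals has a visible consequence: an Integer is never equal to a Long, even when the numeric values match. A short sketch, with the made-up class name EqualsHashCodeDemo:

```java
class EqualsHashCodeDemo {
    public static void main(String[] args) {
        Integer a = Integer.valueOf(1000);
        System.out.println(a.equals(Integer.valueOf(1000)));  // true: same type, same value
        System.out.println(a.equals(1000L));                  // false: the argument boxes to Long
        System.out.println(a.equals(null));                   // false: instanceof rejects null
        System.out.println(a.hashCode());                     // 1000
        System.out.println(Integer.hashCode(-7));             // -7
    }
}
```

Equal Integer objects therefore always have equal hash codes, as the equals/hashCode contract requires.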

The compare method

public static int compare(int x, int y) {
        return (x < y) ? -1 : ((x == y) ? 0 : 1);
    }

Returns -1 if x is less than y, 0 if they are equal, and 1 otherwise.
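One likely reason compare is written with explicit comparisons rather than a subtraction is overflow. The sketch below contrasts it with a hypothetical naiveCompare helper that is not part of the JDK; the class name CompareDemo is also made up.

```java
class CompareDemo {
    // Hypothetical shortcut, shown only to illustrate the overflow problem.
    static int naiveCompare(int x, int y) {
        return x - y;
    }

    public static void main(String[] args) {
        System.out.println(Integer.compare(3, 3));                  // 0
        System.out.println(Integer.compare(5, -5));                 // 1
        System.out.println(Integer.compare(Integer.MIN_VALUE, 1));  // -1
        // The subtraction wraps around and reports the wrong sign.
        System.out.println(naiveCompare(Integer.MIN_VALUE, 1));     // 2147483647
    }
}
```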

Unsigned conversion

public static long toUnsignedLong(int x) {
        return ((long) x) & 0xffffffffL;
    }
public static String toUnsignedString(int i) {
        return Long.toString(toUnsignedLong(i));
    }
public static String toUnsignedString(int i, int radix) {
        return Long.toUnsignedString(toUnsignedLong(i), radix);
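    }

These methods reinterpret the int as an unsigned 32-bit value: toUnsignedLong converts it to a long, and the toUnsignedString methods format that value.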
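The masking with 0xffffffffL keeps the low 32 bits and clears the sign extension. A brief usage sketch (the class name UnsignedDemo is made up):

```java
class UnsignedDemo {
    public static void main(String[] args) {
        System.out.println(Integer.toUnsignedLong(-1));                 // 4294967295
        System.out.println(Integer.toUnsignedLong(Integer.MIN_VALUE));  // 2147483648
        System.out.println(Integer.toUnsignedLong(42));                 // 42
        System.out.println(Integer.toUnsignedString(-1));               // 4294967295
        System.out.println(Integer.toUnsignedString(-1, 16));           // ffffffff
    }
}
```

Non-negative values are unchanged; negative values map to 2^32 plus the value.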

The bitCount method

public static int bitCount(int i) {
        i = i - ((i >>> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
        i = (i + (i >>> 4)) & 0x0f0f0f0f;
        i = i + (i >>> 8);
        i = i + (i >>> 16);
        return i & 0x3f;
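    }

This method counts the 1 bits in the binary representation. At first glance it is puzzling, being nothing but shifts, additions, and subtractions. The important constants are: 0x55555555 is 01010101010101010101010101010101, 0x33333333 is 00110011001100110011001100110011, and 0x0f0f0f0f is 00001111000011110000111100001111. The core idea is to count the 1s in each group of two bits first. For 10011111, the two-bit groups contain 1, 1, 2, and 2 ones, recorded as 01011010. Next, the counts are summed per group of four bits: 01011010 has 2 and 4 ones in its four-bit groups, recorded as 00100100. Per group of eight bits this becomes 00000110. The same is done for 16 and then 32 bits, and finally the result is ANDed with 0x3f to give the number of 1s.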
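The worked example can be reproduced by running the first three steps one at a time. In this sketch the class BitCountTrace and the helpers step1, step2, step3, and bits8 are made-up names; the expressions are the ones from the bitCount source above.

```java
class BitCountTrace {
    static int step1(int i) {   // count per 2-bit group
        return i - ((i >>> 1) & 0x55555555);
    }

    static int step2(int i) {   // sum per 4-bit group
        return (i & 0x33333333) + ((i >>> 2) & 0x33333333);
    }

    static int step3(int i) {   // sum per 8-bit group
        return (i + (i >>> 4)) & 0x0f0f0f0f;
    }

    // Low 8 bits of v as a zero-padded binary string.
    static String bits8(int v) {
        String s = Integer.toBinaryString(v & 0xff);
        return "00000000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        int i = 0b10011111;
        int s1 = step1(i);
        int s2 = step2(s1);
        int s3 = step3(s2);
        System.out.println(bits8(s1));            // 01011010
        System.out.println(bits8(s2));            // 00100100
        System.out.println(bits8(s3));            // 00000110
        System.out.println(Integer.bitCount(i));  // 6
    }
}
```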

The highestOneBit method

public static int highestOneBit(int i) {
        i |= (i >>  1);
        i |= (i >>  2);
        i |= (i >>  4);
        i |= (i >>  8);
        i |= (i >> 16);
        return i - (i >>> 1);
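    }

This method returns a value in which only the highest 1 bit of i is kept and all other bits are 0. For i = 10, binary 1010, keeping the highest 1 gives 1000, that is, 8. If i = 0 the result is 0. If i is negative the result is always -2147483648, because the highest bit of a negative number is 1, giving 1000,0000,0000,0000,0000,0000,0000,0000. What do the shifts do? Shifting i right by one and ORing sets the bit to the right of the highest 1. Shifting right by two and ORing then makes the 1+2=3 bits to its right 1, then 1+2+4=7 bits, and so on until 1+2+4+8+16=31 bits are covered. At that point every bit from the highest 1 downward is 1, and i - (i >>> 1) clears everything below the highest bit, leaving the result.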
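The two phases, smearing the highest 1 downward and then subtracting, can be separated for inspection. The names HighestOneBitTrace, smear, and highestOneBit below are illustrative; the code mirrors the JDK method shown above.

```java
class HighestOneBitTrace {
    // Sets every bit below the highest 1 bit.
    static int smear(int i) {
        i |= (i >> 1);
        i |= (i >> 2);
        i |= (i >> 4);
        i |= (i >> 8);
        i |= (i >> 16);
        return i;
    }

    static int highestOneBit(int i) {
        int s = smear(i);
        return s - (s >>> 1);   // drop everything below the top bit
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(smear(10)));  // 1111
        System.out.println(highestOneBit(10));                  // 8
        System.out.println(highestOneBit(1000));                // 512
        System.out.println(highestOneBit(0));                   // 0
        System.out.println(highestOneBit(-5));                  // -2147483648
    }
}
```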

The lowestOneBit method

public static int lowestOneBit(int i) {
        return i & -i;
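    }

The counterpart of highestOneBit, lowestOneBit returns a value in which only the lowest 1 bit is kept and all other bits are 0. The operation is simple. Negating i in two's complement means inverting all of its bits and adding 1, and ANDing that result with i leaves exactly the lowest 1 bit.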
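Spelling the negation out as "invert, then add 1" makes the mechanism easier to follow. In this sketch LowestOneBitDemo and its lowestOneBit helper are made-up names that restate i & -i.

```java
class LowestOneBitDemo {
    // Same as i & -i, with the two's complement negation written out.
    static int lowestOneBit(int i) {
        return i & (~i + 1);
    }

    public static void main(String[] args) {
        // 12 is 1100; inverted and incremented, the low bits are ...0100.
        System.out.println(lowestOneBit(12));                 // 4
        System.out.println(lowestOneBit(7));                  // 1
        System.out.println(lowestOneBit(-12));                // 4
        System.out.println(lowestOneBit(0));                  // 0
        System.out.println(lowestOneBit(Integer.MIN_VALUE));  // -2147483648
    }
}
```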

The numberOfLeadingZeros method

 public static int numberOfLeadingZeros(int i) {
        if (i == 0)
            return 32;
        int n = 1;
        if (i >>> 16 == 0) { n += 16; i <<= 16; }
        if (i >>> 24 == 0) { n +=  8; i <<=  8; }
        if (i >>> 28 == 0) { n +=  4; i <<=  4; }
        if (i >>> 30 == 0) { n +=  2; i <<=  2; }
        n -= i >>> 31;
        return n;
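    }

This method returns the number of consecutive 0 bits at the most significant end of i. If i is 0 there are 32. The approach is a binary search: first check whether the high 16 bits are all 0. If they are, there are at least 16 leading zeros, so 16 is added to the count and i is shifted left by 16 so that the remaining bits can be examined; otherwise i is left unchanged. Next, i >>> 24 tests the top 8 bits in the same way, giving at least 16+8=24 zeros if both tests succeed, and so on down to the final bit.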
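A few sample values, plus one common use of the result, are sketched below. The class LeadingZerosDemo and the helper floorLog2 are made-up names, not JDK methods.

```java
class LeadingZerosDemo {
    // Position of the highest 1 bit, i.e. floor(log2(x)) for positive x.
    static int floorLog2(int x) {
        if (x <= 0) {
            throw new IllegalArgumentException("x must be positive");
        }
        return 31 - Integer.numberOfLeadingZeros(x);
    }

    public static void main(String[] args) {
        System.out.println(Integer.numberOfLeadingZeros(0));           // 32
        System.out.println(Integer.numberOfLeadingZeros(1));           // 31
        System.out.println(Integer.numberOfLeadingZeros(0x00010000));  // 15
        System.out.println(Integer.numberOfLeadingZeros(-1));          // 0
        System.out.println(floorLog2(1000));                           // 9
        System.out.println(floorLog2(1024));                           // 10
    }
}
```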

The numberOfTrailingZeros method

public static int numberOfTrailingZeros(int i) {
        int y;
        if (i == 0) return 32;
        int n = 31;
        y = i <<16; if (y != 0) { n = n -16; i = y; }
        y = i << 8; if (y != 0) { n = n - 8; i = y; }
        y = i << 4; if (y != 0) { n = n - 4; i = y; }
        y = i << 2; if (y != 0) { n = n - 2; i = y; }
        return n - ((i << 1) >>> 31);
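    }

The counterpart of numberOfLeadingZeros, this method returns the number of consecutive 0 bits at the least significant end of i. It uses the same binary-search idea, so the steps are not repeated here.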
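As a cross-check rather than a description of the JDK implementation, the same count can be obtained from two methods covered earlier: subtracting 1 from the lowest 1 bit turns exactly the trailing zeros into ones. The names TrailingZerosDemo and viaLowestOneBit are made up for this sketch.

```java
class TrailingZerosDemo {
    // Alternative formulation, for comparison only.
    static int viaLowestOneBit(int i) {
        return Integer.bitCount(Integer.lowestOneBit(i) - 1);
    }

    public static void main(String[] args) {
        System.out.println(Integer.numberOfTrailingZeros(8));                  // 3
        System.out.println(Integer.numberOfTrailingZeros(1));                  // 0
        System.out.println(Integer.numberOfTrailingZeros(0));                  // 32
        System.out.println(Integer.numberOfTrailingZeros(Integer.MIN_VALUE));  // 31
        System.out.println(viaLowestOneBit(8));                                // 3
        System.out.println(viaLowestOneBit(0));                                // 32
    }
}
```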

The reverse method

public static int reverse(int i) {
        i = (i & 0x55555555) << 1 | (i >>> 1) & 0x55555555;
        i = (i & 0x33333333) << 2 | (i >>> 2) & 0x33333333;
        i = (i & 0x0f0f0f0f) << 4 | (i >>> 4) & 0x0f0f0f0f;
        i = (i << 24) | ((i & 0xff00) << 8) |
            ((i >>> 8) & 0xff00) | (i >>> 24);
        return i;
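    }

This method reverses the bits of i: bit 1 is swapped with bit 32, bit 2 with bit 31, and so on. The core idea is to swap adjacent bits first, so 10100111 becomes 01011011; then swap adjacent two-bit groups, giving 01011110; then swap adjacent four-bit groups, giving 11100101. At that point the bits inside every byte are reversed, and the last step reverses the order of the four bytes: the highest and lowest bytes trade places, and so do the two middle bytes.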
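The three swapping steps can be traced on a single byte. The sketch below restricts the JDK expressions to 8 bits for readability; ReverseTrace and its helper methods are made-up names.

```java
class ReverseTrace {
    static int swapBits(int b) {      // swap neighbouring single bits
        return ((b & 0x55) << 1 | (b >>> 1) & 0x55) & 0xff;
    }

    static int swapPairs(int b) {     // swap neighbouring 2-bit groups
        return ((b & 0x33) << 2 | (b >>> 2) & 0x33) & 0xff;
    }

    static int swapNibbles(int b) {   // swap the two 4-bit halves
        return ((b & 0x0f) << 4 | (b >>> 4) & 0x0f) & 0xff;
    }

    // Low 8 bits of v as a zero-padded binary string.
    static String bits8(int v) {
        String s = Integer.toBinaryString(v & 0xff);
        return "00000000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        int b = 0b10100111;
        int s1 = swapBits(b);
        int s2 = swapPairs(s1);
        int s3 = swapNibbles(s2);
        System.out.println(bits8(s1));  // 01011011
        System.out.println(bits8(s2));  // 01011110
        System.out.println(bits8(s3));  // 11100101
        // The full method also reverses the byte order, so the reversed low
        // byte ends up in the top 8 bits.
        System.out.println(bits8(Integer.reverse(b) >>> 24));  // 11100101
    }
}
```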

The toHexString and toOctalString methods

public static String toHexString(int i) {
        return toUnsignedString0(i, 4);
    }
public static String toOctalString(int i) {
        return toUnsignedString0(i, 3);
    }
private static String toUnsignedString0(int val, int shift) {
        int mag = Integer.SIZE - Integer.numberOfLeadingZeros(val);
        int chars = Math.max(((mag + (shift - 1)) / shift), 1);
        char[] buf = new char[chars];

        formatUnsignedInt(val, shift, buf, 0, chars);

        return new String(buf, true);
    }
static int formatUnsignedInt(int val, int shift, char[] buf, int offset, int len) {
        int charPos = len;
        int radix = 1 << shift;
        int mask = radix - 1;
        do {
            buf[offset + --charPos] = Integer.digits[val & mask];
            val >>>= shift;
        } while (val != 0 && charPos > 0);

        return charPos;
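    }

These two methods are similar and are covered together. As the names say, they convert to hexadecimal and octal strings. Both delegate to toUnsignedString0, which first computes how many characters the chosen radix needs and then fills the char array through formatUnsignedInt. Because the radix is a power of two, formatUnsignedInt extracts each digit with a mask and an unsigned right shift rather than division, and looks up the character in the digits array.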
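The character-count formula in toUnsignedString0 can be checked against the actual output lengths. In this sketch HexOctalDemo and charsNeeded are made-up names; charsNeeded restates the formula from the source above, where shift is 4 for hexadecimal and 3 for octal.

```java
class HexOctalDemo {
    // Number of digits needed when each digit covers 'shift' bits.
    static int charsNeeded(int val, int shift) {
        int mag = Integer.SIZE - Integer.numberOfLeadingZeros(val);
        return Math.max((mag + (shift - 1)) / shift, 1);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(255));    // ff
        System.out.println(Integer.toHexString(-1));     // ffffffff
        System.out.println(Integer.toHexString(0));      // 0
        System.out.println(Integer.toOctalString(8));    // 10
        System.out.println(Integer.toOctalString(-1));   // 37777777777
        System.out.println(charsNeeded(255, 4));         // 2
        System.out.println(charsNeeded(-1, 3));          // 11
        System.out.println(charsNeeded(0, 4));           // 1
    }
}
```

Negative inputs are formatted from their unsigned bit pattern, which is why no minus sign appears in the output.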
