Java Basics 01: String in Detail

Posted by godloveplay on 2019-07-31

Original URL: https://learnku.com/articles/31880

Overview

1. Class declaration

String is declared final, so it cannot be subclassed.

In Java 8 and earlier, it stores its data internally in a char array.

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];
}

In Java 9 and later, String stores its characters in a byte array instead, and a coder field identifies which encoding those bytes use (LATIN1 or UTF16).

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final byte[] value;

    /** The identifier of the encoding used to encode the bytes in {@code value}. */
    private final byte coder;
}

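2. Constructors

No-argument constructor

/**
 * value is declared final, so its reference cannot be reassigned later;
 * it must therefore be initialized in every constructor.
 */
public String() {
    this.value = "".value;
}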

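Constructing from another String

 /**
    * Because Strings are immutable, this constructor is unnecessary
    * unless you explicitly need a copy of original.
    */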
   public String(String original) {
       this.value = original.value;
       this.hash = original.hash;
   }
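
To make the point about copies concrete, here is a minimal sketch showing that new String(original) yields a separate object with the same content. The class name StringCopyDemo and the helper isDistinctButEqual are made up for illustration.

```java
class StringCopyDemo {
    /** Returns true when new String(original) is a distinct object with equal content. */
    static boolean isDistinctButEqual(String original) {
        String copy = new String(original); // explicit copy: always a new object
        return copy != original && copy.equals(original);
    }

    public static void main(String[] args) {
        String original = "hello";
        String copy = new String(original);
        System.out.println(copy == original);      // false: two different objects
        System.out.println(copy.equals(original)); // true: same characters
        System.out.println(copy.hashCode() == original.hashCode()); // true
    }
}
```

Since the copy behaves exactly like the original in every comparison except ==, ordinary code rarely has a reason to call this constructor.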

Constructing from a char array

public String(char value[], int offset, int count) {
       if (offset < 0) {
           throw new StringIndexOutOfBoundsException(offset);
       }
       if (count <= 0) {
           if (count < 0) {
               throw new StringIndexOutOfBoundsException(count);
           }
           if (offset <= value.length) {
               this.value = "".value;
               return;
           }
       }
       // Note: offset or count might be near -1>>>1.
       if (offset > value.length - count) {
           throw new StringIndexOutOfBoundsException(offset + count);
       }
       this.value = Arrays.copyOfRange(value, offset, offset+count);
   }
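
The constructor above copies the requested range (Arrays.copyOfRange), so the new String does not share the caller's array. A small sketch of that behavior follows; CharArrayCtorDemo and fromSlice are illustrative names, not part of the JDK.

```java
class CharArrayCtorDemo {
    /** Builds a String from count chars of source, starting at offset. */
    static String fromSlice(char[] source, int offset, int count) {
        return new String(source, offset, count);
    }

    public static void main(String[] args) {
        char[] chars = {'h', 'e', 'l', 'l', 'o'};
        String s = fromSlice(chars, 1, 3);
        chars[1] = 'X';        // later change to the source array
        System.out.println(s); // prints "ell": the String kept its own copy

        try {
            fromSlice(chars, 3, 5); // offset + count exceeds chars.length
        } catch (IndexOutOfBoundsException e) {
            System.out.println("range rejected");
        }
    }
}
```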

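Constructing from a byte[]

/**
    * Constructs a String by decoding the byte[] with the platform's default charset.
    * The length of the new String is not necessarily equal to the length of the byte array.
    * The behavior when the bytes are not valid in the default charset is unspecified.
    */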
   public String(byte bytes[], int offset, int length) {
       checkBounds(bytes, offset, length);
       this.value = StringCoding.decode(bytes, offset, length);
   }
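
The note that the String length may differ from the byte count is easiest to see with a multi-byte encoding. The sketch below uses the overload that takes an explicit Charset instead of the platform default, so that the result does not depend on the machine; ByteArrayCtorDemo and decodeUtf8 are illustrative names.

```java
import java.nio.charset.StandardCharsets;

class ByteArrayCtorDemo {
    /** Decodes the whole array as UTF-8. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, 0, bytes.length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two CJK characters, written as escapes so the source encoding does not matter.
        byte[] bytes = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        String decoded = decodeUtf8(bytes);
        System.out.println(bytes.length);     // 6: three bytes per character in UTF-8
        System.out.println(decoded.length()); // 2: two chars in the String
    }
}
```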

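Constructing from an int[] of Unicode code points

/**
    * Initializes the String from an int array of Unicode code points.
    * Later changes to the argument array do not affect the newly created String.
    * @since  1.5
    */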
   public String(int[] codePoints, int offset, int count) {
       if (offset < 0) {
           throw new StringIndexOutOfBoundsException(offset);
       }
       if (count <= 0) {
           if (count < 0) {
               throw new StringIndexOutOfBoundsException(count);
           }
           //count = 0
           if (offset <= codePoints.length) {
               this.value = "".value;
               return;
           }
       }
       // Note: offset or count might be near -1>>>1.
       if (offset > codePoints.length - count) {
           throw new StringIndexOutOfBoundsException(offset + count);
       }

       final int end = offset + count;

       // Pass 1: Compute precise size of char[]
       int n = count;
       for (int i = offset; i < end; i++) {
           int c = codePoints[i];
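           // Code points from U+0000 to U+FFFF form the Basic Multilingual Plane (BMP);
           // each of them fits in a single char
           if (Character.isBmpCodePoint(c))
               continue;
           // Otherwise check that c is a valid (supplementary) code point, which needs two chars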
           else if (Character.isValidCodePoint(c))
               n++;
           else throw new IllegalArgumentException(Integer.toString(c));
       }

       // Pass 2: Allocate and fill in char[]
       // n is now the exact number of chars needed to hold all the code points
       final char[] v = new char[n];

       for (int i = offset, j = 0; i < end; i++, j++) {
           int c = codePoints[i];
           if (Character.isBmpCodePoint(c))
               v[j] = (char)c;
           else
               Character.toSurrogates(c, v, j++);
       }

       this.value = v;
   }
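
The two-pass logic above exists because a code point outside the BMP occupies two chars (a surrogate pair). A minimal sketch of the visible effect, using the made-up names CodePointCtorDemo and fromCodePoints:

```java
class CodePointCtorDemo {
    /** Builds a String from the given Unicode code points. */
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // U+0041 is in the BMP; U+1F600 is a supplementary code point.
        String s = fromCodePoints(0x41, 0x1F600);
        System.out.println(s.length());                      // 3 chars
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true

        try {
            fromCodePoints(0x110000); // beyond U+10FFFF, not a valid code point
        } catch (IllegalArgumentException e) {
            System.out.println("invalid code point rejected");
        }
    }
}
```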

Constructing from the mutable string classes StringBuffer and StringBuilder

public String(StringBuffer buffer) {
    synchronized(buffer) {
        this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
    }
}

public String(StringBuilder builder) {
    this.value = Arrays.copyOf(builder.getValue(), builder.length());
}

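3. Commonly used API

Method list:

boolean isEmpty() // returns true if and only if length() is 0
int length() // returns the length of this string
boolean contains(CharSequence s) // returns true if and only if this string contains the specified sequence of char values
char charAt(int index) // returns the char value at the specified index
String concat(String str) // appends the specified string to the end of this string

int indexOf(int ch) // returns the index of the first occurrence of the specified character in this string
int lastIndexOf(int ch) // returns the index of the last occurrence of the specified character in this string

String substring(int beginIndex, int endIndex) // returns a new string that is a substring of this string
CharSequence subSequence(int beginIndex, int endIndex) // returns a character sequence that is a subsequence of this sequence

int compareTo(String anotherString) // compares two strings lexicographically

int compareToIgnoreCase(String str) // compares two strings lexicographically, ignoring case
boolean equalsIgnoreCase(String anotherString) // compares this String with another String, ignoring case

static String valueOf(double d) // returns the string representation of the double argument
static String valueOf(boolean b) // returns the string representation of the boolean argument

byte[] getBytes(Charset charset) // encodes this String into a sequence of bytes using the given charset and stores the result in a new byte array
byte[] getBytes(String charsetName) // encodes this String into a sequence of bytes using the named charset and stores the result in a new byte array

String toLowerCase(Locale locale) // converts all characters of this String to lower case using the rules of the given Locale
String toUpperCase(Locale locale) // converts all characters of this String to upper case using the rules of the given Locale

boolean matches(String regex) // tells whether this string matches the given regular expression
String[] split(String regex, int limit) // splits this string around matches of the given regular expression

boolean startsWith(String prefix, int toffset) // tests whether the substring of this string beginning at the specified index starts with the specified prefix
boolean endsWith(String suffix) // tests whether this string ends with the specified suffix

static String copyValueOf(char[] data) // returns a String that represents the character sequence in the specified array
char[] toCharArray() // converts this string to a new character array

String replace(char oldChar, char newChar) // returns a new string in which every occurrence of oldChar in this string is replaced with newChar
String replaceAll(String regex, String replacement) // replaces every substring of this string that matches the given regular expression with the given replacement

String intern() // returns the canonical representation of the string: the pooled string if one exists, otherwise this string is added to the pool and returned
String trim() // returns a copy of the string with leading and trailing whitespace removed

static String format(Locale l, String format, Object... args) // returns a formatted string using the specified locale, format string, and arguments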
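
A short usage sketch may help tie several of the listed methods together. The class StringApiDemo and the helper normalize are illustrative only; every String method called is from the list above.

```java
import java.util.Arrays;
import java.util.Locale;

class StringApiDemo {
    /** Trims surrounding whitespace and lower-cases with locale-independent rules. */
    static String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String t = normalize("  Hello, World  ");
        System.out.println(t);                    // hello, world
        System.out.println(t.length());           // 12
        System.out.println(t.indexOf('o'));       // 4
        System.out.println(t.lastIndexOf('o'));   // 8
        System.out.println(t.substring(0, 5));    // hello
        System.out.println(t.replace('l', 'L'));  // heLLo, worLd
        System.out.println(Arrays.toString(t.split(",\\s*", 2))); // [hello, world]
        System.out.println(t.startsWith("world", 7)); // true
        System.out.println("a".compareTo("b"));   // -1
        System.out.println(String.valueOf(true)); // true
        System.out.println(String.format(Locale.ROOT, "%.2f", 3.14159)); // 3.14
    }
}
```

Every call returns a new value and leaves the receiver untouched.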

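4. Immutability

Why String is immutable

The following two points guarantee that a String cannot be modified:

- value is declared final, so the reference it holds cannot be reassigned.
- The String class exposes no method that modifies the contents of the array that value refers to.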
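
As a rough illustration of these two points from the caller's side: methods that appear to modify a String return a new object, and toCharArray hands out a copy of the internal data. ImmutabilityDemo and survivesEdits are made-up names.

```java
import java.util.Locale;

class ImmutabilityDemo {
    /** Tries several "modifications" and reports whether s still has its old content. */
    static boolean survivesEdits(String s) {
        String snapshot = new String(s.toCharArray()); // independent copy of the content
        char[] chars = s.toCharArray();                // a copy, not the internal array
        if (chars.length > 0) {
            chars[0] = '#';                            // only the copy changes
        }
        s.toUpperCase(Locale.ROOT);                    // result discarded on purpose
        s.concat("!");                                 // result discarded on purpose
        return s.equals(snapshot);
    }

    public static void main(String[] args) {
        String s = "abc";
        String upper = s.toUpperCase(Locale.ROOT);
        System.out.println(s);                // abc
        System.out.println(upper);            // ABC
        System.out.println(survivesEdits(s)); // true
    }
}
```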

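Benefits of immutability

From the perspectives of memory, synchronization, and data structures:

Requirement of String Pool: the string pool (String intern pool) is a special storage area (in the method area up to JDK 6, on the Java heap since JDK 7). When a string is created and an equal string already exists in the pool, a reference to the existing string is returned instead of creating a new object. If strings were mutable, this sharing would make no sense, because a change made through one reference would be visible to every other holder.
Caching Hashcode: hash codes are used very frequently in Java. Because a String never changes, its hash code can be computed once and cached in a field of the String class:
private int hash;//this is used to cache hash code.

Facilitating the Use of Other Objects: immutability keeps other classes that rely on String working correctly. For example: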

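// Hypothetical: suppose String had a writable field named value (this does not compile with the real String).
// A Set is meant to guarantee that no element appears twice; a mutable String would break that rule.
HashSet<String> set = new HashSet<String>();
set.add(new String("a"));
set.add(new String("b"));
set.add(new String("c"));

for(String a: set)
    a.value = "a";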

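Security: String is widely used as a parameter in many Java classes, for example for network connections and for opening files. If strings were not immutable, the connection target or the file name could be changed after it had been checked, which could lead to serious security threats: the method believes it is connecting to one machine, but in fact it is not. Mutable strings could also cause security problems in reflection, because its parameters are strings too. Example: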
```
boolean connect(String s) {
    if (!isSecure(s)) {
        throw new SecurityException();
    }
    // The problem occurs here if s has been changed through another reference after the check above.
    causeProblem(s);
    return true;
}
```
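Immutable objects are naturally thread-safe: because immutable objects cannot be changed, they can be shared freely among multiple threads, which removes the need for synchronization.

In short, String is designed to be immutable for efficiency and security reasons. This is also why immutable classes are generally preferred in many situations.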
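
The HashSet argument above cannot be run with the real String, so the sketch below substitutes a hypothetical mutable class, MutableText, whose equals and hashCode depend on a writable field. It is only a stand-in to show what would go wrong; all names are invented for this example.

```java
import java.util.HashSet;
import java.util.Set;

class MutableKeyDemo {
    /** Hypothetical stand-in for a "mutable String". */
    static final class MutableText {
        String value;
        MutableText(String value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableText && ((MutableText) o).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
        @Override public String toString() { return value; }
    }

    /** Adds three distinct elements, mutates them all to "a", and counts the elements equal to "a". */
    static int equalElementsAfterMutation() {
        Set<MutableText> set = new HashSet<>();
        set.add(new MutableText("a"));
        set.add(new MutableText("b"));
        set.add(new MutableText("c"));
        for (MutableText t : set) {
            t.value = "a"; // in-place change; the set is not told about it
        }
        int count = 0;
        for (MutableText t : set) {
            if (t.equals(new MutableText("a"))) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 3: the "set" now holds three elements that are equal to each other
        System.out.println(equalElementsAfterMutation());
    }
}
```

The set's uniqueness guarantee only holds while its elements' equals and hashCode results stay stable, which an immutable String ensures by construction.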

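5. The string pool

What is the pool

Java has 8 primitive types and one rather special type, String. To make these types faster and more memory-efficient at runtime, each of them is given some form of constant pool, which acts like a cache provided at the Java system level. The constant pools of the 8 primitive types are coordinated by the system; the constant pool of String is special. There are two main ways to use it:

- A String object declared directly with double quotes (a literal) is stored directly in the constant pool.
- For a String object that is not declared with double quotes, you can call the intern method provided by String. intern looks up the current string in the string constant pool; if it is not there, it puts the current string into the pool.

In JDK 6 and earlier, the string constant pool lives in the Perm area (the permanent generation, a static-like region that mainly stores loaded class information, constant pools, method fragments and so on, with a default size given as only 4 MB). Heavy use of intern can therefore directly trigger java.lang.OutOfMemoryError: PermGen space.
In JDK 7, the string constant pool was moved from the Perm area to the normal Java heap.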
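
The two ways of using the pool can be checked with reference comparisons. This sketch relies only on behavior the Java language guarantees (equal literals share one pooled object; new String always creates a new object); StringPoolDemo and its helper are illustrative names.

```java
class StringPoolDemo {
    /** True when a heap copy differs from the literal but interns to the same object. */
    static boolean internedSameAsLiteral() {
        String literal = "pool-demo";
        String onHeap = new String("pool-demo");
        return onHeap != literal && onHeap.intern() == literal;
    }

    public static void main(String[] args) {
        String a = "hello";
        String b = "hello";             // same literal: same pooled object
        String c = new String("hello"); // explicit new: a separate heap object
        System.out.println(a == b);          // true
        System.out.println(a == c);          // false
        System.out.println(a == c.intern()); // true
        System.out.println(a.equals(c));     // true
    }
}
```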

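The String#intern method

Roughly, the implementation works like this: Java uses JNI to call the intern method of StringTable, which is implemented in C++. StringTable's intern is much like Java's HashMap, except that the table cannot grow automatically. Its default size is 1009.

Points to note:

- The String Pool is a fixed-size hash table with a default size of 1009 buckets.
- If very many Strings are put into the String Pool, hash collisions become severe and the bucket chains (linked lists) grow long, so String.intern slows down considerably, because each chain has to be searched entry by entry.
- In JDK 6 the length of StringTable is fixed at 1009, so efficiency drops quickly once the pool holds too many strings. In JDK 7 the length can be set with a parameter: -XX:StringTableSize=99991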
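
To see why a table that cannot grow degrades, consider the toy model below. It is not HotSpot's StringTable, only a hypothetical Java sketch of a hash table with a fixed number of buckets and chaining; FixedTable and longestChainAfter are invented names.

```java
import java.util.ArrayList;
import java.util.List;

class FixedBucketTableDemo {
    /** Toy model: fixed bucket count, separate chaining, no resizing. */
    static final class FixedTable {
        private final List<List<String>> buckets;

        FixedTable(int bucketCount) {
            buckets = new ArrayList<>(bucketCount);
            for (int i = 0; i < bucketCount; i++) {
                buckets.add(new ArrayList<>());
            }
        }

        /** Returns the stored equal string, or stores and returns s. */
        String intern(String s) {
            List<String> chain = buckets.get(Math.floorMod(s.hashCode(), buckets.size()));
            for (String existing : chain) { // linear scan of one chain
                if (existing.equals(s)) {
                    return existing;
                }
            }
            chain.add(s);
            return s;
        }

        int longestChain() {
            int max = 0;
            for (List<String> chain : buckets) {
                max = Math.max(max, chain.size());
            }
            return max;
        }
    }

    /** Interns the given number of distinct keys and reports the longest chain. */
    static int longestChainAfter(int bucketCount, int entries) {
        FixedTable table = new FixedTable(bucketCount);
        for (int i = 0; i < entries; i++) {
            table.intern("key-" + i);
        }
        return table.longestChain();
    }

    public static void main(String[] args) {
        System.out.println(longestChainAfter(1009, 1_000));   // short chains
        System.out.println(longestChainAfter(1009, 100_000)); // at least 100 entries in some chain
    }
}
```

With 1009 buckets and 100,000 distinct strings, some chain must hold at least 100 entries, so each lookup in that bucket walks a long list. Raising the bucket count, as -XX:StringTableSize does, shortens the chains.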

A worked example

// On JDK 6: false false
// On JDK 7: false true
public static void main(String[] args) {
    // String object created with new -> heap
    String s = new String("1");
    s.intern();
    // String literal -> constant pool (in the Perm area on JDK 6)
    String s2 = "1";
    System.out.println(s == s2);

    String s3 = new String("1") + new String("1");
    s3.intern();
    String s4 = "11";
    System.out.println(s3 == s4);
}

// On JDK 6: false false
// On JDK 7: false false
public static void main(String[] args) {
    String s = new String("1");
    String s2 = "1";
    s.intern();
    System.out.println(s == s2);

    String s3 = new String("1") + new String("1");
    String s4 = "11";
    s3.intern();
    System.out.println(s3 == s4);
}

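JDK 6 memory analysis (note: in the figure, green lines show what a string object's content points to, and black lines show address references)
- String s = new String("1"); creates "1" in the constant pool in Perm and an object with content "1" on the heap, referenced by s.
- s2 -> the "1" in the constant pool.
- String s3 = new String("1") + new String("1"); uses the "1" in the constant pool in Perm and creates on the heap two anonymous strings with content "1" plus the object referenced by s3, with content "11".
- s3.intern(); writes "11" into the constant pool. Because the JDK 6 pool lives in Perm, separate from the heap, the pooled object and the heap object are always different objects, so both comparisons print false in both code samples.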
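JDK 7 memory analysis, part 1
- In the first code sample, look at s3 and s4 first. The statement String s3 = new String("1") + new String("1"); ends up with 2 lasting objects: the "1" in the string constant pool and the object on the Java heap that s3 refers to. There are also 2 anonymous new String("1") objects in between, which we will not discuss. At this point the object s3 refers to has the content "11", but the constant pool does not yet contain an "11" object.
- Next, s3.intern(); puts the "11" string of s3 into the String constant pool. Because the pool does not contain "11" at this point, the conventional approach would be the one shown in the JDK 6 figure: create an "11" object in the pool. The key point is that in JDK 7 the constant pool is no longer in the Perm area, and this part was adjusted. The pool no longer needs to store its own copy of the object; it can store a reference to the object on the heap directly. That reference points to the object s3 refers to, so the reference addresses are the same.
- Finally, in String s4 = "11"; the "11" is declared explicitly, so the JVM goes to the constant pool to create it and finds that the entry already exists: it is a reference to the object s3 refers to. So s4 points to the same object as s3, and the final comparison s3 == s4 is true.
- Now look at s and s2. The first statement, String s = new String("1");, creates 2 objects: the "1" in the constant pool and a string object on the Java heap. s.intern(); makes s look in the constant pool, where it finds that "1" is already present.
- Next, String s2 = "1"; creates a reference s2 that points to the "1" object in the constant pool. As a result, the reference addresses of s and s2 are clearly different. The figure shows this clearly.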
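JDK 7 memory analysis, part 2
- Now the second code sample, following the second figure. The only change from the first sample is that s3.intern(); comes after String s4 = "11";. So String s4 = "11"; runs first. When s4 is declared, the constant pool does not contain an "11" object, so after the statement runs, the "11" object in the pool is a new object created by the declaration of s4. When s3.intern(); runs afterwards, the "11" object already exists in the pool, so s3 and s4 refer to different objects.
- For s and s2 in the second sample, moving s.intern(); later makes no difference, because the "1" object was already created in the pool when the first statement, String s = new String("1");, ran. The declaration of s2 below takes its reference directly from the constant pool. The reference addresses of s and s2 are never equal.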
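Summary: the example code above shows that JDK 7 changed both the intern operation and the constant pool. There are two main points:
- The String constant pool was moved from the Perm area to the Java heap.
- When String#intern is called and the object already exists on the heap, the pool stores a reference to that object directly instead of creating a new object.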
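
The second point can be observed without relying on a particular literal such as "11", which may already be in the pool on some JVMs. The sketch below builds a string at run time (the System.nanoTime suffix is just an assumed way to get content that is very unlikely to be pooled already) and checks that intern returns the very same object on JDK 7 or later. InternIdentityDemo is an illustrative name.

```java
class InternIdentityDemo {
    /** On JDK 7+, interning a not-yet-pooled heap string returns that same object. */
    static boolean internReturnsSameObject() {
        // Built at run time, so this content is (almost certainly) not in the pool yet.
        String fresh = new StringBuilder("intern-demo-")
                .append(System.nanoTime())
                .toString();
        return fresh.intern() == fresh;
    }

    public static void main(String[] args) {
        // Expected on JDK 7 and later: true. On JDK 6 the pool would hold a separate copy.
        System.out.println(internReturnsSameObject());
    }
}
```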