Go string, Crystal Clear

Posted by YahuiAn on 2021-02-16

Original URL: https://www.cnblogs.com/yahuian/p/14407700.html


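Preface

Strings are a basic data type in Go and indispensable in everyday development, so it is worth studying them in depth until they are crystal clear.

This article assumes the reader already knows how to use slices. If not, read "Go Slice Basics" first.

To follow the rest of the article more easily, it is also recommended to read "Unicode Character Set, UTF-8 Encoding" beforehand.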

What It Is

In Go, a string is in effect a read-only slice of bytes.

In other words, a Go string is effectively a read-only byte slice. Its runtime representation is defined as follows:

// runtime/string.go
type stringStruct struct {
	str unsafe.Pointer // pointer to the underlying byte array
	len int            // length of the byte array, in bytes
}

Note: byte is actually a type alias for uint8.

// byte is an alias for uint8 and is equivalent to uint8 in all ways. It is
// used, by convention, to distinguish byte values from 8-bit unsigned
// integer values.
type byte = uint8
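The two-field layout above can be observed from ordinary code. The following is a minimal sketch, not part of the original article: it declares a local stringStruct that mirrors the runtime definition (an assumption about the current, unexported representation, which is not guaranteed by the language specification) and uses it to show that a string value is two machine words wide and that slicing a string reuses the same underlying bytes. The helper names headerWords and dataOffset are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringStruct mirrors the runtime's definition shown above.
// Assumption: the layout matches the current Go implementation.
type stringStruct struct {
	str unsafe.Pointer
	len int
}

// headerWords reports how many machine words a string value occupies.
func headerWords() uintptr {
	var s string
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

// dataOffset returns how far sub's data pointer lies past s's data pointer.
// It is only meaningful when sub was produced by slicing s.
func dataOffset(s, sub string) uintptr {
	hs := (*stringStruct)(unsafe.Pointer(&s))
	hsub := (*stringStruct)(unsafe.Pointer(&sub))
	return uintptr(hsub.str) - uintptr(hs.str)
}

func main() {
	s := "hello, world"
	sub := s[7:] // "world"

	fmt.Println(headerWords())      // pointer + length
	fmt.Println(dataOffset(s, sub)) // sub starts 7 bytes into s's data
	fmt.Println(len(sub))
}
```

Because a substring only needs a new pointer and a new length, slicing a string does not copy its bytes.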

How to Use It

package main

import "fmt"

func main() {
	// Initialize from a string literal
	var a = "hi,狗"
	fmt.Println(a)

	// Bytes can be read by index, but not modified
	fmt.Printf("a[0] is %d\n", a[0])
	fmt.Printf("a[0:2] is %s\n", a[0:2])
	// a[0] = 'a' // compile error: cannot assign to a[0]

	// String concatenation
	var b = a + "狗"
	fmt.Printf("b is %s\n", b)

	// The built-in len() function returns the length in bytes
	fmt.Printf("a's length is: %d\n", len(a))

	// Iterate with for;len (one byte at a time)
	for i := 0; i < len(a); i++ {
		fmt.Println(i, a[i])
	}

	// Iterate with for;range (one rune at a time)
	for i, v := range a {
		fmt.Println(i, v)
	}
}


/* output
hi,狗

a[0] is 104
a[0:2] is hi

b is hi,狗狗

a's length is: 6

0 104
1 105
2 44
3 231
4 139
5 151

0 104
1 105
2 44
3 29399
*/

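If anything in the code above is confusing, don't worry. The sections below explain each part in turn.

Read-Only

String constants are placed in a read-only segment at compile time, so the corresponding data address cannot be written to, and identical string constants are not stored more than once.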

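package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func main() {
	// reflect.StringHeader is deprecated in recent Go releases; it is used here only to inspect the header.
	var a = "hello"
	fmt.Println(a, &a, (*reflect.StringHeader)(unsafe.Pointer(&a)))
	a = "world"
	fmt.Println(a, &a, (*reflect.StringHeader)(unsafe.Pointer(&a)))
	var b = "hello"
	fmt.Println(b, &b, (*reflect.StringHeader)(unsafe.Pointer(&b)))
}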

/* output
string value | address of the variable | underlying string header {Data Len}
hello 0xc0000381f0 &{5033779 5}
world 0xc0000381f0 &{5033844 5}
hello 0xc000038220 &{5033779 5}
*/

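As the output shows, both "hello" values have the same Data address (5033779), so hello is stored only once. Assigning "world" to a changes the header stored in a, not the bytes of "hello".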
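Since the bytes of a string cannot be written to, "modifying" a string in practice means building a new one. A minimal sketch of that pattern follows; the helper name replaceFirstByte is hypothetical and not from the original article. Converting to []byte copies the data, so the original string is left untouched.

```go
package main

import "fmt"

// replaceFirstByte returns a copy of s whose first byte is replaced by c.
// The input string itself is never modified.
func replaceFirstByte(s string, c byte) string {
	if len(s) == 0 {
		return s
	}
	b := []byte(s) // the conversion copies the string's bytes
	b[0] = c       // writing to the copy is allowed
	return string(b)
}

func main() {
	a := "hello"
	b := replaceFirstByte(a, 'j')
	fmt.Println(a) // unchanged
	fmt.Println(b)
}
```

Running it prints hello followed by jello.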

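Iterating with for;len

Go source code is always UTF-8 encoded. The character "狗" in the example above occupies three bytes, namely 231 139 151 (see the Unicode Character Table), so the result is easy to explain: len(a) is 6, and indexing returns one byte at a time.

Likewise, a string can be converted to a byte slice: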

package main

import "fmt"

func main() {
	var a = "hi,狗"
	b := []byte(a)
	fmt.Println(b) // [104 105 44 231 139 151]
}
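To see where the three bytes 231 139 151 come from, the sketch below encodes U+72D7 by hand using the three-byte UTF-8 layout 1110xxxx 10xxxxxx 10xxxxxx and compares the result with utf8.EncodeRune from the standard library. The helper encode3 is hypothetical, covers only code points in the range U+0800..U+FFFF, and performs no validation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode3 encodes a code point in U+0800..U+FFFF as three UTF-8 bytes.
// It does not validate its input (surrogates, for example, are not rejected).
func encode3(r rune) [3]byte {
	return [3]byte{
		0xE0 | byte(r>>12),         // 1110xxxx: top 4 bits
		0x80 | (byte(r>>6) & 0x3F), // 10xxxxxx: middle 6 bits
		0x80 | (byte(r) & 0x3F),    // 10xxxxxx: low 6 bits
	}
}

func main() {
	manual := encode3('狗') // U+72D7

	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, '狗')

	fmt.Println(manual[:]) // [231 139 151]
	fmt.Println(buf[:n])   // [231 139 151]
}
```

Splitting 0x72D7 into the bit groups 0111, 001011 and 010111 and adding the prefixes gives 0xE7, 0x8B and 0x97, which are 231, 139 and 151 in decimal.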

Iterating with for;range

The Unicode standard uses the term "code point" to refer to the item represented by a single value.

Put simply, a code point is the number assigned to a character: U+72D7 (29399 in decimal) represents the symbol "狗".

"Code point" is a bit of a mouthful, so Go introduces a shorter term for the concept: rune.

In other words, rune is Go's shorter name for a code point.

Note: rune is actually a type alias for int32.

// rune is an alias for int32 and is equivalent to int32 in all ways. It is
// used, by convention, to distinguish character values from integer values.
type rune = int32

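When a for;range loop iterates over a string, it decodes the string one rune at a time. That is why the earlier example prints byte index 3 together with the value 29399 for "狗" instead of three separate bytes.

Likewise, a string can be converted to a rune slice: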

package main

import "fmt"

func main() {
	// Initialize from a string literal
	var a = "hi,狗"
	r := []rune(a)
	fmt.Println(r) // [104 105 44 29399]
}
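A rune slice is handy whenever an operation should work per code point rather than per byte. As a small sketch (the helper name reverseRunes is hypothetical), reversing a string by runes keeps "狗" intact, whereas reversing its bytes would produce invalid UTF-8. Note that a code point is not always a user-perceived character, so combining sequences are not handled here.

```go
package main

import "fmt"

// reverseRunes returns s with its code points in reverse order.
func reverseRunes(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	a := "hi,狗"
	fmt.Println(len(a), len([]rune(a))) // 6 bytes, 4 runes
	fmt.Println(reverseRunes(a))        // 狗,ih
}
```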

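Of course, we can also use the "unicode/utf8" standard library package to reproduce by hand what the for;range syntactic sugar does:

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	var a = "hi,狗"
	for i, w := 0, 0; i < len(a); i += w {
		// Decode the rune that starts at byte offset i and get its width in bytes
		runeValue, width := utf8.DecodeRuneInString(a[i:])
		fmt.Printf("%#U starts at byte position %d\n", runeValue, i)
		w = width
	}
}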

/* output
U+0068 'h' starts at byte position 0
U+0069 'i' starts at byte position 1
U+002C ',' starts at byte position 2
U+72D7 '狗' starts at byte position 3
*/
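The decoding loop above assumes the string holds valid UTF-8, which a Go string is not required to do. As a hedged sketch of what happens otherwise (the helper name countInvalid is hypothetical): both for;range and utf8.DecodeRuneInString report utf8.RuneError (U+FFFD) with a width of 1 for a byte that cannot be decoded, which distinguishes it from a genuine U+FFFD character, whose encoding is three bytes wide.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countInvalid counts the positions in s where decoding fails, that is,
// where the decoder returns utf8.RuneError with a width of one byte.
func countInvalid(s string) int {
	n := 0
	for i, r := range s {
		if r != utf8.RuneError {
			continue
		}
		if _, width := utf8.DecodeRuneInString(s[i:]); width == 1 {
			n++
		}
	}
	return n
}

func main() {
	bad := "a\xffb" // 0xff can never appear in valid UTF-8

	fmt.Println(utf8.ValidString(bad)) // false
	fmt.Println(countInvalid(bad))     // 1
	fmt.Println(countInvalid("hi,狗"))  // 0
}
```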

References

Strings, bytes, runes and characters in Go

Why are strings in Go said to be immutable?

How are characters stored? How does utf8 encode them? What is the structure of string?
