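1 Introduction
In production you often need to pick out data that contains Chinese text. That raises the question: how do you match Chinese characters in MySQL?
This article presents two methods.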
2 Demo
2.1 Environment
mysql> SHOW VARIABLES LIKE "%version%";
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| protocol_version        | 10                           |
| version                 | 5.1.73                       |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | i386                         |
| version_compile_os      | apple-darwin10.3.0           |
+-------------------------+------------------------------+
5 rows in set (0.00 sec)
2.2 Create the test table and insert test data
mysql -S /tmp/mysql_5173.sock -uroot -proot
Create the test table and insert the test data:
mysql> USE test;
Database changed
mysql> CREATE TABLE user
    -> (name VARCHAR(20)
    -> ) DEFAULT CHARSET = utf8 ENGINE = INNODB;
Query OK, 0 rows affected (0.10 sec)

mysql> SHOW TABLE STATUS LIKE 'user' \G;
*************************** 1. row ***************************
           Name: user
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 2
 Avg_row_length: 8192
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2015-01-16 18:01:36
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

ERROR:
No query specified

mysql> INSERT INTO user VALUES('robin');
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO user VALUES('溫國兵');
Query OK, 1 row affected (0.00 sec)
3 Implementation
3.1 Method 1: regular expression
mysql> SELECT * FROM user \G;
*************************** 1. row ***************************
name: robin
*************************** 2. row ***************************
name: 溫國兵
2 rows in set (0.00 sec)

mysql> SELECT name,
    ->     CASE name REGEXP "[u0391-uFFE5]"
    ->         WHEN 1 THEN "Not Chinese characters"
    ->         ELSE "Chinese characters"
    ->     END AS "Chinese characters?"
    -> FROM user;
+-----------+------------------------+
| name      | Chinese characters?    |
+-----------+------------------------+
| robin     | Not Chinese characters |
| 溫國兵    | Chinese characters     |
+-----------+------------------------+
2 rows in set (0.00 sec)

mysql> SELECT name FROM user WHERE NOT (name REGEXP "[u0391-uFFE5]");
+-----------+
| name      |
+-----------+
| 溫國兵    |
+-----------+
1 row in set (0.00 sec)
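To see why the pattern behaves this way, it helps to read it as a plain bracket expression. The sketch below is in Python rather than MySQL and rests on two assumptions: that REGEXP in this MySQL version compares bytes and has no `\u` escape, and that matching is case-insensitive for a non-binary column. Under those assumptions the class holds the literals u, 0, 3, 9, F, E, 5 plus the ASCII range 1 to u. The helper name `looks_non_chinese` and the extra sample values are illustrative only.

```python
import re

# Byte-oriented stand-in for the bracket expression used in method 1.
# Members: the literals u, 0, 3, 9, F, E, 5 and the ASCII range '1'..'u'.
# IGNORECASE mimics case-insensitive matching on a non-binary column.
PATTERN = re.compile(rb"[u0391-uFFE5]", re.IGNORECASE)

def looks_non_chinese(value: str) -> bool:
    """Mimic `name REGEXP "[u0391-uFFE5]"` on the UTF-8 bytes of value."""
    return PATTERN.search(value.encode("utf-8")) is not None

# Plain ASCII, pure Chinese, mixed text, and non-Chinese multibyte text.
for sample in ["robin", "溫國兵", "robin溫", "Привет"]:
    print(sample, looks_non_chinese(sample))
```

In this model the mixed value counts as "not Chinese" even though it contains a Chinese character, and the Cyrillic value is not matched, so it would be labelled Chinese. The pattern is effectively a test for ASCII letters and digits rather than a test for Chinese text.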
3.2 Method 2: length() and char_length()
mysql> SELECT name, length(name), char_length(name) FROM user;
+-----------+--------------+-------------------+
| name      | length(name) | char_length(name) |
+-----------+--------------+-------------------+
| robin     |            5 |                 5 |
| 溫國兵    |           20 |                 9 |
+-----------+--------------+-------------------+
2 rows in set (0.00 sec)

mysql> SELECT name FROM user WHERE length(name) <> char_length(name);
+-----------+
| name      |
+-----------+
| 溫國兵    |
+-----------+
1 row in set (0.00 sec)
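The 20 and 9 reported for 溫國兵 are larger than the 9 bytes and 3 characters the string occupies in UTF-8. One plausible explanation, assumed here rather than confirmed by the source, is that the client connection used MySQL's latin1 (cp1252) character set, so each UTF-8 byte was stored as a separate character. The Python sketch below reproduces that arithmetic; `stored` is a hypothetical stand-in for the column value.

```python
name = "溫國兵"

# Byte count vs character count for correctly encoded UTF-8 text.
utf8_bytes = name.encode("utf-8")
print(len(utf8_bytes), len(name))

# Assumption: the server read the UTF-8 bytes as cp1252 text and
# re-encoded each resulting character as UTF-8 for the utf8 column.
stored = utf8_bytes.decode("cp1252")
print(len(stored.encode("utf-8")), len(stored))
```

If that assumption holds, a connection set to utf8 (for example with SET NAMES utf8 before the insert) would be expected to report 9 and 3 instead. The comparison in the WHERE clause selects the row in either case.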
4 Summary
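In method 1, the pattern [u0391-uFFE5] matches values that contain characters other than Chinese, which is why the CASE expression maps a match (1) to the "not Chinese" label. Note that REGEXP in MySQL 5.1 works on bytes and does not interpret \u escapes, so the pattern is an ordinary bracket expression over ASCII characters rather than a Unicode range; a value that mixes ASCII and Chinese text still matches.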
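A stricter test works on code points instead of bytes. The Python sketch below treats a character as Chinese when it falls in the CJK Unified Ideographs block, U+4E00 to U+9FFF. That range is an assumption about what "Chinese" should mean here, and the function name `contains_cjk` is hypothetical; neither comes from the original method.

```python
def contains_cjk(value: str) -> bool:
    """Return True if any character lies in U+4E00..U+9FFF."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in value)

# Plain ASCII, pure Chinese, mixed text, and non-Chinese multibyte text.
for sample in ["robin", "溫國兵", "robin溫", "Привет"]:
    print(sample, contains_cjk(sample))
```

Unlike the bracket expression, this check reports mixed values as containing Chinese and leaves other scripts alone. It does not cover the CJK extension blocks.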
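In method 2, when the character set is UTF-8 and the value contains Chinese characters, length() and char_length() return different results, because length() counts bytes and char_length() counts characters. The same holds for any other multibyte character, so this method detects non-ASCII text in general rather than Chinese specifically.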
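The byte-count comparison reacts to any multibyte character, not only Chinese ones. A small Python sketch of the same comparison shows this; `differs` is a hypothetical helper and the sample values are chosen for illustration.

```python
def differs(value: str) -> bool:
    """Stand-in for length(name) <> char_length(name) on UTF-8 text."""
    return len(value.encode("utf-8")) != len(value)

# 'José' has no Chinese characters, but 'é' takes two bytes in UTF-8.
for sample in ["robin", "溫國兵", "José"]:
    print(sample, differs(sample))
```

Both 溫國兵 and José are flagged, so method 2 is best read as a check for non-ASCII content.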
From the official documentation:
LENGTH()
Return the length of a string in bytes
Returns the length of the string str, measured in bytes. A multibyte character counts as multiple bytes. This means that for a string containing five 2-byte characters, LENGTH() returns 10, whereas CHAR_LENGTH() returns 5.
CHAR_LENGTH()
Return number of characters in argument
Returns the length of the string str, measured in characters. A multibyte character counts as a single character. This means that for a string containing five 2-byte characters, LENGTH() returns 10, whereas CHAR_LENGTH() returns 5.