前幾天在微博上看到一篇文章:價值百萬的 MySQL 的隱式型別轉換感覺寫的很不錯,再加上自己之前也對MySQL的隱式轉化這邊並不是很清楚,所以就順勢整理了一下。希望對大家有所幫助。
當我們對不同型別的值進行比較的時候,為了使得這些數值「可比較」(也可以稱為型別的相容性),MySQL會做一些隱式轉化(Implicit type conversion)。比如下面的例子:
mysql> SELECT 1+'1';
-> 2 mysql> SELECT CONCAT(2,' test');
-> '2 test'
很明顯,上面的SQL語句的執行過程中就出現了隱式轉化。並且從結果們可以判斷出,第一條SQL中,將字串的“1”轉換為數字1,而在第二條的SQL中,將數字2轉換為字串“2”。
MySQL也提供了CAST()函式。我們可以使用它明確的把數值轉換為字串。當使用CONCA()函式的時候,也可能會出現隱式轉化,因為它希望的引數為字串形式,但是如果我們傳遞的不是字串呢:
mysql> SELECT 38.8, CAST(38.8 AS CHAR);
-> 38.8, '38.8' mysql> SELECT 38.8, CONCAT(38.8);
-> 38.8, '38.8'
隱式轉化規則
官方文件中關於隱式轉化的規則是如下描述的:
If one or both arguments are NULL, the result of the comparison is NULL, except for the NULL-safe <=> equality comparison operator. For NULL <=> NULL, the result is true. No conversion is needed.
-
If both arguments in a comparison operation are strings, they are compared as strings.
-
If both arguments are integers, they are compared as integers.
-
Hexadecimal values are treated as binary strings if not compared to a number.
-
If one of the arguments is a TIMESTAMP or DATETIME column and the other argument is a constant, the constant is converted to a timestamp before the comparison is performed. This is done to be more ODBC-friendly. Note that this is not done for the arguments to IN()! To be safe, always use complete datetime, date, or time strings when doing comparisons. For example, to achieve best results when using BETWEEN with date or time values, use CAST() to explicitly convert the values to the desired data type.
A single-row subquery from a table or tables is not considered a constant. For example, if a subquery returns an integer to be compared to a DATETIME value, the comparison is done as two integers. The integer is not converted to a temporal value. To compare the operands as DATETIME values, use CAST() to explicitly convert the subquery value to DATETIME.
-
If one of the arguments is a decimal value, comparison depends on the other argument. The arguments are compared as decimal values if the other argument is a decimal or integer value, or as floating-point values if the other argument is a floating-point value.
-
In all other cases, the arguments are compared as floating-point (real) numbers.
翻譯為中文就是:
- 兩個引數至少有一個是 NULL 時,比較的結果也是 NULL,例外是使用 <=> 對兩個 NULL 做比較時會返回 1,這兩種情況都不需要做型別轉換
- 兩個引數都是字串,會按照字串來比較,不做型別轉換
- 兩個引數都是整數,按照整數來比較,不做型別轉換
- 十六進位制的值和非數字做比較時,會被當做二進位制串
- 有一個引數是 TIMESTAMP 或 DATETIME,並且另外一個引數是常量,常量會被轉換為 timestamp
- 有一個引數是 decimal 型別,如果另外一個引數是 decimal 或者整數,會將整數轉換為 decimal 後進行比較,如果另外一個引數是浮點數,則會把 decimal 轉換為浮點數進行比較
- 所有其他情況下,兩個引數都會被轉換為浮點數再進行比較
注意點
安全問題:假如 password 型別為字串,查詢條件為 int 0 則會匹配上。
mysql> select * from test;
+ | id | name | password |
+ | 1 | test1 | password1 |
| 2 | test2 | password2 |
+ 2 rows in set (0.00 sec)
mysql> select * from test where name = 'test1' and password = 0;
+ | id | name | password |
+ | 1 | test1 | password1 |
+ 1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+ | Level | Code | Message |
+ | Warning | 1292 | Truncated incorrect DOUBLE value: 'password1' |
+ 1 row in set (0.00 sec)
相信上面的例子,一些機靈的同學可以發現其實上面的例子也可以做sql注入。
假設網站的登入那塊做的比較挫,使用下面的方式:
SELECT * FROM users WHERE username = '$_POST["username"]' AND password = '$_POST["password"]'
如果username輸入的是a' OR 1='1,那麼password隨便輸入,這樣就生成了下面的查詢:
SELECT * FROM users WHERE username = 'a' OR 1='1' AND password = 'anyvalue'
就有可能登入系統。其實如果攻擊者看過了這篇文章,那麼就可以利用隱式轉化來進行登入了。如下:
mysql> select * from test;
+ | id | name | password |
+ | 1 | test1 | password1 |
| 2 | test2 | password2 |
| 3 | aaa | aaaa |
| 4 | 55aaa | 55aaaa |
+ 4 rows in set (0.00 sec)
mysql> select * from test where name = 'a' + '55';
+ | id | name | password |
+ | 4 | 55aaa | 55aaaa |
+ 1 row in set, 5 warnings (0.00 sec)
之所以出現上述的原因是因為:
mysql> select '55aaa' = 55;
+ | '55aaa' = 55 |
+ | 1 |
+ 1 row in set, 1 warning (0.00 sec)
mysql> select 'a' + '55';
+ | 'a' + '55' |
+ | 55 |
+ 1 row in set, 1 warning (0.00 sec)
下面透過一些例子來複習一下上面的轉換規則:
mysql> select 1+1;
+ | 1+1 |
+ | 2 |
+ 1 row in set (0.00 sec)
mysql> select 'aa' + 1;
+ | 'aa' + 1 |
+ | 1 |
+ 1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+ | Level | Code | Message |
+ | Warning | 1292 | Truncated incorrect DOUBLE value: 'aa' |
+ 1 row in set (0.00 sec)
把字串“aa”和1進行求和,得到1,因為“aa”和數字1的型別不同,MySQL官方文件告訴我們:
When an operator is used with operands of different types, type conversion occurs to make the operands compatible.
檢視warnings可以看到隱式轉化把字串轉為了double型別。但是因為字串是非數字型的,所以就會被轉換為0,因此最終計算的是0+1=1
上面的例子是型別不同,所以出現了隱式轉化,那麼如果我們使用相同型別的值進行運算呢?
mysql> select 'a' + 'b';
+ | 'a' + 'b' |
+ | 0 |
+ 1 row in set, 2 warnings (0.00 sec)
mysql> show warnings;
+ | Level | Code | Message |
+ | Warning | 1292 | Truncated incorrect DOUBLE value: 'a' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'b' |
+ 2 rows in set (0.00 sec)
是不是有點鬱悶呢?
之所以出現這種情況,是因為+為算術運算子arithmetic operator 這樣就可以解釋為什麼a和b都轉換為double了。因為轉換之後其實就是:0+0=0了。
在看一個例子:
mysql> select 'a'+'b'='c';
+ | 'a'+'b'='c' |
+ | 1 |
+ 1 row in set, 3 warnings (0.00 sec)
mysql> show warnings;
+ | Level | Code | Message |
+ | Warning | 1292 | Truncated incorrect DOUBLE value: 'a' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'b' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'c' |
+ 3 rows in set (0.00 sec)
現在就看也很好的理解上面的例子了吧。a+b=c結果為1,1在MySQL中可以理解為TRUE,因為'a'+'b'的結果為0,c也會隱式轉化為0,因此比較其實是:0=0也就是true,也就是1.
第二個需要注意點就是防止多查詢或者刪除資料
mysql> select * from test;
+ | id | name | password |
+ | 1 | test1 | password1 |
| 2 | test2 | password2 |
| 3 | aaa | aaaa |
| 4 | 55aaa | 55aaaa |
| 5 | 1212 | aaa |
| 6 | 1212a | aaa |
+ 6 rows in set (0.00 sec)
mysql> select * from test where name = 1212;
+ | id | name | password |
+ | 5 | 1212 | aaa |
| 6 | 1212a | aaa |
+ 2 rows in set, 5 warnings (0.00 sec)
mysql> select * from test where name = '1212';
+ | id | name | password |
+ | 5 | 1212 | aaa |
+ 1 row in set (0.00 sec)
上面的例子本意是查詢id為5的那一條記錄,結果把id為6的那一條也查詢出來了。我想說明什麼情況呢?有時候我們的資料庫表中的一些列是varchar型別,但是儲存的值為‘1123’這種的純數字的字串值,一些同學寫sql的時候又不習慣加引號。這樣當進行select,update或者delete的時候就可能會多操作一些資料。所以應該加引號的地方別忘記了。
關於字串轉數字的一些說明
mysql> select 'a' = 0;
+ | 'a' = 0 |
+ | 1 |
+ 1 row in set, 1 warning (0.00 sec)
mysql> select '1a' = 1;
+ | '1a' = 1 |
+ | 1 |
+ 1 row in set, 1 warning (0.00 sec)
mysql> select '1a1b' = 1;
+ | '1a1b' = 1 |
+ | 1 |
+ 1 row in set, 1 warning (0.00 sec)
mysql> select '1a2b3' = 1;
+ | '1a2b3' = 1 |
+ | 1 |
+ 1 row in set, 1 warning (0.00 sec)
mysql> select 'a1b2c3' = 0;
+ | 'a1b2c3' = 0 |
+ | 1 |
+ 1 row in set, 1 warning (0.00 sec)
從上面的例子可以看出,當把字串轉為數字的時候,其實是從左邊開始處理的。
- 如果字串的第一個字元就是非數字的字元,那麼轉換為數字就是0
- 如果字串以數字開頭
- 如果字串中都是數字,那麼轉換為數字就是整個字串對應的數字
- 如果字串中存在非數字,那麼轉換為的數字就是開頭的那些數字對應的值
如果你有其他更好的例子,或者被隱式轉化坑過的情況,歡迎分享。
參考資料
1. 隱式型別轉換例項
今天生產庫上突然出現MySQL執行緒數告警,IOPS很高,例項會話裡面出現許多類似下面的sql:(修改了相關欄位和值)
SELECT f_col3_id,f_qq1_id FROM d_dbname.t_tb1 WHERE f_col1_id=1226391 and f_col2_id=1244378 and
f_qq1_id in (12345,23456,34567,45678,56789,67890,78901,89012,90123,901231,901232,901233)
mysql>explain extended SELECT f_col3_id,f_qq1_id FROM d_dbname.t_tb1 use index(idx_corpid_qq1id) WHERE f_col1_id=1226391 and f_col2_id=1244378 and f_qq1_id in (12345,23456,34567,45678,56789,67890,78901,89012,90123,901231,901232,901233)
用 explain 看了下掃描行數和索引選擇情況:
mysql>explain SELECT f_col3_id,f_qq1_id FROM d_dbname.t_tb1 WHERE f_col1_id=1226391
and f_col2_id=1244378 and f_qq1_id in (12345,23456,34567,45678,56789,67890,78901,89012,90123,901231,901232,901233);
+------+---------------+---------+--------+--------------------------------+---------------+------------+--------+--------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+---------------+---------+--------+--------------------------------+---------------+------------+--------+--------+------------------------------------+
| 1 | SIMPLE | t_tb1 | ref | uid_type_frid,idx_corpid_qq1id | uid_type_frid | 8 | const | 1386 | Using index condition; Using where |
+------+---------------+---------+--------+--------------------------------+---------------+------------+--------+--------+------------------------------------+
共返回 1 行記錄,花費 11.52 ms.
t_tb1 表上有個索引uid_type_frid(f_col2_id,f_type)、idx_corp_id_qq1id(f_col1_id,f_qq1_id),而且如果選擇後者時,f_qq1_id的過濾效果應該很佳,但卻選擇了前者。當使用 hint use index(idx_corp_id_qq1id)時:
+------+---------------+--------+--------+---------------------+------------------+------------+----------+-------------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+---------------+--------+--------+---------------------+------------------+------------+----------+-------------+------------------------------------+
| 1 | SIMPLE | t_tb1 | ref | idx_corpid_qq1id | idx_corpid_qq1id | 8 | const | 2375752 | Using index condition
+---- -+---------------+--------+--------+---------------------+------------------+------------+----------+-------------+------------------------------------+
共返回 1 行記錄,花費 17.48 ms.
mysql>show warnings
+-----------------+----------------+-----------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-----------------+----------------+-----------------------------------------------------------------------------------------------------------------------+
| Warning | 1739 | Cannot use range access on index 'idx_corpid_qq1id' due to type or collation conversion on field 'f_qq1_id' |
| Note | 1003 | /* select#1 */ select `d_dbname`.`t_tb1`.`f_col3_id` AS `f_col3_id`,`d_dbname`.`t_tb1`.`f_qq1_id` AS `f_qq1_id` from `d_dbname`.`t_tb1` USE INDEX (`idx_corpid_qq1id`) where |
| | | ((`d_dbname`.`t_tb1`.`f_col2_id` = 1244378) and (`d_dbname`.`t_tb1`.`f_col1_id` = 1226391) and (`d_dbname`.`t_tb1`.`f_qq1_id` in |
| | | (12345,23456,34567,45678,56789,67890,78901,89012,90123,901231,901232,901233))) |
+-----------------+----------------+-----------------------------------------------------------------------------------------------------------------------+
共返回 2 行記錄,花費 10.81 ms.
rows列達到200w行,但問題也發現了:select_type應該是 range 才對,key_len看出來只用到了idx_corpid_qq1id索引的第一列。上面explain使用了 extended,所以show warnings;可以很明確的看到 f_qq1_id 出現了隱式型別轉換:f_qq1_id是varchar,而後面的比較值是整型。
解決該問題就是避免出現隱式型別轉換(implicit type conversion)帶來的不可控:把f_qq1_id in的內容寫成字串:
mysql>explain SELECT f_col3_id,f_qq1_id FROM d_dbname.t_tb1 WHERE f_col1_id=1226391 and f_col2_id=1244378 and
f_qq1_id in ('12345','23456','34567','45678','56789','67890','78901','89012','90123','901231');
+-------+---------------+--------+---------+--------------------------------+------------------+-------------+---------+---------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+---------------+--------+---------+--------------------------------+------------------+-------------+---------+---------+------------------------------------+
| 1 | SIMPLE | t_tb1 | range | uid_type_frid,idx_corpid_qq1id | idx_corpid_qq1id | 70 | | 40 | Using index condition; Using where |
+-------+---------------+--------+---------+--------------------------------+------------------+-------------+---------+---------+------------------------------------+
共返回 1 行記錄,花費 12.41 ms.
類似的還出現過一例:掃描行數從1386減少為40。
SELECT count(0) FROM d_dbname.t_tb2 where f_col1_id= '1931231' AND f_phone in(098890);
| Warning | 1292 | Truncated incorrect DOUBLE value: '1512-98464356'
借這個機會,系統的來看一下mysql中的隱式型別轉換。最佳化後直接從掃描rows 100w行降為1。
2. mysql隱式轉換規則
2.1 規則
下面來分析一下隱式轉換的規則:
a. 兩個引數至少有一個是 NULL 時,比較的結果也是 NULL,例外是使用 <=> 對兩個 NULL 做比較時會返回 1,這兩種情況都不需要做型別轉換
b. 兩個引數都是字串,會按照字串來比較,不做型別轉換
c. 兩個引數都是整數,按照整數來比較,不做型別轉換
d. 十六進位制的值和非數字做比較時,會被當做二進位制串
e. 有一個引數是 TIMESTAMP 或 DATETIME,並且另外一個引數是常量,常量會被轉換為 timestamp
f. 有一個引數是 decimal 型別,如果另外一個引數是 decimal 或者整數,會將整數轉換為 decimal 後進行比較,如果另外一個引數是浮點數,則會把 decimal 轉換為浮點數進行比較
g. 所有其他情況下,兩個引數都會被轉換為浮點數再進行比較
mysql> select 11 + '11', 11 + 'aa', 'a1' + 'bb', 11 + '0.01a';
+-----------+-----------+-------------+--------------+
| 11 + '11' | 11 + 'aa' | 'a1' + 'bb' | 11 + '0.01a' |
+-----------+-----------+-------------+--------------+
| 22 | 11 | 0 | 11.01 |
+-----------+-----------+-------------+--------------+
1 row in set, 4 warnings (0.00 sec)
mysql> show warnings;
+---------+------+-------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: 'aa' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a1' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'bb' |
| Warning | 1292 | Truncated incorrect DOUBLE value: '0.01a' |
+---------+------+-------------------------------------------+
4 rows in set (0.00 sec)
mysql> select '11a' = 11, '11.0' = 11, '11.0' = '11', NULL = 1;
+------------+-------------+---------------+----------+
| '11a' = 11 | '11.0' = 11 | '11.0' = '11' | NULL = 1 |
+------------+-------------+---------------+----------+
| 1 | 1 | 0 | NULL |
+------------+-------------+---------------+----------+
1 row in set, 1 warning (0.01 sec)
0.01a轉成double型也是被截斷成0.01,所以11 + '0.01a' = 11.01。上面可以看出11 + 'aa',由於運算子兩邊的型別不一樣且符合第g條,aa要被轉換成浮點型小數,然而轉換失敗(字母被截斷),可以認為轉成了 0,整數11被轉成浮點型還是它自己,所以11 + 'aa' = 11。
等式比較也說明了這一點,'11a'和'11.0'轉換後都等於 11,這也正是文章開頭例項為什麼沒走索引的原因: varchar型的f_qq1_id,轉換成浮點型比較時,等於 12345 的情況有無數種如12345a、12345.b等待,MySQL最佳化器無法確定索引是否更有效,所以選擇了其它方案。
但並不是只要出現隱式型別轉換,就會引起上面類似的效能問題,最終是要看轉換後能否有效選擇索引。像f_id = '654321'、f_mtime between '2016-05-01 00:00:00' and '2016-05-04 23:59:59'就不會影響索引選擇,因為前者f_id是整型,即使與後面的字串型數字轉換成double比較,依然能根據double確定f_id的值,索引依然有效。後者是因為符合第e條,只是右邊的常量做了轉換。
開發人員可能都只要存在這麼一個隱式型別轉換的坑,但卻又經常不注意,所以乾脆無需記住那麼多規則,該什麼型別就與什麼型別比較。
2.2 隱式型別轉換的安全問題
implicit type conversion 不僅可能引起效能問題,還有可能產生安全問題。
mysql> desc t_account;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| fid | int(11) | NO | PRI | NULL | auto_increment |
| fname | varchar(20) | YES | | NULL | |
| fpassword | varchar(50) | YES | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
mysql> select * from t_account;
+-----+-----------+-------------+
| fid | fname | fpassword |
+-----+-----------+-------------+
| 1 | xiaoming | p_xiaoming |
| 2 | xiaoming1 | p_xiaoming1 |
+-----+-----------+-------------+
假如應用前端沒有WAF防護,那麼下面的sql很容易注入:
mysql> select * from t_account where fname='A' ;
fname傳入 A' OR 1='1
mysql> select * from t_account where fname='A' OR 1='1';
攻擊者更聰明一點: fname傳入 A'+'B ,fpassword傳入 ccc'+0 :
mysql> select * from t_account where fname='A'+'B' and fpassword='ccc'+0;
+-----+-----------+-------------+
| fid | fname | fpassword |
+-----+-----------+-------------+
| 1 | xiaoming | p_xiaoming |
| 2 | xiaoming1 | p_xiaoming1 |
+-----+-----------+-------------+
2 rows in set, 7 warnings (0.00 sec)
參考
原文連結地址: