作者:劉晨
網名 bisal ,具有十年以上的應用運維工作經驗,目前主要從事資料庫應用研發能力提升和技術管理相關的工作,Oracle ACE ,騰訊雲TVP,擁有 Oracle OCM & OCP 、EXIN DevOps Master 、SCJP 等國際認證,國內首批 Oracle YEP 成員,OCMU 成員,《DevOps 最佳實踐》中文譯者之一,CSDN & ITPub 專家博主,公眾號"bisal的個人雜貨鋪",長期堅持分享技術文章,多次線上上和線下分享技術主題。
本文來源:原創投稿
*愛可生開源社群出品,原創內容未經授權不得隨意使用,轉載請聯絡小編並註明來源。
同事提了一個MySQL資料匯入的問題,使用load data將本地檔案(.csv)匯入資料庫表的時候,提示這個錯誤,
| Warning | 1265 | Data truncated for column 'c1' at row 1 |
為了能更好的說明問題,遮蔽業務資訊,我們來構造測試資料模擬。
一、準備工作
(1)csv測試檔案,如下所示,簡化了原始檔案,包含兩個日期型別的資料,和一個字串型別的資料,
cat online.csv
"2022-01-01 00:00:00","A","2022-02-01 00:00:00"
"2022-01-02 00:00:00","B","2022-02-02 00:00:00"
"2022-01-03 00:00:00","C","2022-02-03 00:00:00"
"2022-01-04 00:00:00","D","2022-02-04 00:00:00"
"2022-01-05 00:00:00","E","2022-02-05 00:00:00"
我們知道,csv 檔案可以用 excel 開啟,如下所示,
(2)測試表結構,如下所示,
bisal@mysqldb 18:21: [test]> desc t;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | bigint | NO | PRI | NULL | auto_increment |
| c1 | datetime | YES | | NULL | |
| c2 | varchar(10) | YES | | NULL | |
| c3 | datetime | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
(3)原始執行的匯入指令是,
load data local infile '/home/mysql/online.csv'
into table test fields terminated by ',' lines terminated by '\n'
(c1, c2, c3)
set c1=date_format(@c1, '%Y-%m-%d %H:%i:%s'),
c3=date_format(@c3, '%Y-%m-%d %H:%i:%s');
P. S. 如果執行出現這個錯誤,
ERROR 3948 (42000): Loading local data is disabled; this must be enabled on both the client and server sides
可以看下local_infile變數值,
bisal@mysqldb 18:23: [test]> show global variables like 'local_infile';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| local_infile | OFF |
+---------------+-------+
1 row in set (0.01 sec)
如果是OFF,需要改為ON,
bisal@mysqldb 18:23: [test]> set global local_infile=1;
Query OK, 0 rows affected (0.00 sec)
再次執行匯入,如果報這個錯,
ERROR 2068 (HY000): LOAD DATA LOCAL INFILE file request rejected due to restrictions on access.
可以在客戶端登入的時候,加上引數--local-infile=1,
mysql -ubisal -pbisal --local-infile=1
二、實際測試
(1)第一次嘗試
如下匯入指令,提示了1265的錯誤,
bisal@mysqldb 18:50: [test]> load data local infile '/home/mysql/online.csv' into table t fields
-> terminated by ',' lines terminated by '\n'
-> (c1, c2, c3) set c1=date_format(@c1, '%Y-%m-%d %H:%i:%s'), c3=date_format(@c3, '%Y-%m-%d %H:%i:%s');
Query OK, 5 rows affected, 10 warnings (0.01 sec)
Records: 5 Deleted: 0 Skipped: 0 Warnings: 10
bisal@mysqldb 18:51: [test]> show warnings;
+---------+------+-----------------------------------------+
| Level | Code | Message |
+---------+------+-----------------------------------------+
| Warning | 1265 | Data truncated for column 'c1' at row 1 |
| Warning | 1265 | Data truncated for column 'c3' at row 1 |
| Warning | 1265 | Data truncated for column 'c1' at row 2 |
| Warning | 1265 | Data truncated for column 'c3' at row 2 |
| Warning | 1265 | Data truncated for column 'c1' at row 3 |
| Warning | 1265 | Data truncated for column 'c3' at row 3 |
| Warning | 1265 | Data truncated for column 'c1' at row 4 |
| Warning | 1265 | Data truncated for column 'c3' at row 4 |
| Warning | 1265 | Data truncated for column 'c1' at row 5 |
| Warning | 1265 | Data truncated for column 'c3' at row 5 |
+---------+------+-----------------------------------------+
10 rows in set (0.00 sec)
檢索資料,雖然匯入了檔案中的5條記錄,但日期欄位,都給截斷了,儲存的是空值,
(2)第二次嘗試
從(1)的指令看,要將檔案online.csv的資料,按照","分隔,匯入t表的欄位中,其中c1和c3是datetime日期型別的,而且load data指令中使用了set,需要對資料進行二次處理,按照日期格式,進行轉換儲存,
c1=date_format(@c1,'%Y-%m-%d%H:%i:%s')
c3=date_format(@c3,'%Y-%m-%d%H:%i:%s'),
因此,load data指令中()括號內的欄位,應該使用變數,
load data local infile '/home/mysql/online.csv' into table t fields
terminated by ',' lines terminated by '\n'
(@c1, c2, @c3)
set c1=date_format(@c1, '%Y-%m-%d %H:%i:%s'),
c3=date_format(@c3, '%Y-%m-%d %H:%i:%s');
修改指令,再次執行匯入,還是報錯了,但這次是1292的錯誤,根據報錯的資料,檔案中的第一個欄位日期多了雙引號,第三個欄位日期則多了雙引號和"\x0D"(瞭解ASCII的同學,肯定知道這是回車的意思),
bisal@mysqldb 18:51: [test]> load data local infile '/home/mysql/online.csv' into table t fields
-> terminated by ',' lines terminated by '\n'
-> (@c1, c2, @c3) set c1=date_format(@c1, '%Y-%m-%d %H:%i:%s'), c3=date_format(@c3, '%Y-%m-%d %H:%i:%s');
Query OK, 5 rows affected, 10 warnings (0.02 sec)
Records: 5 Deleted: 0 Skipped: 0 Warnings: 10
bisal@mysqldb 18:51: [test]> show warnings;
+---------+------+-------------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------------+
| Warning | 1292 | Incorrect datetime value: '"2022-01-01 00:00:00"' |
| Warning | 1292 | Incorrect datetime value: '"2022-02-01 00:00:00"\x0D' |
| Warning | 1292 | Incorrect datetime value: '"2022-01-02 00:00:00"' |
| Warning | 1292 | Incorrect datetime value: '"2022-02-02 00:00:00"\x0D' |
| Warning | 1292 | Incorrect datetime value: '"2022-01-03 00:00:00"' |
| Warning | 1292 | Incorrect datetime value: '"2022-02-03 00:00:00"\x0D' |
| Warning | 1292 | Incorrect datetime value: '"2022-01-04 00:00:00"' |
| Warning | 1292 | Incorrect datetime value: '"2022-02-04 00:00:00"\x0D' |
| Warning | 1292 | Incorrect datetime value: '"2022-01-05 00:00:00"' |
| Warning | 1292 | Incorrect datetime value: '"2022-02-05 00:00:00"\x0D' |
+---------+------+-------------------------------------------------------+
10 rows in set (0.00 sec)
當然這兩個日期欄位還是被截斷的,
(3)第三次嘗試
我們先解決雙引號的問題,向指令中增加 enclosed by '"' ,目的是刪除包裹的雙引號,
bisal@mysqldb 18:52: [test]> load data local infile '/home/mysql/online.csv' into table t fields
-> terminated by ',' enclosed by '"' lines terminated by '\n'
-> (@c1, c2, @c3) set c1=date_format(@c1, '%Y-%m-%d %H:%i:%s'), c3=date_format(@c3, '%Y-%m-%d %H:%i:%s');
Query OK, 3 rows affected, 5 warnings (0.01 sec)
Records: 3 Deleted: 0 Skipped: 0 Warnings: 5
bisal@mysqldb 18:52: [test]> show warnings;
+---------+------+----------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect datetime value: '2022-02-01 00:00:00"
"2022-01-02 00:00:00' |
| Warning | 1262 | Row 1 was truncated; it contained more data than there were input columns |
| Warning | 1292 | Truncated incorrect datetime value: '2022-02-03 00:00:00"
"2022-01-04 00:00:00' |
| Warning | 1262 | Row 2 was truncated; it contained more data than there were input columns |
| Warning | 1292 | Incorrect datetime value: '"2022-02-05 00:00:00"
' |
+---------+------+----------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
可以看到,有兩行資料正確插入了,但是另外三行存在錯誤,要麼是第三個欄位被截斷了,要麼就是沒插入進來,要解決回車問題,
(4)第四次嘗試
除了回車問題,我們可以看到,此處用的是date_format函式,但實際上從檔案中讀到的是字串,因此可改為str_to_date函式,格式相同,
bisal@mysqldb 18:53: [test]> load data local infile '/home/mysql/online.csv' into table t fields
-> terminated by ',' enclosed by '"' lines terminated by '\n'
-> (@c1, c2, @c3) set c1=str_to_date(@c1, '%Y-%m-%d %H:%i:%s'), c3=str_to_date(@c3, '%Y-%m-%d %H:%i:%s');
Query OK, 3 rows affected, 5 warnings (0.01 sec)
Records: 3 Deleted: 0 Skipped: 0 Warnings: 5
bisal@mysqldb 18:53: [test]> show warnings;
+---------+------+----------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect datetime value: '2022-02-01 00:00:00"
"2022-01-02 00:00:00' |
| Warning | 1262 | Row 1 was truncated; it contained more data than there were input columns |
| Warning | 1292 | Truncated incorrect datetime value: '2022-02-03 00:00:00"
"2022-01-04 00:00:00' |
| Warning | 1262 | Row 2 was truncated; it contained more data than there were input columns |
| Warning | 1411 | Incorrect datetime value: '"2022-02-05 00:00:00"
' for function str_to_date |
+---------+------+----------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
資料還是沒都插入成功,
(5)第五次嘗試
我們這次解決回車的問題,lines terminated by增加\r,執行指令,正常,
bisal@mysqldb 18:53: [test]> load data local infile '/home/mysql/online.csv' into table t fields
-> terminated by ',' enclosed by '"' lines terminated by '\r\n'
-> (@c1, c2, @c3) set c1=str_to_date(@c1, '%Y-%m-%d %H:%i:%s'), c3=str_to_date(@c3, '%Y-%m-%d %H:%i:%s');
Query OK, 5 rows affected (0.01 sec)
Records: 5 Deleted: 0 Skipped: 0 Warnings: 0
驗證一下資料,都正常插入了,
針對這個問題,從表象上是load data指令各種引數的用法,但實際上考查了很多內容,有技術上的,有問題排查思路的,
(1)是否瞭解load data不讓執行的原因,需要做什麼配置改動。
(2)是否瞭解load data中各個引數的含義,terminated by、enclosed by、lines terminated by,包括()括號內用的是欄位名稱還是@變數名稱,set二次處理資料應該怎麼用。
(3)是否瞭解csv用txt和excel開啟顯示不同?用txt開啟,會看到每個欄位有雙引號,逗號分隔,而excel開啟,則是正常的單元格顯示,會產生誤解。
(4)是否瞭解date_format函式和str_to_date函式有什麼不同?
(5)MySQL中執行各種指令報錯,透過show warnings顯示的錯誤資訊中,能看到些端倪,例如Truncated incorrect date time value:'2022-02-0100:00:00",可以知道是檔案中的第一個欄位還是第三個欄位存在問題,當然因為這是為了做測試,特意將資料做成有規律的,如果是生產實際執行的,不一定能很有規律的呈現出來,這就要更仔細地理解這些錯誤提示,從中找到線索,往往這種能力,除了具備天賦,是碰到更多的問題鍛煉出來的,這就要關注平時的積累。