【MySQL】load data語句詳解(二)

沃趣科技發表於2017-05-23
作者:羅小波
沃趣科技高階MySQL資料庫工程師

1.2.6. FIELDS(與COLUMNS關鍵字相同)和LINES子句
  • 以下示例中的char代表單個字元,string代表字串(即多個字元),load data語句中,跳脫字元和欄位引用符只能使用單個字元,欄位分隔符、行分隔符、行字首字元都可以使用多個字元(字串)
  • 對於LOAD DATA INFILE和SELECT … INTO OUTFILE語句中,FIELDS和LINES子句的語法完全相同。兩個子句在LOAD DATA INFILE和SELECT … INTO OUTFILE語句中都是可選的,但如果兩個子句都被指定,則FIELDS必須在LINES之前,否則報語法錯誤

    • FIELDS關鍵字共有三個子句,TERMINATED BY 'string'指定欄位分隔符,[OPTIONALLY] ENCLOSED BY 'char'指定欄位引用符(其中使用了OPTIONALLY關鍵字之後,只在char、varchar和text等字元型欄位上加欄位引用符,數值型的不會加欄位引用符,且OPTIONALLY 關鍵字只在匯出資料時才起作用,匯入資料時用於不用對資料沒有影響 ),ESCAPED BY 'char'指定轉義符,如果您指定了一個FIELDS子句,則它的每個子句也是可選的,但在你指定了FIELDS關鍵字之後,這個關鍵字的子句至少需要指定一個,後續章節會進行舉例說明
    • LINES關鍵字共有兩個子句,STARTING BY 'string'指定行字首字元,TERMINATED BY 'string'指定行分隔符(換行符),如果你指定了LINES關鍵字,則LINES的子句都是可選的,但在你指定了LINES關鍵字之後,這個關鍵字的子句至少需要指定一個,後續章節會進行舉例說明
    • 如果在匯入和匯出時沒有指定FIELDS和LINES子句,則匯入和匯出時兩個子句的預設值相同,預設的欄位分隔符為\t,行分隔符為\n(win上預設為\r\n,記事本程式上預設為\r),欄位引用符為空,行字首字元為空
  • 當mysql server匯出文字資料到檔案時,FIELDS和LINES預設值時SELECT … INTO OUTFILE在輸出文字資料時行為如下:

    • 在文字資料各欄位之間使用製表符來作為欄位分隔符
    • 不使用任何引號來包圍文字資料的各欄位值,即欄位引用符為空
    • 使用\轉義在欄位值中出現的製表符\t,換行符\n或轉義符本身\等特殊字元(即輸出的文字資料中對這些特殊字元前多加一個反斜槓)
    • 在行尾寫上換行符\n,即使用\n作為行分隔符(換行符)
    • 注意:如果您在Windows系統上生成了文字檔案,則可能必須使用LINES TERMINATED BY '\r\n'來正確讀取檔案,因為Windows程式通常使用兩個字元作為行終止符。某些程式(如寫字板)在寫入檔案時可能會使用\r作為行終止符(要讀取這些檔案,請使用LINES TERMINATED BY '\r')
    • FIELDS和LINES子句預設值時生成的純文字資料檔案可以使用python程式碼來讀取檔案檢視檔案中的製表符和換行符(linux下的cat和vim等編輯器預設會解析\t為製表符,\n為換行符,所以使用這些命令可能無法看到這些特殊符號)
      1. >>> f = open('/tmp/test3.txt','r')
      2. >>> data = f.readlines()
      3. >>> data
      4. ['2,"a string","100.20"\n', '4,"a string containing a , comma","102.20"\n', '6,"a string containing a \\" quote","102.20"\n', '8,"a string containing a \\", quote and comma","102.20"\n']
      5. >>> for i in data:
      6. ... print i,
      7. ...
      8. 2,"a string","100.20"
      9. 4,"a string containing a , comma","102.20"
      10. 6,"a string containing a \" quote","102.20"
      11. 8,"a string containing a \", quote and comma","102.20"
  • 當mysql server從文字檔案讀取資料時,FIELDS和LINES預設值會導致LOAD DATA INFILE的行為如下:
    • 尋找換行邊界字元\n來進行換行
    • 不跳過行字首,把行字首也當作資料(發生在如果匯出資料時使用了行字首,匯入時沒有指定正確的行字首或者根本沒有指定行字首選項時)
    • 使用製表符\t來分割一行資料中的各列
    • 要注意:在FIELDS和LINES的預設值下,在解析文字檔案時不會把字串之間的引號當作真正的引號,而是當作資料
1.2.6.1. FIELDS關鍵字及其子句詳解
  • 欄位分隔符,預設是\t,使用子句 fields terminated by 'string' 指定,其中string代表指定的欄位分隔符
    1. admin@localhost : xiaoboluo 03:08:34> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',';
    2. Query OK, 4 rows affected (0.00 sec)
    3. admin@localhost : xiaoboluo 03:08:37> system cat /tmp/test3.txt
    4. 2,a string,100.20
    5. 4,a string containing a \, comma,102.20
    6. 6,a string containing a " quote,102.20
    7. 8,a string containing a "\, quote and comma,102.20
  • 欄位引用符,如果加optionally選項則只用在char、varchar和text等字元型欄位上,數值型別會忽略使用引用符,如果不指定該子句,則預設不使用引用符,使用子句fields [optionally] enclosed by 'char'指定,其中char代表指定的欄位引用符
    1. # 指定欄位引用符為",不使用optionally關鍵字
    2. admin@localhost : xiaoboluo 03:33:33> system rm -f /tmp/test3.txt;
    3. admin@localhost : xiaoboluo 03:37:21> select * from test3 into outfile "/tmp/test3.txt" FIELDS ENCLOSED BY '"';
    4. Query OK, 5 rows affected (0.00 sec)
    5. admin@localhost : xiaoboluo 03:37:33> system cat /tmp/test3.txt
    6. "2" "a string" "100.20"
    7. "4" "a string containing a , comma" "102.20"
    8. "6" "a string containing a \" quote" "102.20"
    9. "8" "a string containing a \", quote and comma" "102.20"
    10. "10" "\\t" "102.20"
    11. # 指定欄位引用符為",使用optionally關鍵字,可以看到id列的欄位引用符去掉了
    12. admin@localhost : xiaoboluo 03:37:41> system rm -f /tmp/test3.txt;
    13. admin@localhost : xiaoboluo 03:40:53> select * from test3 into outfile "/tmp/test3.txt" FIELDS optionally ENCLOSED BY '"';
    14. Query OK, 5 rows affected (0.00 sec)
    15. admin@localhost : xiaoboluo 03:41:03> system cat /tmp/test3.txt
    16. 2 "a string" "100.20"
    17. 4 "a string containing a , comma" "102.20"
    18. 6 "a string containing a \" quote" "102.20"
    19. 8 "a string containing a \", quote and comma" "102.20"
    20. 10 "\\t" "102.20
  • 跳脫字元,預設為\,使用子句fields escaped by 'char' 指定,其中char代表指定的跳脫字元
    1. admin@localhost : xiaoboluo 03:42:41> system rm -f /tmp/test3.txt;
    2. admin@localhost : xiaoboluo 03:44:18> select * from test3 into outfile "/tmp/test3.txt" fields escaped by '.';
    3. Query OK, 5 rows affected (0.00 sec)
    4. admin@localhost : xiaoboluo 03:44:25> system cat /tmp/test3.txt # 可以看到資料中指定的轉義符.號被轉義了,而資料\t沒有被轉義
    5. 2 a string 100..20
    6. 4 a string containing a , comma 102..20
    7. 6 a string containing a " quote 102..20
    8. 8 a string containing a ", quote and comma 102..20
    9. 10 \t 102..20
    10. admin@localhost : xiaoboluo 03:44:28> truncate test3; #清空表
    11. Query OK, 0 rows affected (0.01 sec)
    12. admin@localhost : xiaoboluo 03:45:19> load data infile "/tmp/test3.txt" into table test3 fields escaped by '.'; #匯入資料時指定轉義符為.
    13. Query OK, 5 rows affected (0.00 sec)
    14. Records: 5 Deleted: 0 Skipped: 0 Warnings: 0
    15. admin@localhost : xiaoboluo 03:45:40> select * from test3; #校驗資料,可以看到匯入資料正常
    16. +----+------------------------------------------+--------+
    17. | id | test | test2 |
    18. +----+------------------------------------------+--------+
    19. | 2 | a string | 100.20 |
    20. | 4 | a string containing a , comma | 102.20 |
    21. | 6 | a string containing a " quote | 102.20 |
    22. | 8 | a string containing a ", quote and comma | 102.20 |
    23. | 10 | \t | 102.20 |
    24. +----+------------------------------------------+--------+
    25. 5 rows in set (0.00 sec)
1.2.6.2. LINES 關鍵字及其子句詳解
  • 行字首字串,使用子句lines starting by 'string' 指定,其中string代表指定的行字首字串,行字首字串在匯出文字資料時使用該子句指定,在匯入文字時在一行資料中如果發現了行字首字串,則只匯入從字首字串開始之後的資料部分,字首字元本身及其之前的資料被忽略掉,如果某行資料不包含行字首字串,則整行資料都會被忽略

如果您想要讀取的純文字檔案中所有行都有一個您想要忽略的公用字首,則可以使用LINES STARTING BY'prefix_string'來跳過這個字首,以及字首字元前面的任何內容。如果某行資料不包含字首字元,則跳過整行內容,例

    1. # load data語句如下
    2. admin@localhost : xiaoboluo 03:48:04> system rm -f /tmp/test3.txt;
    3. admin@localhost : xiaoboluo 03:54:54> select * from test3 into outfile "/tmp/test3.txt" LINES STARTING BY 'xxx';
    4. Query OK, 5 rows affected (0.00 sec)
    5. admin@localhost : xiaoboluo 03:55:03> system cat /tmp/test3.txt #可以看到每行資料前面多了個行字首字串xxx
    6. xxx2 a string 100.20
    7. xxx4 a string containing a , comma 102.20
    8. xxx6 a string containing a " quote 102.20
    9. xxx8 a string containing a ", quote and comma 102.20
    10. xxx10 \\t 102.20
    11. # 現在,到shell命令列去修改一下,增加兩行
    12. admin@localhost : xiaoboluo 03:55:50> system cat /tmp/test3.txt # 最後要載入的純文字資料內容如下
    13. xxx2 a string 100.20
    14. xxx4 a string containing a , comma 102.20
    15. xxx6 a string containing a " quote 102.20
    16. xxx8 a string containing a ", quote and comma 102.20
    17. xxx10 \\t 102.20
    18. 12 \\t 102.20
    19. dfadsfasxxx14 \\t 102.20
    20. admin@localhost : xiaoboluo 03:59:03> truncate test3; #清空表
    21. Query OK, 0 rows affected (0.01 sec)
    22. admin@localhost : xiaoboluo 03:59:38> load data infile "/tmp/test3.txt" into table test3 LINES STARTING BY 'xxx'; #匯入資料,指定行字首字元為xxx
    23. Query OK, 6 rows affected (0.00 sec)
    24. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    25. admin@localhost : xiaoboluo 03:59:44> select * from test3; #校驗表資料,可以看到沒有xxx行字首的行被忽略了,而包含xxx的最後一行,從xxx開始截斷,xxx字元本身及其之前的內容被忽略,\
    26. xxx之後的內容被解析為行資料匯入了
    27. +----+------------------------------------------+--------+
    28. | id | test | test2 |
    29. +----+------------------------------------------+--------+
    30. | 2 | a string | 100.20 |
    31. | 4 | a string containing a , comma | 102.20 |
    32. | 6 | a string containing a " quote | 102.20 |
    33. | 8 | a string containing a ", quote and comma | 102.20 |
    34. | 10 | \t | 102.20 |
    35. | 14 | \t | 102.20 |
    36. +----+------------------------------------------+--------+
    37. 6 rows in set (0.00 sec)
    38. 行結束符(換行符),linux下預設為\n,使用子句lines terminated by 'string' 指定,其中string代表指定的換行符
    39. # 指定換行符為\r\n匯出資料
    40. admin@localhost : xiaoboluo 03:59:49> system rm -f /tmp/test3.txt;
    41. admin@localhost : xiaoboluo 04:02:22> select * from test3 into outfile "/tmp/test3.txt" lines terminated by '\r\n';
    42. Query OK, 6 rows affected (0.00 sec)
    43. # 由於linux的一些命令本身會解析掉這些特殊字元,所以使用python來檢視這個文字檔案中的換行符,從下面的結果中可以看到,列表的每一個元素代表一行資料,每一個元素的\
    44. 末尾的\r\n就是這行資料的換行符
    45. >>> f = open('/tmp/test3.txt','r')
    46. >>> data = f.readlines()
    47. >>> data
    48. ['2\ta string\t100.20\r\n', '4\ta string containing a , comma\t102.20\r\n', '6\ta string containing a " quote\t102.20\r\n', '8\ta string containing a ", quote and comma\t102.20\r\n', '10\t\\\\t\t102.20\r\n', \
    49. '14\t\\\\t\t102.20\r\n']
    50. >>>
    51. # 現在,把資料重新匯入表,從下面的結果中可以看到,匯入表中的資料正確
    52. admin@localhost : xiaoboluo 04:02:39> truncate test3;
    53. Query OK, 0 rows affected (0.01 sec)
    54. admin@localhost : xiaoboluo 04:04:55> load data infile "/tmp/test3.txt" into table test3 lines terminated by '\r\n';
    55. Query OK, 6 rows affected (0.00 sec)
    56. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    57. admin@localhost : xiaoboluo 04:05:11> select * from test3;
    58. +----+------------------------------------------+--------+
    59. | id | test | test2 |
    60. +----+------------------------------------------+--------+
    61. | 2 | a string | 100.20 |
    62. | 4 | a string containing a , comma | 102.20 |
    63. | 6 | a string containing a " quote | 102.20 |
    64. | 8 | a string containing a ", quote and comma | 102.20 |
    65. | 10 | \t | 102.20 |
    66. | 14 | \t | 102.20 |
    67. +----+------------------------------------------+--------+
    68. 6 rows in set (0.00 sec)
1.2.6.3. FIELDS和LINES注意事項
  • 眾所周知,MySQL中反斜槓是SQL語句中特殊字元的跳脫字元,因此在sql語句中碰到特殊字元時,您必須指定一個或者兩個反斜槓來為特殊字元轉義(如在mysql中或者一些其他程式中,\n代表換行符,\t代表製表符,\代表轉義符,那麼需要使用\t來轉義製表符,\n來轉義換行符,\來轉義轉義符本身,這樣才能正確寫入資料庫或者生成匯出的資料文字,使用FIELDS ESCAPED BY子句指定轉義符

  • 特殊字元列表如

    1. \0 ASCII NUL (X'00') 字元
    2. \b 退格字元
    3. \n 換行符
    4. \r 回車符
    5. \t 製表符
    6. \Z ASCII 26 (Control+Z)
    7. \N NULL值,如果轉義符值為空,則會直接匯出null字串作為資料,這在匯入時將把null作為資料匯入,而不是null符號
  • 如果資料中包含了ENCLOSED BY '"'子句指定欄位引用符號,則與欄位引用符號相同資料字元也會被自動新增一個反斜槓進行轉義(如果轉義符指定為空,則可能會導致資料在匯入時無法正確解析)。如果資料中包含了FIELDS TERMINATED BY 子句指定的欄位分隔符,則以FIELDS ENCLOSED BY子句指定的欄位引用符號為準,被引起來的整個部分作為一整列的資料,列值之間的資料包含欄位分隔符不會被轉義,而是作為資料處理,但資料中包含的欄位引用符會被轉義(在資料中包含了欄位分隔符的情況下,如果欄位引用符號沒有指定或者指定為空值,則可能在匯入資料時無法正確解析)。如果資料中包含了FIELDS ESCAPED BY子句指定的轉義符,欄位引用符和行分隔符使用預設值,則在資料中的轉義符會被轉義(只要不為空,則不管欄位分隔符和跳脫字元定義為什麼值,都會被轉義),預設情況下,不建議隨意更改換行符和轉義符,除非必須且你需要校驗修改之後資料能夠正確匯入
    1. # 欄位引用符為",資料中包含",轉義符和換行符保持預設,匯入資料時不會有任何問題
    2. admin@localhost : xiaoboluo 09:46:14> select * from test3;
    3. +----+------------------------------------------+--------+
    4. | id | test | test2 |
    5. +----+------------------------------------------+--------+
    6. | 2 | a string | 100.20 |
    7. | 4 | a string containing a , comma | 102.20 |
    8. | 6 | a string containing a " quote | 102.20 |
    9. | 8 | a string containing a ", quote and comma | 102.20 |
    10. +----+------------------------------------------+--------+
    11. 4 rows in set (0.00 sec)
    12. admin@localhost : xiaoboluo 09:46:17> select * from test3 into outfile "/tmp/test3.txt" FIELDS OPTIONALLY enclosed BY '"';
    13. Query OK, 4 rows affected (0.00 sec)
    14. admin@localhost : xiaoboluo 09:46:23> system cat /tmp/test3.txt;
    15. 2 "a string" "100.20"
    16. 4 "a string containing a , comma" "102.20"
    17. 6 "a string containing a \" quote" "102.20"
    18. 8 "a string containing a \", quote and comma" "102.20" # 可以看到與欄位引用符相同的符號資料被轉義了
    19. admin@localhost : xiaoboluo 09:54:41> truncate test3;
    20. Query OK, 0 rows affected (0.01 sec)
    21. admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
    22. Query OK, 4 rows affected (0.00 sec)
    23. Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
    24. admin@localhost : xiaoboluo 09:58:45> select * from test3;
    25. +----+------------------------------------------+--------+
    26. | id | test | test2 |
    27. +----+------------------------------------------+--------+
    28. | 2 | a string | 100.20 |
    29. | 4 | a string containing a , comma | 102.20 |
    30. | 6 | a string containing a " quote | 102.20 |
    31. | 8 | a string containing a ", quote and comma | 102.20 |
    32. +----+------------------------------------------+--------+
    33. 4 rows in set (0.00 sec)
    34. # 如果欄位引用符為",欄位分隔符為,且資料中包含欄位引用符"和欄位分隔符,,轉義符和換行符保持預設,這在匯入資料時不會有任何問題
    35. admin@localhost : xiaoboluo 09:53:45> select * from test3 into outfile "/tmp/test3.txt" FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
    36. Query OK, 4 rows affected (0.00 sec)
    37. admin@localhost : xiaoboluo 09:54:29> system cat /tmp/test3.txt;
    38. 2,"a string","100.20"
    39. 4,"a string containing a , comma","102.20"
    40. 6,"a string containing a \" quote","102.20"
    41. 8,"a string containing a \", quote and comma","102.20"
    42. admin@localhost : xiaoboluo 09:54:41> truncate test3;
    43. Query OK, 0 rows affected (0.01 sec)
    44. admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
    45. Query OK, 4 rows affected (0.00 sec)
    46. Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
    47. admin@localhost : xiaoboluo 09:58:45> select * from test3;
    48. +----+------------------------------------------+--------+
    49. | id | test | test2 |
    50. +----+------------------------------------------+--------+
    51. | 2 | a string | 100.20 |
    52. | 4 | a string containing a , comma | 102.20 |
    53. | 6 | a string containing a " quote | 102.20 |
    54. | 8 | a string containing a ", quote and comma | 102.20 |
    55. +----+------------------------------------------+--------+
    56. 4 rows in set (0.00 sec)
    57. # 但是,如果在欄位引用符為",資料中包含",欄位分隔符使用逗號,換行符保持預設的情況下,轉義符使用了空串,這會導致在匯入資料時,第四行無法正確解析,報錯
    58. admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
    59. Query OK, 4 rows affected (0.00 sec)
    60. Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
    61. admin@localhost : xiaoboluo 09:58:45> select * from test3;
    62. +----+------------------------------------------+--------+
    63. | id | test | test2 |
    64. +----+------------------------------------------+--------+
    65. | 2 | a string | 100.20 |
    66. | 4 | a string containing a , comma | 102.20 |
    67. | 6 | a string containing a " quote | 102.20 |
    68. | 8 | a string containing a ", quote and comma | 102.20 |
    69. +----+------------------------------------------+--------+
    70. 4 rows in set (0.00 sec)
    71. admin@localhost : xiaoboluo 09:58:49> select * from test3 into outfile "/tmp/test3_test.txt" FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',' escaped by '';
    72. Query OK, 4 rows affected (0.00 sec)
    73. admin@localhost : xiaoboluo 10:00:42> system cat /tmp/test3_test.txt;
    74. 2,"a string","100.20"
    75. 4,"a string containing a , comma","102.20"
    76. 6,"a string containing a " quote","102.20" #關於這一行資料,需要說明一下ENCLOSED BY子句,該子句指定的引用符號從一個FIELDS TERMINATED BY子句指定的分隔符開始,直到碰到下一個\
    77. 分隔符之間且這個分隔符前面一個字元必須是欄位引用符號(如果這個分隔符前面一個字元不是欄位引用符,則繼續往後匹配,如第二行資料),在這之間的內容都會被當作整個列字串處理,\
    78. 所以這一行資料在匯入時不會發生解析錯誤
    79. 8,"a string containing a ", quote and comma","102.20" #這一行因為無法正確識別的欄位結束位置,所以無法匯入,報錯終止,前面正確的行也被回滾掉(binlog_format=row)
    80. admin@localhost : xiaoboluo 10:00:49> truncate test3;
    81. Query OK, 0 rows affected (0.01 sec)
    82. admin@localhost : xiaoboluo 10:01:03> load data infile '/tmp/test3_test.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',' escaped by '';
    83. ERROR 1262 (01000): Row 4 was truncated; it contained more data than there were input columns
    84. admin@localhost : xiaoboluo 10:01:33> select * from test3;
    85. Empty set (0.00 sec)
    86. # 資料中包含了預設的轉義符和指定的欄位分隔符,欄位引用符和行分隔符使用預設值,則在資料中的轉義符和欄位分隔符會被轉義(只要不為空,則不管欄位分隔符和跳脫字元定義為什麼值,\
    87. 都會被轉義)
    88. admin@localhost : xiaoboluo 03:08:45> insert into test3(test,test2) values('\\t','102.20');
    89. Query OK, 1 row affected (0.00 sec)
    90. admin@localhost : xiaoboluo 03:17:29> select * from test3;
    91. +----+------------------------------------------+--------+
    92. | id | test | test2 |
    93. +----+------------------------------------------+--------+
    94. | 2 | a string | 100.20 |
    95. | 4 | a string containing a , comma | 102.20 |
    96. | 6 | a string containing a " quote | 102.20 |
    97. | 8 | a string containing a ", quote and comma | 102.20 |
    98. | 10 | \t | 102.20 |
    99. +----+------------------------------------------+--------+
    100. 5 rows in set (0.00 sec)
    101. admin@localhost : xiaoboluo 03:17:32> system rm -f /tmp/test3.txt;
    102. admin@localhost : xiaoboluo 03:17:39> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',';
    103. Query OK, 5 rows affected (0.01 sec)
    104. admin@localhost : xiaoboluo 03:17:42> system cat /tmp/test3.txt
    105. 2,a string,100.20
    106. 4,a string containing a \, comma,102.20
    107. 6,a string containing a " quote,102.20
    108. 8,a string containing a "\, quote and comma,102.20
    109. 10,\\t,102.20
  • 當您使用SELECT … INTO OUTFILE與LOAD DATA INFILE一起將資料從資料庫寫入檔案,然後再將該檔案讀回資料庫時,兩個語句的FIELDS和LINES處理選項必須匹配。否則,LOAD DATA INFILE將解析錯誤的檔案內容,示例
    1. # 假設您執行SELECT ... INTO OUTFILE語句時使用了逗號作為列分隔符:
    2. SELECT * INTO OUTFILE 'data.txt'
    3.  FIELDS TERMINATED BY ','
    4.  FROM table2;
    5. # 如果您嘗試使用\t作為列分隔符,則它將無法正常工作,因為它會指示LOAD DATA INFILE在欄位之間查詢製表符,可能導致每個資料行整行解析時被當作單個欄位:
    6. LOAD DATA INFILE 'data.txt' INTO TABLE table2
    7.  FIELDS TERMINATED BY '\t';
    8. # 要正確讀取逗號分隔各列的檔案,正確的語句是
    9. LOAD DATA INFILE 'data.txt' INTO TABLE table2
    10.  FIELDS TERMINATED BY ','
  • 任何FIELDS和LINES處理選項都可以指定一個空字串(''),但強烈不建議在FIELDS TERMINATED BY、FIELDS ESCAPED BY 和LINES TERMINATED BY子句中使用空串(空格不算空串)作為轉義符和換行符,可能導致許多意外的問題,除非你確定使用空串不會出現問題。如果不為空,注意FIELDS [OPTIONALLY] ENCLOSED BY和FIELDS ESCAPED BY子句指定的值只能指定單個字元(即欄位引用符號和轉義符只能使用單個字元)。但 FIELDS TERMINATED BY, LINES STARTING BY, and LINES TERMINATED BY子句的值可以是多個字元(即欄位分隔符和換行符、行字首字元可以使用多個字元)。例如,指定一個LINES TERMINATED BY'\r\ n'子句,表示指定行換行符為\r\n,這個也是WIN下的換行符
    1. # 如果LINES TERMINATED BY換行符指定了一個空字元,並且FIELDS TERMINATED BY欄位分隔符指定的是非空的一個字元(或者使用預設值\t),則行也會以欄位分隔符作為行的結束符\
    2. (表現行為就是文字中最後一個字元就是欄位分隔符),即整個文字看上去就是一整行資料了
    3. admin@localhost : xiaoboluo 04:48:35> system rm -f /tmp/test3.txt;
    4. admin@localhost : xiaoboluo 04:53:59> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',' lines terminated by '';
    5. Query OK, 6 rows affected (0.00 sec)
    6. # 使用python檢視文字內容,從下面的結果中可以看到,整個表的資料由於換行符為空,所以導致都拼接為一行了,最後行結束符使用了欄位分隔符逗號
    7. >>> f = open('/tmp/test3.txt','r')
    8. >>> data = f.readlines()
    9. >>> data
    10. ['2,a string,100.20,4,a string containing a \\, comma,102.20,6,a string containing a " quote,102.20,8,a string containing a "\\, quote and comma,102.20,10,\\\\t,102.20,14,\\\\t,102.20,']
    11. >>>
    12. # 匯入資料到表,這裡新建一張表來進行匯入測試,預防清理掉了表資料之後,文字內容又無法正確匯入的情況發生
    13. admin@localhost : xiaoboluo 04:57:52> create table test4 like test3;
    14. Query OK, 0 rows affected (0.01 sec)
    15. admin@localhost : xiaoboluo 04:57:59> load data infile "/tmp/test3.txt" into table test4 FIELDS TERMINATED BY ',' lines terminated by '';
    16. Query OK, 6 rows affected (0.00 sec)
    17. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    18. admin@localhost : xiaoboluo 04:58:26> select * from test4; #從查詢結果上看,資料正確匯入表test4中了
    19. +----+------------------------------------------+--------+
    20. | id | test | test2 |
    21. +----+------------------------------------------+--------+
    22. | 2 | a string | 100.20 |
    23. | 4 | a string containing a , comma | 102.20 |
    24. | 6 | a string containing a " quote | 102.20 |
    25. | 8 | a string containing a ", quote and comma | 102.20 |
    26. | 10 | \t | 102.20 |
    27. | 14 | \t | 102.20 |
    28. +----+------------------------------------------+--------+
    29. 6 rows in set (0.00 sec)
    30. # 如果FIELDS TERMINATED BY和FIELDS ENCLOSED BY值都為空(''),則使用固定行(非限制)格式。使用固定行格式時,欄位之間使用足夠寬的空格來分割各欄位。對於資料型別\
    31. 是TINYINT,SMALLINT,MEDIUMINT,INT和BIGINT,欄位寬度分別為4,6,8,11和20個空格(無論資料型別宣告的顯示寬度如何),對於varchar型別使用大約298個空格(這個空格數量是自己\
    32. 數的。。。,猜想這個空格數量可能與字符集,varchar定義長度有關,因為我在嘗試把varchar定義為50個字元的時候,空格少了156個左右)
    33. admin@localhost : xiaoboluo 04:58:31> system rm -f /tmp/test3.txt;
    34. admin@localhost : xiaoboluo 05:04:05> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY '' lines terminated by '';
    35. Query OK, 6 rows affected (0.00 sec)
    36. admin@localhost : xiaoboluo 05:04:17> system cat /tmp/test3.txt #下面展示內容中把打斷空格使用...代替
    37. 2 a string ... 100.20 ... 4 a string containing a , comma ... 102.20 ... 6 a string containing a " quote ...102.20 ... 8 a string containing a ", quote and comma ... 102.20 \
    38.  ... 10 \\t ... 102.20 ... 14 \\t ... 102.20 ... admin@localhost : xiaoboluo 05:04:35>
    39. # 現在,清理掉test4表,並載入資料,從下面的結果中可以看到,匯入表中之後,雖然資料是對的,但是多了非常多的空格,那麼也就意味著你需要使用程式正確地處理一下這些多餘的空格之後,\
    40. 再執行匯入
    41. admin@localhost : xiaoboluo 05:06:19> truncate test4;
    42. Query OK, 0 rows affected (0.01 sec)
    43. admin@localhost : xiaoboluo 05:06:47> load data infile "/tmp/test3.txt" into table test4 FIELDS TERMINATED BY '' lines terminated by ''; # 注意:這是在sql_mode=''時匯入的,如果不修改\
    44. sql_mode請使用local關鍵字
    45. Query OK, 6 rows affected, 12 warnings (0.01 sec)
    46. Records: 6 Deleted: 0 Skipped: 0 Warnings: 12
    47. Note (Code 1265): Data truncated for column 'test' at row 1
    48. Note (Code 1265): Data truncated for column 'test2' at row 1
    49. Note (Code 1265): Data truncated for column 'test' at row 2
    50. Note (Code 1265): Data truncated for column 'test2' at row 2
    51. Note (Code 1265): Data truncated for column 'test' at row 3
    52. Note (Code 1265): Data truncated for column 'test2' at row 3
    53. Note (Code 1265): Data truncated for column 'test' at row 4
    54. Note (Code 1265): Data truncated for column 'test2' at row 4
    55. Note (Code 1265): Data truncated for column 'test' at row 5
    56. Note (Code 1265): Data truncated for column 'test2' at row 5
    57. Note (Code 1265): Data truncated for column 'test' at row 6
    58. Note (Code 1265): Data truncated for column 'test2' at row 6
    59. admin@localhost : xiaoboluo 05:07:09> select * from test4;
    60. +----+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
    61. | id | test | test2 |
    62. +----+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
    63. | 2 | a string | 100.20 |
    64. | 4 | a string containing a , comma | 102.20 |
    65. | 6 | a string containing a " quote | 102.20 |
    66. | 8 | a string containing a ", quote and comma | 102.20 |
    67. | 10 | \t | 102.20 |
    68. | 14 | \t | 102.20 |
    69. +----+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
    70. 6 rows in set (0.00 sec)
  • NULL值的處理根據使用的FIELDS和LINES選項而有所不同
    1. # 對於預設的FIELDS和LINES值,NULL值被轉義為\N輸出,欄位值\N讀取時使用NULL替換並輸入(假設ESCAPED BY字元為\)
    2. admin@localhost : xiaoboluo 05:17:07> alter table test3 add column test3 varchar(10); #新增一個欄位test3,預設值會被填充為null
    3. Query OK, 0 rows affected (0.04 sec)
    4. Records: 0 Duplicates: 0 Warnings: 0
    5. admin@localhost : xiaoboluo 05:17:33> select * from test3; #檢視錶中的test3列資料
    6. +----+------------------------------------------+--------+-------+
    7. | id | test | test2 | test3 |
    8. +----+------------------------------------------+--------+-------+
    9. | 2 | a string | 100.20 | NULL |
    10. | 4 | a string containing a , comma | 102.20 | NULL |
    11. | 6 | a string containing a " quote | 102.20 | NULL |
    12. | 8 | a string containing a ", quote and comma | 102.20 | NULL |
    13. | 10 | \t | 102.20 | NULL |
    14. | 14 | \t | 102.20 | NULL |
    15. +----+------------------------------------------+--------+-------+
    16. 6 rows in set (0.00 sec)
    17. admin@localhost : xiaoboluo 05:17:37> select * from test3 into outfile "/tmp/test3.txt"; #執行匯出
    18. Query OK, 6 rows affected (0.00 sec)
    19. admin@localhost : xiaoboluo 05:18:02> system cat /tmp/test3.txt #檢視匯出的文字檔案,可以發現null被轉義為\N了,這是為了避免資料字串本身包含null值時無法正確區分資料型別的null值
    20. 2 a string 100.20 \N
    21. 4 a string containing a , comma 102.20 \N
    22. 6 a string containing a " quote 102.20 \N
    23. 8 a string containing a ", quote and comma 102.20 \N
    24. 10 \\t 102.20 \N
    25. 14 \\t 102.20 \N
    26. # 匯入資料,從結果中可以看到\N被正確解析為了資料型別的null值
    27. admin@localhost : xiaoboluo 05:18:06> truncate test3;
    28. Query OK, 0 rows affected (0.01 sec)
    29. admin@localhost : xiaoboluo 05:20:36> load data infile '/tmp/test3.txt' into table test3;
    30. Query OK, 6 rows affected (0.01 sec)
    31. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    32. admin@localhost : xiaoboluo 05:20:52> select * from test3;
    33. +----+------------------------------------------+--------+-------+
    34. | id | test | test2 | test3 |
    35. +----+------------------------------------------+--------+-------+
    36. | 2 | a string | 100.20 | NULL |
    37. | 4 | a string containing a , comma | 102.20 | NULL |
    38. | 6 | a string containing a " quote | 102.20 | NULL |
    39. | 8 | a string containing a ", quote and comma | 102.20 | NULL |
    40. | 10 | \t | 102.20 | NULL |
    41. | 14 | \t | 102.20 | NULL |
    42. +----+------------------------------------------+--------+-------+
    43. 6 rows in set (0.00 sec)
    44. # 如果FIELDS ENCLOSED BY不為空,FIELDS escaped BY為空時,則將NULL值的字面字串作為輸出字元值。這與FIELDS ENCLOSED BY引用的字串中包含的null值不同,\
    45. 後者讀取為字串'null',而前者讀取到資料庫中時被當作資料型別的null值,而不是資料的字串null
    46. admin@localhost : xiaoboluo 05:20:57> update test3 set test3='null' where id=2; #更新id=2的test3列值為資料字串的null
    47. Query OK, 1 row affected (0.00 sec)
    48. Rows matched: 1 Changed: 1 Warnings: 0
    49. admin@localhost : xiaoboluo 05:23:14> select * from test3;
    50. +----+------------------------------------------+--------+-------+
    51. | id | test | test2 | test3 |
    52. +----+------------------------------------------+--------+-------+
    53. | 2 | a string | 100.20 | null |
    54. | 4 | a string containing a , comma | 102.20 | NULL |
    55. | 6 | a string containing a " quote | 102.20 | NULL |
    56. | 8 | a string containing a ", quote and comma | 102.20 | NULL |
    57. | 10 | \t | 102.20 | NULL |
    58. | 14 | \t | 102.20 | NULL |
    59. +----+------------------------------------------+--------+-------+
    60. 6 rows in set (0.00 sec)
    61. admin@localhost : xiaoboluo 05:23:16> system rm -f /tmp/test3.txt;
    62. admin@localhost : xiaoboluo 05:24:05> select * from test3 into outfile "/tmp/test3.txt" FIELDS ENCLOSED BY '"' escaped BY ''; #指定行引用符號為雙引號",轉義符為空匯出資料
    63. Query OK, 6 rows affected (0.00 sec)
    64. admin@localhost : xiaoboluo 05:24:51> system cat /tmp/test3.txt #檢視匯出的文字檔案,可以看到資料字串的null被加了雙引號,而資料型別的null沒有加雙引號
    65. "2" "a string" "100.20" "null"
    66. "4" "a string containing a , comma" "102.20" NULL
    67. "6" "a string containing a " quote" "102.20" NULL
    68. "8" "a string containing a ", quote and comma" "102.20" NULL
    69. "10" "\t" "102.20" NULL
    70. "14" "\t" "102.20" NULL
    71. admin@localhost : xiaoboluo 05:24:57> alter table test4 add column test3 varchar(10);
    72. Query OK, 0 rows affected (0.04 sec)
    73. Records: 0 Duplicates: 0 Warnings: 0
    74. admin@localhost : xiaoboluo 05:26:40> truncate test4; #這裡使用test4表做測試,避免無法匯入的情況發生
    75. Query OK, 0 rows affected (0.00 sec)
    76. admin@localhost : xiaoboluo 05:26:44> load data infile '/tmp/test3.txt' into table test4 FIELDS ENCLOSED BY '"' escaped BY ''; #指定欄位引用符為雙引號",轉義符為空匯入資料
    77. Query OK, 6 rows affected (0.00 sec)
    78. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    79. admin@localhost : xiaoboluo 05:27:02> select * from test4; #檢視錶test4中的資料,從結果中可以看到,資料匯入正確
    80. +----+------------------------------------------+--------+-------+
    81. | id | test | test2 | test3 |
    82. +----+------------------------------------------+--------+-------+
    83. | 2 | a string | 100.20 | null |
    84. | 4 | a string containing a , comma | 102.20 | NULL |
    85. | 6 | a string containing a " quote | 102.20 | NULL |
    86. | 8 | a string containing a ", quote and comma | 102.20 | NULL |
    87. | 10 | \t | 102.20 | NULL |
    88. | 14 | \t | 102.20 | NULL |
    89. +----+------------------------------------------+--------+-------+
    90. 6 rows in set (0.00 sec)
    91. # 使用固定行格式(當FIELDS TERMINATED BY和FIELDS ENCLOSED BY都為空時使用),將NULL寫為空字串。這將導致表中的NULL值和空字串在寫入檔案時無法區分,\
    92. 因為它們都以空字串形式寫入文字檔案。如果您需要能夠在讀取檔案時將其分開,則不應使用固定行格式(即不應該使用FIELDS TERMINATED BY和FIELDS ENCLOSED BY都為空)
    93. admin@localhost : xiaoboluo 05:29:11> system rm -f /tmp/test3.txt;
    94. admin@localhost : xiaoboluo 05:29:22> select * from test3 into outfile "/tmp/test3.txt" FIELDS ENCLOSED BY '' TERMINATED BY '';
    95. Query OK, 6 rows affected (0.00 sec)
    96. admin@localhost : xiaoboluo 05:29:43> system cat /tmp/test3.txt #從結果中看,是不是有點似曾相識呢?沒錯,前面演示過FIELDS TERMINATED BY和FIELDS ENCLOSED BY都為空的情況,\
    97. 使用了固定格式來匯出文字,但是這裡多了資料型別的null值處理,從下面的結果中已經看不到資料型別的null了,被轉換為了空值(下面展示時把大段空格使用...代替)
    98. 2 a string ... 100.20 ... null
    99. 4 a string containing a , comma ... 102.20 ...
    100. 6 a string containing a " quote ... 102.20 ...
    101. 8 a string containing a ", quote and comma ... 102.20 ...
    102. 10 \\t ... 102.20 ...
    103. 14 \\t ... 102.20 ...
    104. admin@localhost : xiaoboluo 05:29:46> truncate test4; #清空test4
    105. Query OK, 0 rows affected (0.01 sec)
    106. admin@localhost : xiaoboluo 05:34:15> load data infile "/tmp/test3.txt" into table test4 FIELDS ENCLOSED BY '' TERMINATED BY ''; #執行匯入文字到test4表,注意:這是在sql_mode=''時匯入的,\
    107. 如果不修改sql_mode請使用local關鍵字
    108. Query OK, 6 rows affected, 24 warnings (0.01 sec)
    109. Records: 6 Deleted: 0 Skipped: 0 Warnings: 24
    110. Note (Code 1265): Data truncated for column 'test' at row 1
    111. Note (Code 1265): Data truncated for column 'test2' at row 1
    112. Note (Code 1265): Data truncated for column 'test3' at row 1
    113. Warning (Code 1262): Row 1 was truncated; it contained more data than there were input columns
    114. Note (Code 1265): Data truncated for column 'test' at row 2
    115. Note (Code 1265): Data truncated for column 'test2' at row 2
    116. Note (Code 1265): Data truncated for column 'test3' at row 2
    117. Warning (Code 1262): Row 2 was truncated; it contained more data than there were input columns
    118. Note (Code 1265): Data truncated for column 'test' at row 3
    119. Note (Code 1265): Data truncated for column 'test2' at row 3
    120. Note (Code 1265): Data truncated for column 'test3' at row 3
    121. Warning (Code 1262): Row 3 was truncated; it contained more data than there were input columns
    122. Note (Code 1265): Data truncated for column 'test' at row 4
    123. Note (Code 1265): Data truncated for column 'test2' at row 4
    124. Note (Code 1265): Data truncated for column 'test3' at row 4
    125. Warning (Code 1262): Row 4 was truncated; it contained more data than there were input columns
    126. Note (Code 1265): Data truncated for column 'test' at row 5
    127. Note (Code 1265): Data truncated for column 'test2' at row 5
    128. Note (Code 1265): Data truncated for column 'test3' at row 5
    129. Warning (Code 1262): Row 5 was truncated; it contained more data than there were input columns
    130. Note (Code 1265): Data truncated for column 'test' at row 6
    131. Note (Code 1265): Data truncated for column 'test2' at row 6
    132. Note (Code 1265): Data truncated for column 'test3
  • load data執行時如果表中有外來鍵、輔助索引、唯一索引,那麼會導致載入資料的時間變慢,因為索引也需要一同更新,可以使用對應引數關閉外來鍵檢查、唯一索引檢查甚至關閉索引

    • 要在載入操作期間忽略外來鍵約束,可以在執行load data語句之前執行SET foreign_key_checks = 0語句,執行完畢之後執行SET foreign_key_checks = 1或斷開會話重連
    • 要在載入操作期間忽略唯一索引約束,可以在執行load data語句之前執行set unique_checks=0語句,執行完畢之後執行set unique_checks=1或斷開會話重連
    • 在某些極端情況下(比如表中索引過多),您可以在執行load data語句之前透過執行ALTER TABLE … DISABLE KEYS語句關閉建立索引,在執行完load data語句之後執行ALTER TABLE … ENABLE KEYS來重新建立索引,注意該語句不能關閉主鍵索引
  • 如果在sql_mode設定為嚴格模式下,且不使用local和ignore關鍵字時,碰到缺少欄位值會直接報錯終止,但在sql_mode設定為嚴格模式下,使用了local和ignore關鍵字時,則行為與不使用嚴格模式類似
    LOAD DATA INFILE將所有輸入視為字串,因此您不能認為load data語句會像INSERT語句那樣插入ENUM或SET列的數值。所有ENUM和SET值必須指定為字串

  • LOAD DATA INFILE不支援的場景

    • 固定大小的行(FIELDS TERMINATED BY和FIELDS ENCLOSED BY都為空)不支援BLOB或TEXT列
    • 如果FIELDS TERMINATED BY和LINES STARTING BY指定相同的字元,則LOAD DATA INFILE無法正確解析
    • 如果FIELDS ESCAPED BY為空,則欄位中包含了FIELDS ENCLOSED BY或LINES TERMINATED BY或FIELDS TERMINATED BY的字元時會導致LOAD DATA INFILE語句拒絕讀取欄位並報錯。這是因為LOAD DATA INFILE無法正確確定欄位或行在哪裡結束
  • PS:在Unix上,如果需要LOAD DATA從管道讀取資料,可以使用以下方法(該示例將/目錄的列表載入到表db1.t1中,find命令掛後臺持續查詢內容並生成ls.dat檔案,mysql 客戶端使用-e選項來執行load data這個檔案到表):

    1. mkfifo /mysql/data/db1/ls.dat
    2. chmod 666 /mysql/data/db1/ls.dat
    3. find / -ls> /mysql/data/db1/ls.dat&
    4. mysql -e "LOAD DATA INFILE 'ls.dat' INTO TABLE t1" db1
1.2.7. IGNORE number {LINES | ROWS}子句
  • 忽略輸入檔案中的前number行資料,使用子句ignore number lines指定忽略文字的前number行,在某些情況下生成的文字(如:mysql -e "select …." > xx.txt中)帶有欄位名稱,在匯入時會把這一行欄位名稱也當作資料,所以需要忽略掉這行欄位名稱
    1. admin@localhost : xiaoboluo 05:34:41> system cat /tmp/test3.txt
    2. id test test2 test3
    3. 2 a string 100.20 null
    4. 4 a string containing a , comma 102.20 NULL
    5. 6 a string containing a " quote 102.20 NULL
    6. 8 a string containing a ", quote and comma 102.20 NULL
    7. 10 \\t 102.20 NULL
    8. 14 \\t 102.20 NULL
    9. admin@localhost : xiaoboluo 05:41:35> truncate test4;
    10. Query OK, 0 rows affected (0.01 sec)
    11. admin@localhost : xiaoboluo 05:41:41> load data infile "/tmp/test3.txt" into table test4 ignore 1 lines; #載入文字時指定ignore 1 lines子句忽略文字中的前1行資料
    12. Query OK, 6 rows affected (0.00 sec)
    13. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    14. admin@localhost : xiaoboluo 05:42:22> select * from test4; #查詢表test4中的資料,從下面的結果中可以看到資料正確
    15. +----+------------------------------------------+--------+-------+
    16. | id | test | test2 | test3 |
    17. +----+------------------------------------------+--------+-------+
    18. | 2 | a string | 100.20 | null |
    19. | 4 | a string containing a , comma | 102.20 | NULL |
    20. | 6 | a string containing a " quote | 102.20 | NULL |
    21. | 8 | a string containing a ", quote and comma | 102.20 | NULL |
    22. | 10 | \t | 102.20 | NULL |
    23. | 14 | \t | 102.20 | NULL |
    24. +----+------------------------------------------+--------+-------+
    25. 6 rows in set (0.00 sec)
  • LOAD DATA INFILE可用於讀取外部資料來源檔案。例如,許多程式可以以逗號分隔的值(CSV)格式匯出資料,欄位用逗號分隔,幷包含在雙引號內,並帶有一個欄位列名的初始行。如果這樣一個檔案中的資料行的換行符再使用回車符,則load data語句可以這樣編寫:
    1. LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
    2.  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    3.  LINES TERMINATED BY '\r\n'
    4.  IGNORE 1 LINES;
    5. # 如果輸入值不一定包含在引號內,請在ENCLOSED BY關鍵字之前使用OPTIONALLY,如:OPTIONALLY ENCLOSED BY '"',加上OPTIONALLY 可能會忽略數值型別的欄位的引用符號,\
    6. 另外,如果你的csv檔案第一行是資料而不是列名,那就不能使用IGNORE 1 LINES子句
1.2.8. (col_name_or_user_var,…)指定欄位名稱的子句
  • 預設情況下,如果使用load data語句時表名後邊不帶欄位,那麼會把整個表的欄位資料都匯入到資料庫中,如:LOAD DATA INFILE'persondata.txt'INTO TABLE persondata;
  • 如果只想載入某些列,請指定列列表,如:LOAD DATA INFILE'persondata.txt'INTO TABLE persondata(col1,col2,…); ,要注意:如果輸入檔案中的欄位值順序與表中列的順序不同,你需要對load data語句中的tb_name後跟的欄位順序做一下調整以對應文字檔案中的欄位順序。否則,MySQL不能判斷如何與表中的順序對齊,列出列名時可以在tb_name後指定具體的列名,也可以使用表示式生成值指定給某個列名(使用set語句指定一個表示式,複製給一個變數,詳見1.2.9小節),如果沒有set語句,建議列名寫在tb_name表名後邊,方便理解,有set語句時就跟set語句寫在一起

    • 如果發現檔案中的列順序和表中的列順序不符,或者只想載入部分列,在命令中加上列的順序時指定的欄位名也不一定非要放在緊跟著表名,可以放在語句最後面也可以,如:load data infile "/tmp/filename.txt" into table emp fields terminated by ',' enclosed by '"' ignore 2 lines (id,content,name);如果只需要匯入一個欄位id,則把 (id,content,name)換做(id)即可

    • 使用示例參考1.1小節的“如果文字檔案中的資料欄位與表結構中的欄位定義順序不同,則使用如下語句指定載入表中的欄位順序”演示部分

1.2.8. SET col_name = expr,…子句
  • 將列做一定的數值轉換後再載入,使用子句set col_name = expr,.. 指定,要注意:col_name必須為表中真實的列名,expr可以是任意的表示式或者子查詢,只要返回的資料結果值能對應上表中的欄位資料定義型別即可,注意,非set語句生成的列名,必須使用括號括起來,否則報語法錯誤。
    1. # 如果系統將id列的文字資料加上10以後再載入到表的test3列中,可以如下操作:
    2. admin@localhost : xiaoboluo 06:05:42> system rm -f /tmp/test3.txt;
    3. admin@localhost : xiaoboluo 06:06:00> select * from test3 into outfile "/tmp/test3.txt";
    4. Query OK, 6 rows affected (0.00 sec)
    5. admin@localhost : xiaoboluo 06:06:04> system cat /tmp/test3.txt
    6. 2 a string 100.20 null
    7. 4 a string containing a , comma 102.20 \N
    8. 6 a string containing a " quote 102.20 \N
    9. 8 a string containing a ", quote and comma 102.20 \N
    10. 10 \\t 102.20 \N
    11. 14 \\t 102.20 \N
    12. admin@localhost : xiaoboluo 06:07:49> truncate test4;
    13. Query OK, 0 rows affected (0.01 sec)
    14. admin@localhost : xiaoboluo 06:07:53> load data infile "/tmp/test3.txt" into table test4 (id,test,test2) set test3=id+10 ;
    15. ERROR 1262 (01000): Row 1 was truncated; it contained more data than there were input columns
    16. admin@localhost : xiaoboluo 06:08:02> select * from test4; #嚴格模式下因為文字中多了一個欄位被截斷了,所以拒絕匯入
    17. Empty set (0.00 sec)
    18. admin@localhost : xiaoboluo 06:08:08> load data local infile "/tmp/test3.txt" into table test4 (id,test,test2) set test3=id+10 ; #可以使用local關鍵字強制進行截斷最後一個欄位的null值列進行匯入,\
    19. 注意,如果不使用local關鍵字,那就需要修改sql_mode才能匯入
    20. Query OK, 6 rows affected, 6 warnings (0.01 sec)
    21. Records: 6 Deleted: 0 Skipped: 0 Warnings: 6
    22. Warning (Code 1262): Row 1 was truncated; it contained more data than there were input columns
    23. Warning (Code 1262): Row 2 was truncated; it contained more data than there were input columns
    24. Warning (Code 1262): Row 3 was truncated; it contained more data than there were input columns
    25. Warning (Code 1262): Row 4 was truncated; it contained more data than there were input columns
    26. Warning (Code 1262): Row 5 was truncated; it contained more data than there were input columns
    27. Warning (Code 1262): Row 6 was truncated; it contained more data than there were input columns
    28. admin@localhost : xiaoboluo 06:10:45> select * from test4;
    29. +----+------------------------------------------+--------+-------+
    30. | id | test | test2 | test3 |
    31. +----+------------------------------------------+--------+-------+
    32. | 2 | a string | 100.20 | 12 |
    33. | 4 | a string containing a , comma | 102.20 | 14 |
    34. | 6 | a string containing a " quote | 102.20 | 16 |
    35. | 8 | a string containing a ", quote and comma | 102.20 | 18 |
    36. | 10 | \t | 102.20 | 20 |
    37. | 14 | \t | 102.20 | 24 |
    38. +----+------------------------------------------+--------+-------+
    39. 6 rows in set (0.00 sec)
    40. # 或者使用txt檔案中的某些列進行計算後生成新的列插入,這裡演示兩個欄位進行相加後匯入另外一個欄位中:
    41. admin@localhost : xiaoboluo 06:18:37> load data local infile "/tmp/test3.txt" into table test4 (id,test,test2) set test3=id+test2 ; # 注意,如果不使用local關鍵字,那就需要修改sql_mode才能匯入
    42. Query OK, 6 rows affected, 6 warnings (0.00 sec)
    43. Records: 6 Deleted: 0 Skipped: 0 Warnings: 6
    44. Warning (Code 1262): Row 1 was truncated; it contained more data than there were input columns
    45. Warning (Code 1262): Row 2 was truncated; it contained more data than there were input columns
    46. Warning (Code 1262): Row 3 was truncated; it contained more data than there were input columns
    47. Warning (Code 1262): Row 4 was truncated; it contained more data than there were input columns
    48. Warning (Code 1262): Row 5 was truncated; it contained more data than there were input columns
    49. Warning (Code 1262): Row 6 was truncated; it contained more data than there were input columns
    50. admin@localhost : xiaoboluo 06:19:07> select * from test4;
    51. +----+------------------------------------------+--------+-------+
    52. | id | test | test2 | test3 |
    53. +----+------------------------------------------+--------+-------+
    54. | 2 | a string | 100.20 | 102.2 |
    55. | 4 | a string containing a , comma | 102.20 | 106.2 |
    56. | 6 | a string containing a " quote | 102.20 | 108.2 |
    57. | 8 | a string containing a ", quote and comma | 102.20 | 110.2 |
    58. | 10 | \t | 102.20 | 112.2 |
    59. | 14 | \t | 102.20 | 116.2 |
    60. +----+------------------------------------------+--------+-------+
    61. 6 rows in set (0.00 sec)
  • SET子句中使用使用者變數,使用者變數可以以多種方式使用
    1. # 可以直接使用一個使用者變數並進行計算(計算表示式可以使用函式、運算子、子查詢等都允許),然後賦值給test4列直接匯入,而不需要從檔案中讀取test4列資料,該列資料也允許在檔案中不存在
    2. admin@localhost : xiaoboluo 06:27:06> alter table test4 add column test4 varchar(20); #新建一個欄位test4,用於匯入set子句計算的值
    3. Query OK, 0 rows affected (0.01 sec)
    4. Records: 0 Duplicates: 0 Warnings: 0
    5. admin@localhost : xiaoboluo 06:27:56> truncate test4;
    6. Query OK, 0 rows affected (0.01 sec)
    7. admin@localhost : xiaoboluo 06:28:02> set @test=200; #設定一個使用者變數
    8. Query OK, 0 rows affected (0.00 sec)
    9. admin@localhost : xiaoboluo 06:30:32> load data infile "/tmp/test3.txt" into table test4 (id,test,test2,test3) set test4=round(@test/100,0) ; #執行匯入,使用set子句匯入test4列透過表示式\
    10. round(@test/100,0)計算之後的值
    11. Query OK, 6 rows affected (0.00 sec)
    12. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    13. admin@localhost : xiaoboluo 06:30:52> select * from test4; #檢視test4表中匯入的資料,從以下結果中來看,匯入資料正確
    14. +----+------------------------------------------+--------+-------+-------+
    15. | id | test | test2 | test3 | test4 |
    16. +----+------------------------------------------+--------+-------+-------+
    17. | 2 | a string | 100.20 | null | 2 |
    18. | 4 | a string containing a , comma | 102.20 | NULL | 2 |
    19. | 6 | a string containing a " quote | 102.20 | NULL | 2 |
    20. | 8 | a string containing a ", quote and comma | 102.20 | NULL | 2 |
    21. | 10 | \t | 102.20 | NULL | 2 |
    22. | 14 | \t | 102.20 | NULL | 2 |
    23. +----+------------------------------------------+--------+-------+-------+
    24. 6 rows in set (0.00 sec)
    25. # SET子句可以將一個內部函式返回的值直接匯入到一個指定列
    26. admin@localhost : xiaoboluo 06:31:22> truncate test4;
    27. Query OK, 0 rows affected (0.01 sec)
    28. admin@localhost : xiaoboluo 06:40:58> load data infile "/tmp/test3.txt" into table test4 (id,test,test2,test3) set test4=now() ;
    29. Query OK, 6 rows affected (0.00 sec)
    30. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    31. admin@localhost : xiaoboluo 06:41:02> select * from test4;
    32. +----+------------------------------------------+--------+-------+---------------------+
    33. | id | test | test2 | test3 | test4 |
    34. +----+------------------------------------------+--------+-------+---------------------+
    35. | 2 | a string | 100.20 | null | 2017-05-03 18:41:02 |
    36. | 4 | a string containing a , comma | 102.20 | NULL | 2017-05-03 18:41:02 |
    37. | 6 | a string containing a " quote | 102.20 | NULL | 2017-05-03 18:41:02 |
    38. | 8 | a string containing a ", quote and comma | 102.20 | NULL | 2017-05-03 18:41:02 |
    39. | 10 | \t | 102.20 | NULL | 2017-05-03 18:41:02 |
    40. | 14 | \t | 102.20 | NULL | 2017-05-03 18:41:02 |
    41. +----+------------------------------------------+--------+-------+---------------------+
    42. 6 rows in set (0.00 sec)
  • 使用指定列名或者變數列表時SET子句的使用受以下限制:

    • SET子句中的賦值表示式賦值運算子的左側只能使用資料庫表中的真實列名
    • 您可以在SET子句中的右側使用子查詢。返回要分配給列的值的子查詢可能僅是標量子查詢。此外,在這個子查詢中您不能使用load data語句正在操作的表
    • SET子句不會處理IGNORE子句忽略的行。
    • 用固定行格式載入資料時,不能使用使用者變數,因為使用者變數值之間沒有顯示寬度
  • 如果輸入行的欄位太多(多過表中的欄位數量),則會忽略額外的欄位,並增加警告數。如果輸入行的欄位太少,那麼輸入欄位缺少的表列被設定為其預設值,在解析文字檔案時,空串欄位資料與缺少欄位值不同(空串會直接作為資料插入,而缺少欄位時,會根據欄位定義的預設值進行填充),如下:

    • 對於字串型別,列設定為空字串
    • 對於數字型別,列設定為0
    • 對於日期和時間型別,列將該型別設定為適當的“零”值

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/28218939/viewspace-2139721/,如需轉載,請註明出處,否則將追究法律責任。

相關文章