[轉]SqlServe到PG遷移錯誤:無效的編碼序列”UTF8″:0x00

德哥發表於2018-02-02

背景

原文

http://www.codeweblog.com/sqlserve%E5%88%B0pg%E8%BF%81%E7%A7%BB%E9%94%99%E8%AF%AF-%E6%97%A0%E6%95%88%E7%9A%84%E7%BC%96%E7%A0%81%E5%BA%8F%E5%88%97-utf8-0x00/

環境：

sqlserver2008 R2 （winXP）  
  
postgresql9.3.4 （win7-64bit）

1. 通過java像PostgreSQL提交批量 insert（或者普通insert或者執行copy）：

錯誤：

java.sql.BatchUpdateException: 批次處理被中止，呼叫 getNextException 以取得原因。

解決：在catch模組列印出getNextException的內容，就能知道具體的原因是什麼了。

catch (ClassNotFoundException | SQLException ex)  
{  
       System.out.println("Error: " + ex.getMessage());  
       ex.printStackTrace(System.out);  
       if (ex instanceof BatchUpdateException)  
       {  
             BatchUpdateException bex = (BatchUpdateException) ex;  
             bex.getNextException().printStackTrace(System.out);  
       }  
}

2. getNextException抓到的具體錯誤資訊是：org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00

org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00  
 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2198)  
 at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1927)  
 at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:405)  
 at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2892)  
 at JDBCTest.ConnTest.main(ConnTest.java:154)

3. 經檢視sqlsever和PG的字元編碼:

sql server的字符集編碼是GBK（對於sqlserver檢視encoding很複雜，我看參考了這個方法 http://bbs.csdn.net/topics/190034619 ）

PostgreSQL的字符集編碼是UTF8（對於PG來說很容易檢視encoding，看一下屬性中的encoding就可以）

這樣看來兩個資料庫的字符集編碼不同，但是PG的伺服器端不支援字符集GBK，也就是建立資料庫時不能指定encoding=GBK，PG中可替代GBK的字符集EUC_CN也無法使用。只有客戶端可以支援GBK，但該設定只在一個連線中有效，故在插入資料之前執行 ppstmt =conn.prepareStatement(“set client_encoding=GBK;”);，此時又產生了如下錯誤：

org.postgresql.util.PSQLException: The server`s client_encoding parameter was changed to GBK. The JDBC driver requires client_encoding to be UTF8 for correct operation.  
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1966)  
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)  
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561)  
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:419)  
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:365)  
    at JDBCTest.ConnTest.main(ConnTest.java:56)

這樣看來，sqlserver和PG的字符集沒法實現完全統一。

4. 重新回到invalid byte sequence for encoding “UTF8”: 0x00。

我在postgresql的社群郵件列表裡找到了這個問題：http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-quot-UTF8-quot-0x00-td2172080.html

http://postgresql.1045698.n5.nabble.com/supporting-0x00-from-client-Unicode-JDBC-td1969492.html#a1969493

通過郵件列表中提到的內容以及個人多次測試報錯表的欄位，發現導致問題的原因就是字元型別欄位值中含有空字元，這是一個結束符。（Unicode 查詢工具 http://www.scarfboy.com/coding/unicode-tool?s=U%2B2323）。

5. 問題重現：

在sqlserver中：

create table test_varchar3 (id int,name varchar(23));  
insert into test_varchar3  values (1,`ddd`+CHAR(0)+`aaa`);  
insert into test_varchar3  values (2,`ddd`+CHAR(0));  
insert into test_varchar3  values (3,CHAR(0)+`aaa`);  
insert into test_varchar3  values (4,`aaa`); ---注意此種的僅僅是字串並不是空字元``  
select * from test_varchar3;

將該表從sqlserver遷移到pg：

通過jdbc取到的(其中空字元無法表示，此處用空格替代)：

id,name,name_length  
  
1,ddd aaa,length=7  
  
2,ddd ,length =4  
  
3, aaa,length=4  
  
4,aaa,length=5

錯誤重現：invalid byte sequence for encoding "UTF8": 0x00。

解決方法：在向pg中copy或者insert資料提交之前，現將其中的字元型別欄位值中的空字元取掉即可成功執行。

比如：str.replaceAll("", ""); 或者str.replace("", "");

6. 擴充套件問題：

sqlserver中字元型別欄位值，當插入的字元值中帶有空字元的時候，sqlserver客戶端顯示時會去掉空字元之後的字元，但是在查詢中要查到該條資料還需要匹配之後的內容：

create table test_varchar3 (id int,name varchar(23));  
insert into test_varchar3  values (1,`ddd`+CHAR(0)+`aaa`);  
insert into test_varchar3  values (2,`ddd`+CHAR(0));  
insert into test_varchar3  values (3,CHAR(0)+`aaa`);  
insert into test_varchar3  values (4,`aaa`);  
select * from test_varchar3;  
select * from test_varchar3 where name=`ddd`;  
id | name  
-------------  
2  |  ddd  
select * from test_varchar3 where name=`ddd`+CHAR(0);  
id | name  
-------------  
2  |  ddd  
select * from test_varchar3 where name=`ddd`+CHAR(0)+`aaa`;  
id | name  
-------------  
1  |  ddd  
select * from test_varchar3 where name=CHAR(0);  
id | name  
------------  
  
select * from test_varchar3 where name=CHAR(0)+`aaa`;  
id | name  
-------------  
3  |

關於如何從sqlsever資料庫中去掉子賦值中的空值結束符，可以參考（待補充）：

http://stackoverflow.com/questions/3533320/sql-server-remove-end-string-character-0-from-data

所以說盡管空值終止符“是空的不顯示的，但是他會對字元型別欄位值及其查詢等造成影響，所以使用sqlsever最好不要向字元型別欄位中插入帶有空值終止符“的字元或字串。

關於空值終止符“或者其他控制符的詳細介紹，請參考MS SQL Server官方文件：

http://msdn.microsoft.com/en-us/library/ms191485.aspx

資料遷移無法新增外來鍵約束，錯誤程式碼 1215
2018-06-29
utf8的編碼原理
2012-11-29
遷移過程中出現的open failed錯誤
2009-02-23
AI
VC6到VC2010，專案遷移錯誤
2016-06-16
遷移linux的XWindows程式到本地(轉)
2007-08-15
LinuxWindows
LightDB/Postgres 使用ora2pg遷移Oracle到LightDB/Postgres
2022-06-28
Oracle
經典的編譯錯誤 (轉)
2008-03-19
編譯
從Windows到Mac遷移Android專案出現錯誤解決方案
2015-10-03
WindowsMacAndroid
編寫優質無錯程式碼(2) (轉)
2008-01-25
編寫優質無錯程式碼(3) (轉)
2008-01-25
編寫優質無錯程式碼(1) (轉)
2008-01-25
Azure ASM到ARM遷移（三） Reserved IP的遷移
2017-09-06
ASM
Oracle將字元編碼從GBK轉到UTF8，如何操作比較穩妥？
2014-02-19
Oracle字元
編譯無效物件
2015-08-17
編譯物件
(轉)編譯Oracle中無效的物件的N中方法
2011-03-15
編譯Oracle物件
執行無法解決的編譯錯誤
2009-09-25
編譯
從MySQL到ORACLE程式遷移的注意事項(轉)
2007-08-16
MySqlOracle
【SYNONYM】ORA-00980錯誤處理暨刪除無效同名的SQL生成指令碼
2009-11-04
SQL指令碼
C++ url中文編碼處理編碼轉為UTF8 CChineseCode下載
2010-05-10
C++
PG 資料庫 DTS 遷移需要注意的事項:
2021-11-09
資料庫
oracle編譯無效物件
2012-07-20
Oracle編譯物件
遷移資料庫的檔案到不同路徑(轉)
2019-04-01
資料庫
Python 編碼錯誤的本質原因
2017-06-15
Python
用Javascript實現UTF8編碼轉換成gb2312編碼
2023-03-07
JavaScript
遷移學習時間序列分類
2019-04-08
遷移學習
如何從SVN遷移原始碼到Git倉庫
2018-07-03
原始碼Git
trac 平臺從 PG 資料庫轉到sqlite 的指令碼。
2013-03-28
資料庫SQLite指令碼
解決遷移資料庫錯誤，索引長度過長
2020-07-11
資料庫索引
Oracle查詢並批量編譯無效物件指令碼
2015-08-03
Oracle編譯物件指令碼
Python遷移編輯器到Visual Studio Code 的步驟
2018-12-19
Python
Linux下批量把GDK編碼的文章轉化為UTF-8編碼的文章以及“iconv: 未知 xxx 處的非法輸入序列”錯誤處理
2014-08-20
Linux
從utf8轉到US7ASCII
2010-09-15
ASCII
無效數字問題：Oracle-MySQL-PG大不同
2023-02-09
OracleMySql
如何修復“包 ‘%@’ 丟失或無效”macOS Monterey 升級錯誤
2021-11-01
Mac
編譯PHP的錯誤
2017-11-15
編譯PHP
轉資料遷移
2016-04-11
Larabel遷移檔案時報SQLSTATE[42000]錯誤的解決方法
2021-11-16
SQL
遷移表到新的表空間
2015-03-12

[轉]SqlServe到PG遷移錯誤:無效的編碼序列”UTF8″:0x00

標籤

背景

原文

相關文章