Converting the US7ASCII Character Set to ZHS16GBK

Posted by atlantisholic on 2011-03-11

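An export file records the ID of the character set used for the export. By examining the 2nd and 3rd bytes of the export file header, we can find this character set ID in hexadecimal. On Windows,
we can open the dmp file in a tool such as UltraEdit to check its export character set: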

On Unix, we can check it with the following command:


cat expdat.dmp | od -x | head
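The same check can be scripted. The Python sketch below assumes, as described above, that the ID occupies the 2nd and 3rd bytes of the header in big-endian order; the function names are hypothetical. Reading single bytes also sidesteps a quirk of od -x, which prints 16-bit words in the host's byte order and therefore shows the bytes swapped on little-endian machines.

```python
import struct
import sys


def charset_id_from_header(header: bytes) -> int:
    """Decode the character set ID from the first bytes of an exp dump.

    Assumption (from the article): the ID is a big-endian 16-bit value
    stored in the 2nd and 3rd bytes, i.e. byte offsets 1 and 2.
    """
    if len(header) < 3:
        raise ValueError("need at least 3 header bytes")
    (charset_id,) = struct.unpack(">H", header[1:3])
    return charset_id


def read_export_charset_id(path: str) -> int:
    """Read only the first three bytes of the dump file and decode the ID."""
    with open(path, "rb") as f:
        return charset_id_from_header(f.read(3))


if __name__ == "__main__":
    for dump in sys.argv[1:]:
        cid = read_export_charset_id(dump)
        # prints e.g. "expdat.dmp: id=852 hex=0354"
        print(f"{dump}: id={cid} hex={cid:04x}")
```

Pass one or more dump file names on the command line; each is reported with its decimal and hexadecimal ID.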


Oracle provides standard functions for converting between character set names and IDs:

 
SQL> select nls_charset_id('ZHS16GBK') from dual;

NLS_CHARSET_ID('ZHS16GBK')
--------------------------
                       852

1 row selected.

SQL> select nls_charset_name(852) from dual;

NLS_CHAR
--------
ZHS16GBK

1 row selected.

Converting decimal to hexadecimal:

SQL> select to_char('852','xxxx') from dual;

TO_CH
-----
  354

1 row selected.

Comparing this value with the 2nd and 3rd bytes of the export file header, we can tell that the export file's character set is ZHS16GBK.
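The match works because 852 = 3 × 256 + 84 = 0x0354, so the header holds the bytes 0x03 and 0x54. A small Python sketch of the same arithmetic (the helper names are hypothetical) that mirrors to_char(852,'xxxx') and also shows the US7ASCII value 0001 that comes up later in this article:

```python
def charset_id_to_hex(charset_id: int) -> str:
    """Hex digits as printed by to_char(id, 'xxxx'), minus the leading blanks."""
    return format(charset_id, "x")


def charset_id_to_header_bytes(charset_id: int) -> bytes:
    """The two header bytes (2nd and 3rd byte of the file), big-endian."""
    return charset_id.to_bytes(2, "big")


def header_bytes_to_charset_id(two_bytes: bytes) -> int:
    """Inverse of charset_id_to_header_bytes."""
    return int.from_bytes(two_bytes, "big")


print(charset_id_to_hex(852))                   # 354
print(charset_id_to_header_bytes(852).hex())    # 0354
print(charset_id_to_header_bytes(1).hex())      # 0001 (US7ASCII)
print(header_bytes_to_charset_id(b"\x03\x54"))  # 852
```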

To list the valid character sets in the database, use the following script:

 
col nls_charset_id for 9999
col nls_charset_name for a30
col hex_id for a20
select 
nls_charset_id(value) nls_charset_id,  
value nls_charset_name,
to_char(nls_charset_id(value),'xxxx') hex_id
from  v$nls_valid_values
where parameter = 'CHARACTERSET'
order by nls_charset_id(value)
/

Sample output:

 
NLS_CHARSET_ID NLS_CHARSET_NAME               HEX_ID
-------------- ------------------------------ -------------
             1 US7ASCII                           1
             2 WE8DEC                             2
             3 WE8HP                              3
             4 US8PC437                           4
             5 WE8EBCDIC37                        5
             6 WE8EBCDIC500                       6
             7 WE8EBCDIC1140                      7
             8 WE8EBCDIC285                       8
 ...................
           850 ZHS16CGB231280                   352
           851 ZHS16MACCGB231280                353
           852 ZHS16GBK                         354
           853 ZHS16DBCS                        355
           860 ZHT32EUC                         35c
           861 ZHT32SOPS                        35d
           862 ZHT16DBT                         35e
           863 ZHT32TRIS                        35f
           864 ZHT16DBCS                        360
           865 ZHT16BIG5                        361
           866 ZHT16CCDC                        362
           867 ZHT16MSWIN950                    363
           868 ZHT16HKSCS                       364
           870 AL24UTFFSS                       366
           871 UTF8                             367
           872 UTFE                             368

..................................
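For offline checks, when no database is at hand, a tiny lookup table can stand in for nls_charset_id and nls_charset_name. The Python sketch below contains only a few of the rows shown in the sample output above, so it is deliberately not exhaustive; the names are hypothetical, and an ID or name outside this subset simply returns None.

```python
# A few (id, name) pairs copied from the sample output above; NOT the full list.
CHARSETS = {
    1: "US7ASCII",
    2: "WE8DEC",
    850: "ZHS16CGB231280",
    852: "ZHS16GBK",
    865: "ZHT16BIG5",
    871: "UTF8",
}
_IDS_BY_NAME = {name: cid for cid, name in CHARSETS.items()}


def charset_name(charset_id):
    """Offline stand-in for nls_charset_name(); None if not in this subset."""
    return CHARSETS.get(charset_id)


def charset_id(name):
    """Offline stand-in for nls_charset_id(); case-insensitive, None if unknown."""
    return _IDS_BY_NAME.get(name.upper())


# Same three columns as the SQL script: decimal ID, name, hex ID.
for cid in sorted(CHARSETS):
    print(f"{cid:>14} {CHARSETS[cid]:<30} {cid:x}")
```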

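In many cases, by the time we perform the import we no longer have access to the source database. If the character set of the target database differs from that of the export file, special handling is often required.
Several methods are described below, mainly using US7ASCII and ZHS16GBK as examples.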

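1. Source database character set US7ASCII, export file character set US7ASCII or ZHS16GBK, target database character set ZHS16GBK
In Oracle 9.2 we found that, in this case, the export file could not be imported correctly into an Oracle9i database no matter how we handled it, possibly because the encoding scheme changed considerably in Oracle9i.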

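Below is a simple test we ran. The export files are named as follows:

S: Server, followed by the server character set
C: Client, followed by the client character set used for the export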
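Under this naming scheme a file name fully describes one test case. As an illustration only, a hypothetical Python helper that splits such names back into the server and client character sets:

```python
import os


def parse_dump_name(filename):
    """Split a test dump name such as 'Sus7ascii-Czhs16gbk.dmp' into
    (server_charset, client_charset) using the S/C prefixes above.
    Other dash-separated parts (e.g. an '-exp817' suffix) are ignored."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    server = client = None
    for part in stem.split("-"):
        if part.startswith("S") and server is None:
            server = part[1:].upper()
        elif part.startswith("C") and client is None:
            client = part[1:].upper()
    return server, client


print(parse_dump_name("Sus7ascii-Czhs16gbk.dmp"))  # ('US7ASCII', 'ZHS16GBK')
```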

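The client character set for each import was set on the command line; for brevity, part of the test process is omitted.
On Oracle9iR2, our result was that data from the US7ASCII database could not be imported correctly into the ZHS16GBK database, whatever conversion we tried.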

During an import, any character that cannot be converted correctly is automatically replaced by Oracle with "?", which is code 63.
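The ???? rows in the transcripts that follow fit a simple byte-level model: each byte above 127 in the exported value comes out as 63. The Python sketch below reproduces that observed result for illustration only; it is not Oracle's conversion routine, and it assumes the original value is the 4-byte string (178,226,202,212) that shows up intact later in this article.

```python
QUESTION_MARK = 63  # ord("?")


def replace_non_ascii(raw: bytes) -> bytes:
    """Model of the observed result: bytes above 127 become '?' (63)."""
    return bytes(b if b < 128 else QUESTION_MARK for b in raw)


exported = bytes([178, 226, 202, 212])      # assumed source value (GBK bytes)
print(list(replace_non_ascii(exported)))    # [63, 63, 63, 63]
print(replace_non_ascii(b"test"))           # b'test'
```

This matches both DUMP lines in the transcripts: the Chinese value turns into 63,63,63,63 while the pure ASCII value test (116,101,115,116) passes through unchanged.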

 
E:\nls2>set NLS_LANG=AMERICAN_AMERICA.US7ASCII

E:\nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii.dmp fromuser=eygle touser=eygle tables=test

Import: Release 9.2.0.4.0 - Production on Mon Nov 3 17:14:39 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path
import done in US7ASCII character set and AL16UTF16 NCHAR character set
import server uses ZHS16GBK character set (possible charset conversion)
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:\nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:14:50 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select name,dump(name) from test;

NAME DUMP(NAME)
-----------------------------
???? Typ=1 Len=4: 63,63,63,63
test Typ=1 Len=4: 116,101,115,116

2 rows selected.

SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

E:\nls2>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

E:\nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii.dmp fromuser=eygle touser=eygle tables=test ignore=y

Import: Release 9.2.0.4.0 - Production on Mon Nov 3 17:15:28 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
export client uses US7ASCII character set (possible charset conversion)
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:\nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:15:34 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select name,dump(name) from test;

NAME  DUMP(NAME)
--------------------------------------------------------------------------------
????  Typ=1 Len=4: 63,63,63,63
test  Typ=1 Len=4: 116,101,115,116
????  Typ=1 Len=4: 63,63,63,63
test  Typ=1 Len=4: 116,101,115,116


4 rows selected.

SQL> drop table test;

Table dropped.

SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

E:\nls2>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

E:\nls2>imp eygle/eygle file=Sus7ascii-Czhs16gbk.dmp fromuser=eygle touser=eygle tables=test ignore=y

Import: Release 9.2.0.4.0 - Production on Mon Nov 3 17:17:21 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:\nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:17:30 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select name,dump(name) from test;

NAME DUMP(NAME)
----------------------------------------------
???? Typ=1 Len=4: 63,63,63,63
test Typ=1 Len=4: 116,101,115,116

2 rows selected.

SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

E:\nls2>set NLS_LANG=AMERICAN_AMERICA.US7ASCII

E:\nls2>imp eygle/eygle file=Sus7ascii-Czhs16gbk.dmp fromuser=eygle touser=eygle tables=test ignore=y

Import: Release 9.2.0.4.0 - Production on Mon Nov 3 17:18:00 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path
import done in US7ASCII character set and AL16UTF16 NCHAR character set
import server uses ZHS16GBK character set (possible charset conversion)
export client uses ZHS16GBK character set (possible charset conversion)
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:\nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:18:08 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select name,dump(name) from test;

NAME DUMP(NAME)
----------------------------------------
???? Typ=1 Len=4: 63,63,63,63
test Typ=1 Len=4: 116,101,115,116
???? Typ=1 Len=4: 63,63,63,63
test Typ=1 Len=4: 116,101,115,116

4 rows selected.

SQL>

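For this case, we can use the Oracle8i export utility with the export character set set to US7ASCII, and after the export edit the 2nd and 3rd bytes of the file, changing 0001 to
0354. The US7ASCII data can then be imported correctly into the ZHS16GBK database.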
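Instead of a hex editor, the two bytes can be patched with a short script. The Python sketch below is a hypothetical helper, assuming the header layout described at the start of this article (ID in the 2nd and 3rd bytes); it works on a copy and verifies that the old value really is 0001 before writing 0354. Note that this only relabels the file: the data bytes themselves are not converted, which is why the trick helps only when the "US7ASCII" data actually holds GBK-encoded text.

```python
import shutil

US7ASCII_ID = 0x0001  # value found in the Oracle8i US7ASCII export
ZHS16GBK_ID = 0x0354  # 852, the target character set


def patch_export_charset(src, dst, expected=US7ASCII_ID, new=ZHS16GBK_ID):
    """Copy src to dst and rewrite header bytes 2-3 of the copy.

    Returns the old ID. Raises ValueError (leaving the copy unpatched)
    if the current ID is not the expected one.
    """
    shutil.copyfile(src, dst)  # keep the original dump untouched
    with open(dst, "r+b") as f:
        f.seek(1)  # the ID starts at the 2nd byte (offset 1)
        current = int.from_bytes(f.read(2), "big")
        if current != expected:
            raise ValueError(
                f"{src}: header ID is {current:04x}, expected {expected:04x}"
            )
        f.seek(1)
        f.write(new.to_bytes(2, "big"))
    return current


# Example (the output file name is hypothetical):
# patch_export_charset("Sus7ascii-Cus7ascii-exp817.dmp", "patched.dmp")
```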

Modify the export file, changing bytes 2 and 3 of the header from 0001 to 0354:


Import the modified export file:

 
E:\nls2>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

E:\nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii-exp817.dmp fromuser=eygle touser=eygle tables=test

Import: Release 9.2.0.4.0 - Production on Mon Nov 3 17:37:17 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V08.01.07 via conventional path
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
export server uses UTF8 NCHAR character set (possible ncharset conversion)
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:\nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:37:23 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select name,dump(name) from test;

NAME     DUMP(NAME)
--------------------------------------------------------------------------------
測試       Typ=1 Len=4: 178,226,202,212
test        Typ=1 Len=4: 116,101,115,116

2 rows selected.

SQL>
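The DUMP values show why this works: the four bytes were carried over untouched and are the GBK encoding of the two Chinese characters in the first row. A quick check in Python, with its built-in gbk codec standing in for ZHS16GBK (an assumption that holds for these common characters):

```python
stored = bytes([178, 226, 202, 212])   # DUMP(NAME) of the first row above

text = stored.decode("gbk")            # interpret the raw bytes as GBK
assert len(text) == 2                  # two double-byte characters
assert text.encode("gbk") == stored    # round-trips byte for byte

try:
    stored.decode("ascii")             # the same bytes are not 7-bit ASCII
except UnicodeDecodeError as err:
    print("not valid US7ASCII:", err.reason)

print(text)
```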

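2. Using the create database method
If the export file uses the US7ASCII character set and the target database uses ZHS16GBK, we can use the create database method to change the setting, as follows: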

 
SQL> col parameter for a30
SQL> col value for a30
SQL> select * from v$nls_parameters;

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_CHARACTERSET               ZHS16GBK
NLS_SORT                       BINARY
……………….

19 rows selected.

SQL> create database character set us7ascii;
create database character set us7ascii
*
ERROR at line 1:
ORA-01031: insufficient privileges


SQL> select * from v$nls_parameters;

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_CHARACTERSET               US7ASCII
NLS_SORT                       BINARY
…………..

19 rows selected.

SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

E:\nls2>set nls_lang=AMERICAN_AMERICA.US7ASCII

E:\nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii.dmp fromuser=eygle touser=eygle

Import: Release 9.2.0.4.0 - Production on Sun Nov 2 14:53:26 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path
import done in US7ASCII character set and AL16UTF16 NCHAR character set
import server uses ZHS16GBK character set (possible charset conversion)
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:\nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Sun Nov 2 14:53:35 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select * from test;

NAME
----------
測試
test

2 rows selected.

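As we can see, after issuing create database character set us7ascii; the character set shown in v$nls_parameters changes accordingly, even though the statement itself fails with ORA-01031. This setting affects the import utility,
so the data can now be imported correctly. After the database is restarted, the setting reverts to its original value.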

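Tip: v$nls_parameters is derived from x$nls_parameters, and this dynamic performance view affects import operations; nls_database_parameters is derived from the props$ table and affects how data is stored.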

3. Oracle's character set scanner, csscan

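The methods above should be used only as a last resort: in essence they trick the database into forcing the import, and metadata may be lost.
To guarantee data integrity, use csscan to scan the database and find all incompatible characters, then write the appropriate scripts and code to update them after the conversion so that the data is correct.
Let's take a brief look at how csscan is used.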
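Before running the real tool, it helps to know what it looks for. The Python sketch below is a rough analogue of the per-value verdict csscan reports, not csscan itself: Python codecs stand in for the Oracle character sets, max_bytes stands in for the column length, and the function name is hypothetical. The category labels follow csscan's report terms (lossy conversion appears in the sample report later in this section).

```python
from typing import Optional


def classify(value: str, from_cs: str = "gbk", to_cs: str = "ascii",
             max_bytes: Optional[int] = None) -> str:
    """Rough analogue of a csscan per-cell verdict for FROMCHAR -> TOCHAR.

    Python codecs stand in for Oracle character sets; max_bytes plays the
    role of the column's byte length (e.g. 10 for VARCHAR2(10)).
    """
    old = value.encode(from_cs)          # bytes as stored today
    try:
        new = value.encode(to_cs)        # bytes after the move
    except UnicodeEncodeError:
        return "lossy conversion"        # no representation in the target set
    if max_bytes is not None and len(new) > max_bytes:
        return "truncation"              # converts, but no longer fits the column
    return "changeless" if new == old else "convertible"


# Defaults mirror FROMCHAR=ZHS16GBK TOCHAR=US7ASCII used below.
for v in ["test", "\u6d4b\u8bd5"]:
    print(repr(v), classify(v, max_bytes=10))
```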

Before csscan can be used, the required data dictionary objects must be created as the sys user:

 
E:\nls2>sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.4.0 - Production on Sun Nov 2 19:42:07 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select instance_name from v$intance;
select instance_name from v$intance
                          *
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> select instance_name from v$instance;

INSTANCE_NAME
----------------
penny

1 row selected.

SQL> @?/rdbms/admin/csminst.sql

User created.


Grant succeeded.

………..

This script creates the csmig user and the associated data dictionary objects; the scan results are recorded in these dictionary tables.

We can then invoke the tool from the command line to scan the database:

 
E:\nls2>csscan FULL=Y FROMCHAR=ZHS16GBK TOCHAR=US7ASCII LOG=US7check.log CAPTURE=Y ARRAY=1000000 PROCESS=2


Character Set Scanner v1.1 : Release 9.2.0.1.0 - Production on Sun Nov 2 20:24:45 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Username: eygle/eygle

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Enumerating tables to scan...

. process 1 scanning SYS.SOURCE$[AAAABHAABAAAAIRAAA]
. process 2 scanning SYS.ATTRIBUTE$[AAAAEoAABAAAAhZAAA]
. process 2 scanning SYS.PARAMETER$[AAAAEoAABAAAAhZAAA]
. process 2 scanning SYS.METHOD$[AAAAEoAABAAAAhZAAA]
……..
. process 2 scanning SYSTEM.DEF$_AQERROR[AAAA8fAABAAACWJAAA]
. process 1 scanning WMSYS.WM$ENV_VARS[AAABeWAABAAAFMZAAA]
………………….
. process 2 scanning SYS.UGROUP$[AAAAA5AABAAAAGpAAA]
. process 2 scanning SYS.CON$[AAAAAcAABAAAACpAAA]
. process 1 scanning SYS.FILE$[AAAAARAABAAAABxAAA]

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.

We can then check the output log to see the results of the scan:

 
Database Scan Individual Exception Report


[Database Scan Parameters]

Parameter                      Value                                           
------------------------------ ------------------------------------------------
Scan type                      Full database                                   
Scan CHAR data?                YES                                             
Current database character set ZHS16GBK                                        
New database character set     US7ASCII                                        
Scan NCHAR data?               NO                                              
Array fetch buffer size        1000000                                         
Number of processes            2                                               
Capture convertible data?      YES                                             
------------------------------ ------------------------------------------------

[Data Dictionary individual exceptions]


[Application data individual exceptions]

User  : EYGLE
Table : TEST
Column: NAME
Type  : VARCHAR2(10)
Number of Exceptions         : 1         
Max Post Conversion Data Size: 4         

ROWID              Exception Type      Size Cell Data(first 30 bytes)     
------------------ ------------------ ----- ------------------------------
AAABpIAADAAAAAMAAA lossy conversion         測試                          
------------------ ------------------ ----- ------------------------------

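Data that cannot be converted is recorded, and after the conversion we can use this information to update the affected data and make sure the conversion is correct.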
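Because the exception report is fixed-width text, the flagged rows can be extracted and fed to the follow-up fix-up scripts. A minimal Python sketch, assuming the layout of the sample report above (column boundaries are taken from the dashed rule under the ROWID header); parse_exceptions is a hypothetical helper, and reports from other csscan releases may be laid out differently.

```python
import re

# "User  : EYGLE", "Table : TEST", "Column: NAME", "Type  : VARCHAR2(10)"
_FIELD = re.compile(r"^(User|Table|Column|Type)\s*:\s*(.*?)\s*$")
# A rule made of dash groups separated by spaces, e.g. "----- -----"
_DASHES = re.compile(r"^-+(?: +-+)+\s*$")
# Extended ROWIDs are 18 base-64 characters
_ROWID = re.compile(r"^[A-Za-z0-9+/]{18}$")


def parse_exceptions(report_text):
    """Return one dict per exception row in a csscan individual exception report."""
    context, spans, rows = {}, None, []
    for line in report_text.splitlines():
        field = _FIELD.match(line)
        if field:
            context[field.group(1).lower()] = field.group(2)
        elif _DASHES.match(line):
            # Column boundaries are taken from the dashed rule itself.
            spans = [(m.start(), m.end()) for m in re.finditer(r"-+", line)]
        elif spans and len(spans) == 4:
            a, b = spans[0]
            if not _ROWID.match(line[a:b]):
                continue
            rowid, kind, size = (line[s:e].strip() for s, e in spans[:3])
            data = line[spans[3][0]:].strip()  # data column may run past its rule
            rows.append({
                "user": context.get("user"),
                "table": context.get("table"),
                "column": context.get("column"),
                "rowid": rowid,
                "exception": kind,
                "size": int(size) if size.isdigit() else None,
                "data": data,
            })
    return rows
```

Since each record carries the ROWID, a later UPDATE ... WHERE rowid = ... can target exactly the rows the scanner flagged.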

From the ITPUB blog, link: http://blog.itpub.net/23071790/viewspace-689177/. Please credit the source when reprinting; otherwise legal liability will be pursued.
