itpub網友問題之AL32UTF8與ZHS16GBK 2種資料庫字符集database characterset

wisdomone1發表於2015-11-30

itpub論壇問題

    原問題:http://www.itpub.net/thread-1943065-1-4.html
   安裝了資料庫11.2.0.4,誤將字符集設定為AL32UTF8,本來應該是ZHS16GBK;透過應用軟體錄入了一些資料,直到一個視窗報錯才發現字符集選擇錯了。
因為AL32UTF8是ZHS16GBK的超集,所以直接將字符集轉換為ZHS16GBK後,應用軟體進去後,所有中文都變成了亂碼,看來直接轉換字符集這條路走不通了。


現在想問,有什麼辦法能將AL32UTF8庫裡資料匯出,匯入到新安裝的ZHS16GBK庫中?
在AL32UTF8庫匯入前將匯入環境字符集設定為ZHS16GBK?然後匯出?

分析思路



擴充套件問題

1,對於資料庫字符集沒有完全理解,還是浮於表面
2,有空多看官方手冊 Oracle? Database Globalization Support Guide



結論

1,不同的地理位置,適用的資料庫字符集不同,引申含義即比如我們是中國就應採用這個地理位置的字符集
2,如果原資料庫字符集與目標資料庫字符集不是父子關係,執行轉化資料庫字符集會提示如下錯誤
SQL> ALTER DATABASE CHARACTER SET ZHS16GBK;
ALTER DATABASE CHARACTER SET ZHS16GBK
*
ERROR at line 1:
ORA-12712: new character set must be a superset of old character set


3,父子集資料庫字符集轉換不會造成資料丟失及儲存空間增大2種問題,中文字元不會在轉換顯示亂碼,比如:?之類的
4,字符集AL32UTF8與ZHS16GBK不是父子集資料庫字符集關係,所以肯定不能轉換,也就回答了壇友的問題了,在字符集轉換後肯定會發生亂碼,只有父子集資料庫字符集方可正常轉換
    所以一定要在建立資料庫前把資料庫字符集考慮好,一旦有了業務資料,再轉換就風險極大了
5, convert函式可以轉換指定字元由某資料庫字符集轉換為另一種資料庫字符集,用這個函式我認為最大價值在於預知某些字元是否可以正常轉換,即產生資料損失即產生亂碼
   也可以發現空間增加的問題


6, v$nls_valid_values可以獲取資料庫字符集的資訊,共計 499種字符集
7,資料庫字符集編碼方案共計2大類:
   單位元組及多位元組,其中又細分很多子類,請見下測試


測試

---檢視與nls相關的字典
SQL> select table_name from dict where lower(table_name) like '%nls%';


TABLE_NAME
------------------------------------------------------------
NLS_DATABASE_PARAMETERS
NLS_INSTANCE_PARAMETERS
NLS_SESSION_PARAMETERS
V$NLS_PARAMETERS
V$NLS_VALID_VALUES
GV$NLS_PARAMETERS
GV$NLS_VALID_VALUES


7 rows selected.


SQL> desc nls_database_parameters;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 PARAMETER                                 NOT NULL VARCHAR2(30)
 VALUE                                              VARCHAR2(40)


SQL> select count(*) from nls_database_parameters;


  COUNT(*)
----------
        20


當前資料庫字符集是WE8MSWIN1252
SQL> col parameter for a30
SQL> col value for a30
SQL> select parameter,value from nls_database_parameters;


PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               WE8MSWIN1252
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM


PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_RDBMS_VERSION              11.2.0.1.0


20 rows selected.




---官方手冊說,如果資料庫端字符集不能包括了客戶端的字符集,則會發生資料丟失,即產生亂碼,說白了就是用亂碼替換識別到的字符集,所以一定要確認資料庫字符集是客戶端
字符集的父集
Figure 2-6 shows that data loss occurs when the database character set does not include all of the characters in the client character set. 
The database character set is US7ASCII. The client's character set is WE8MSWIN1252, and the language used by the client is German. When the 
client inserts a string that contains ?, the database replaces ? with ?, resulting in lost data.




---基於上述資訊,查閱如下可以獲取官方推薦的基於不同全球的資料庫字符集


Oracle? Database Globalization Support Guide
11g Release 2 (11.2)
Part Number E10729-07


A Locale Data


Recommended Database Character Sets






由這裡可見我當前選擇的資料庫字符集WE8MSWIN1252 明顯不正常的,因為這個字符集適用於歐洲,或者我應該選擇支援粒度更好的AL32UTF8

再摘錄下父子集資料庫字符集列表(源自官方手冊)






---基於問題說的構建一個字符集AL32UTF8的資料庫
SQL> select parameter,value from nls_database_parameters where parameter='NLS_CHARACTERSET';


PARAMETER                                          VALUE
-------------------------------------------------- ------------------------------
NLS_CHARACTERSET                                   AL32UTF8


--插入中文資料
SQL> conn scott/system
Connected.
SQL> create table t_charset(a varchar2(100));


Table created.


SQL> insert into t_charset values('我們');


1 row created.


SQL> commit;


Commit complete.


SQL> col a for a30
SQL> select a,dump(a) from t_charset;


A                              DUMP(A)
------------------------------ --------------------------------------------------
我們                           Typ=1 Len=9: 233,142,180,230,136,156,230,187,145




修改資料庫字符集AL32UTF8 為ZHS16GBK ,驗證ITPUB網友的問題


SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.


Total System Global Area 1653518336 bytes
Fixed Size                  2213896 bytes
Variable Size             956303352 bytes
Database Buffers          687865856 bytes
Redo Buffers                7135232 bytes
Database mounted.
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;


System altered.


SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;


System altered.


SQL> ALTER SYSTEM SET AQ_TM_PROCESSES=0;


System altered.


SQL> alter database open;


Database altered.


這裡也可以再次驗證ZHS16GBK不是AL32UTF8的超集
SQL> ALTER DATABASE CHARACTER SET ZHS16GBK;
ALTER DATABASE CHARACTER SET ZHS16GBK
*
ERROR at line 1:
ORA-12712: new character set must be a superset of old character set




SQL> ALTER DATABASE character set INTERNAL_USE ZHS16GBK;


Database altered.






SQL> select parameter,value from nls_database_parameters where parameter='NLS_CHARACTERSET';


PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_CHARACTERSET               ZHS16GBK


[oracle@seconary ~]$ sqlplus scott/system


SQL*Plus: Release 11.2.0.1.0 Production on Sun Nov 29 06:26:20 2015


Copyright (c) 1982, 2009, Oracle.  All rights reserved.


ERROR:
ORA-01035: ORACLE only available to users with RESTRICTED SESSION privilege




SQL> alter system disable restricted session;


System altered.




檢查中文字元報錯
SQL> conn scott/system
Connected.
SQL> select a,dump(a) from t_charset;
SP2-0784: Invalid or incomplete character beginning 0x91 returned


SQL> select count(*) from t_charset;


  COUNT(*)
----------
         1




SQL> select dump(a) from t_charset;


DUMP(A)
--------------------------------------------------------------------------------
Typ=1 Len=9: 233,142,180,230,136,156,230,187,145


SQL> insert into t_charset values('別人');


1 row created.


SQL> commit;


Commit complete.


SQL> select * from t_charset;


A
--------------------------------------------------------------------------------
別人
SP2-0784: Invalid or incomplete character beginning 0x91 returned


--可見zhs16gbk每個中文3個位元組,而AL32UTF8是4.5個位元組,而且可見如果舊資料庫字符集和新字符集不是父子關係,可能會產生亂碼
SQL> select a,dump(a) from t_charset;


A                                                  DUMP(A)
-------------------------------------------------- --------------------------------------------------
別人                                              Typ=1 Len=6: 229,136,171,228,186,186
我們                                               Typ=1 Len=6: 230,136,145,228,187,172
SP2-0784: Invalid or incomplete character beginning 0x91 returned






再把字符集恢復回AL32UTF8
SQL> select parameter,value from nls_database_parameters where parameter='NLS_CHARACTERSET';


PARAMETER                                          VALUE
-------------------------------------------------- --------------------------------------------------
NLS_CHARACTERSET                                   AL32UTF8




經查官方手冊,資料庫字符集轉換有如下2種風險:
Potential data loss
Increased overhead




SQL> truncate table t_charset;


Table truncated.


SQL> select a,dump(a) from t_charset;


no rows selected




測試下父子集資料庫字符集切換基於中文,會不會產生亂碼
SQL> ALTER DATABASE character set INTERNAL_USE  US7ASCII;


Database altered.




SQL> select parameter,value from nls_database_parameters where parameter='NLS_CHARACTERSET';


PARAMETER                                          VALUE
-------------------------------------------------- --------------------------------------------------
NLS_CHARACTERSET                                   US7ASCII




可見us7ascii資料庫字符集不能正常顯示中文字元,一查官方手冊,原來這個字符集是隸屬於歐洲,當然不能顯示亞洲的字元
SQL> insert into t_charset values('一');


1 row created.


SQL> commit;


Commit complete.




SQL> select a,dump(a) from t_charset;


A          DUMP(A)
---------- ------------------------------
??         Typ=1 Len=2: 63,63




SQL> truncate table t_charset;


Table truncated.


測一個亞洲中文的字符集,也是顯示不行中文 CGB2312-80 16-bit Simplified Chinese
SQL> ALTER DATABASE character set INTERNAL_USE  ZHS16CGB231280;


SQL> insert into t_charset values('一');


1 row created.


SQL> commit;


Commit complete.


SQL> select a,dump(a) from t_charset;


A          DUMP(A)
---------- ------------------------------
?          Typ=1 Len=3: 228,184,63


Database altered.


繼續測試下UTF8資料庫字符集,可見如果由子集轉化為父集資料庫字符集,不會發生資料丟失及空間增加的2種問題
SQL> ALTER DATABASE character set INTERNAL_USE UTF8;


Database altered.


SQL> insert into t_charset values('夥伴');


1 row created.


SQL> commit;


Commit complete.


SQL> select a,dump(a) from t_charset;


A          DUMP(A)
---------- --------------------------------------------------
夥伴       Typ=1 Len=9: 230,181,188,230,172,142,229,141,179






SQL> conn /as sysdba
Connected.
SQL> alter system enable restricted session;


System altered.


SQL> ALTER DATABASE character set AL32UTF8;


Database altered.


SQL> alter system disable restricted session;


System altered.


SQL> conn scott/system
Connected.
SQL> select a,dump(a) from t_charset;


A          DUMP(A)
---------- --------------------------------------------------
一         Typ=1 Len=6: 230,182,147,226,130,172
夥伴       Typ=1 Len=9: 230,181,188,230,172,142,229,141,179




--檢視所有的資料庫字符集
SQL> select count(*) from v$nls_valid_values;


  COUNT(*)
----------
       499




SQL> select parameter,value from v$nls_valid_values where parameter='CHARACTERSET' and VALUE  like '%ZH%';


PARAMETER                                          VALUE
-------------------------------------------------- --------------------------------------------------
CHARACTERSET                                       ZHS16CGB231280
CHARACTERSET                                       ZHS16MACCGB231280
CHARACTERSET                                       ZHS16GBK
中間略
CHARACTERSET                                       ZHT16BIG5FIXED


22 rows selected.




SQL> select parameter,value from v$nls_valid_values where parameter='CHARACTERSET' and VALUE  like '%AL%';


PARAMETER                                          VALUE
-------------------------------------------------- --------------------------------------------------
CHARACTERSET                                       AL24UTFFSS
CHARACTERSET                                       AL32UTF8
CHARACTERSET                                       




SQL> select convert('ABC','AL16UTF16','UTF8'),length(convert('ABC','AL16UTF16','UTF8')) from dual;


CONVERT('ABC LENGTH(CONVERT('ABC','AL16UTF16','UTF8'))
------------ -----------------------------------------
 A B C                                               6




SQL> select convert('a123','UTF8','UTF8') from dual;


CONVERT(
--------
a123


SQL> select length('12'),length(convert('12','AL16UTF16','UTF8')) from dual;


LENGTH('12') LENGTH(CONVERT('12','AL16UTF16','UTF8'))
------------ ----------------------------------------
           2                                        4

資料庫字符集的編碼規則 共計2大類(單位元組雖然效率高且空間佔用小,但表示的字元有限):
Single-Byte Encoding Schemes


     7-bit encoding schemes


     8-bit encoding schemes


Multibyte Encoding Schemes


    Fixed-width multibyte encoding schemes


   Variable-width multibyte encoding schemes


   Shift-sensitive variable-width multibyte encoding schemes


Naming Convention for Oracle Database Character Sets
<region><number of bits used to represent a character><standard character set name>[S|C]


The parts of the names that appear between angle brackets are concatenated. 
The optional S or C is used to differentiate character sets that can be used only on the server (S) or only on the client (C).



來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/9240380/viewspace-1849340/,如需轉載,請註明出處,否則將追究法律責任。

相關文章