oracle資料庫字符集characterset遷移及變更系列一

wisdomone1發表於2015-11-30

背景

   oracle資料庫字符集和應用資料密切相關,資料庫遷移也會涉及到資料庫字符集的轉換,大家常常聽到的亂碼之類的,匯出匯入表發生列長度不足的錯誤,也和資料庫字符集不無關係,
本文我沿襲前文:http://blog.itpub.net/9240380/viewspace-1849340/ , 
itpub網友問題之AL32UTF8與ZHS16GBK 2種資料庫字符集database characterset,繼續學習資料庫字符集的相關知識。


結論

1,表名長度最大為30個位元組
2,convert函式確實是個利器,可以驗證2種字符集是否相容,達到選擇目標資料庫字符集的作用
3,nls_lang是OS層面的環境變數,其最後部分指定客戶端採用的資料庫字符集,控制EXP及IMP和EXPDP及IMPDP的字符集
4,如果NLS_LANG及資料庫字符集不一致,會提示可能產生字符集變更的資訊,這個可能會引發資料損失
5,千萬別太輕信網上的,執行ALERT DATABASE CHARECTERSET INTERNAL_USE變更 資料庫字符集,這樣會引發隱性或未知的風險
6,nls_length_semantics預設值為BYTE,值也可為CHAR,ORACLE不建議採用後者,引發會引發效能問題以及執行時間錯誤,可能也會產生BUFFER OVERFLOW




測試


SQL> select * from v$version where rownum=1;


BANNER
----------------------------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production


SQL> create table t_testlength(a int);


Table created.


SQL> select length(object_name) from user_objects where  lower(object_name)='t_testlength';


LENGTH(OBJECT_NAME)
-------------------
                 12


SQL> alter table t_testlength rename to t_testlength_testlength_testlengt;
alter table t_testlength rename to t_testlength_testlength_testlengt
                                   *
ERROR at line 1:
ORA-00972: identifier is too long




SQL> select length('t_testlength_testlength_testlengt') from dual;


LENGTH('T_TESTLENGTH_TESTLENGTH_TESTLENGT')
-------------------------------------------
                                         33


SQL> select length('t_testlength_testlength_testle') from dual;


LENGTH('T_TESTLENGTH_TESTLENGTH_TESTLE')
----------------------------------------
                                      30


可見表名長度最長為30個位元組的長度
SQL> alter table t_testlength rename to t_testlength_testlength_testle;


Table altered.


SQL> alter table t_testlength_testlength_testle rename to t_testlength_testlength_testle1;
alter table t_testlength_testlength_testle rename to t_testlength_testlength_testle1
                                                     *
ERROR at line 1:
ORA-00972: identifier is too long                                                          




--當前資料庫字符集為AL32UTF8
SQL> select parameter,value from nls_database_parameters where parameter='NLS_CHARACTERSET';


PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_CHARACTERSET               AL32UTF8


--長度刻度為位元組,還有個值也可以為CHAR
SQL> show parameter nls_length_semantics


NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
nls_length_semantics                 string                 BYTE


--檢視與中文相關的資料庫字符集
SQL> select distinct parameter from v$nls_valid_values;


PARAMETER
------------------------------
CHARACTERSET
SORT
TERRITORY
LANGUAGE


SQL> select parameter,value from v$nls_valid_values where parameter='CHARACTERSET' and VALUE  like '%ZH%';


PARAMETER                      VALUE
------------------------------ ------------------------------
CHARACTERSET                   ZHS16CGB231280
CHARACTERSET                   ZHS16MACCGB231280
CHARACTERSET                   ZHS16GBK
CHARACTERSET                   ZHS16DBCS
CHARACTERSET                   ZHS32GB18030
CHARACTERSET                   ZHT32EUC
CHARACTERSET                   ZHT32SOPS
CHARACTERSET                   ZHT16DBT
CHARACTERSET                   ZHT32TRIS
CHARACTERSET                   ZHT16DBCS
CHARACTERSET                   ZHT16BIG5


PARAMETER                      VALUE
------------------------------ ------------------------------
CHARACTERSET                   ZHT16CCDC
CHARACTERSET                   ZHT16MSWIN950
CHARACTERSET                   ZHT16HKSCS
CHARACTERSET                   ZHT16HKSCS31
CHARACTERSET                   ZHS16CGB231280FIXED
CHARACTERSET                   ZHS16GBKFIXED
CHARACTERSET                   ZHS16DBCSFIXED
CHARACTERSET                   ZHT32EUCFIXED
CHARACTERSET                   ZHT32TRISFIXED
CHARACTERSET                   ZHT16DBCSFIXED
CHARACTERSET                   ZHT16BIG5FIXED


22 rows selected.


好像英文字元在2種字符集佔用空間是一樣的
SQL> select 'abc',dump('abc'),dump(convert('abc','ZHS16GBK','AL32UTF8')) from dual;


'ABC'  DUMP('ABC')                                  DUMP(CONVERT('ABC','ZHS16GBK','AL32UTF8'))
------ -------------------------------------------- ------------------------------------------
abc    Typ=96 Len=3: 97,98,99                       Typ=1 Len=3: 97,98,99


SQL> insert into t_charset values('我們');


1 row created.


SQL> commit;


Commit complete.




SQL> select a,dump(a) from t_charset;


A                              DUMP(A)
------------------------------ --------------------------------------------------
我們                           Typ=1 Len=9: 233,142,180,230,136,156,230,187,145


可見AL32UTF8不能正常轉化ZHS16GBK,中文會出現問題,且前者佔用位元組長度為9個,而後者變成6,造成資料損失
SQL> select a,dump(a),convert(a,'ZHS16GBK','AL32UTF8') mig_char,dump(convert(a,'ZHS16GBK','AL32UTF8')) after_dump from t_charset;


A                    DUMP(A)                                            MIG_CHAR                       AFTER_DUMP
-------------------- -------------------------------------------------- ------------------------------ --------------------------------------------------
我們                 Typ=1 Len=9: 233,142,180,230,136,156,230,187,145                        Typ=1 Len=6: 230,136,145,228,187,172


看到沒,如果從AL32UTF8轉化為UTF8,不會造成資料損失,中文可以正常顯示,可見CONVERT函式是個利器,對於字符集轉化
SQL> select a,dump(a),convert(a,'UTF8','AL32UTF8') mig_char,dump(convert(a,'UTF8','AL32UTF8')) after_dump from t_charset;


A                    DUMP(A)                                            MIG_CHAR                       AFTER_DUMP
-------------------- -------------------------------------------------- ------------------------------ --------------------------------------------------
我們                 Typ=1 Len=9: 233,142,180,230,136,156,230,187,145   我們                           Typ=1 Len=9: 233,142,180,230,136,156,230,187,145


SQL> desc database_properties;
 Name              Null?    Type
 ----------------- -------- ------------
 PROPERTY_NAME     NOT NULL VARCHAR2(30)
 PROPERTY_VALUE             VARCHAR2(400
                            0)
 DESCRIPTION                VARCHAR2(400
                            0)


SQL> select count(*) from database_properties;


  COUNT(*)
----------
        36


SQL> select distinct property_name from database_properties;


PROPERTY_NAME
------------------------------
Flashback Timestamp TimeZone
DST_UPGRADE_STATE
NLS_CURRENCY
NLS_ISO_CURRENCY
NLS_DUAL_CURRENCY
DBTIMEZONE
DEFAULT_EDITION
NLS_CALENDAR
NLS_TIMESTAMP_FORMAT
EXPORT_VIEWS_VERSION
NO_USERID_VERIFIER_SALT


PROPERTY_NAME
------------------------------
DST_SECONDARY_TT_VERSION
NLS_TERRITORY
NLS_DATE_LANGUAGE
NLS_LENGTH_SEMANTICS
WORKLOAD_REPLAY_MODE
DEFAULT_PERMANENT_TABLESPACE
DEFAULT_TBS_TYPE
NLS_LANGUAGE
NLS_TIMESTAMP_TZ_FORMAT
NLS_NCHAR_CHARACTERSET
NLS_RDBMS_VERSION


PROPERTY_NAME
------------------------------
NLS_NUMERIC_CHARACTERS
NLS_TIME_FORMAT
NLS_NCHAR_CONV_EXCP
GLOBAL_DB_NAME
DEFAULT_TEMP_TABLESPACE
NLS_DATE_FORMAT
NLS_COMP
DICT.BASE
TDE_MASTER_KEY_ID
DST_PRIMARY_TT_VERSION
NLS_CHARACTERSET


PROPERTY_NAME
------------------------------
NLS_SORT
WORKLOAD_CAPTURE_MODE
NLS_TIME_TZ_FORMAT


36 rows selected.




SQL> select property_name,property_value from database_properties where property_name='NLS_LENGTH_SEMANTICS';


PROPERTY_NAME                  PROPERTY_VALUE
------------------------------ --------------------------------------------------
NLS_LENGTH_SEMANTICS           BYTE


--查官方手冊,梳理上述引數nls_length_semantics的含義
Oracle? Database Globalization Support Guide 之
3 Setting Up a Globalization Support Environment


1,nls_length_semantics的可用值為:byte,char,預設值為byte
2,前值適用於單位元組資料庫字符集編碼規則,後者適用於多位元組資料庫字符集編碼規則


3,char,varchar2,long資料型別可以選取byte,也可以選擇char
4,nchar,nvarchar2,clob,nclob,只能選用char
5,如果變更nls_length_semantics不會影響已存在的資料


6,nls_length_semantics可以在資料庫級,例項級,會話級進行調整


7,sys使用者下的資料只會採用byte


8,如果在客戶端顯示指定與伺服器端不同的值,錄入資料以客戶端為準


9,為了資料相容,ORACLE不建議配置值為CHAR,因為會導致產生很多問題:執行時錯誤,BUFFER OVERFLOW




---測試下zhs16gbk儲存固定寬度的中文字元,然後轉化為AL32UTF8
SQL> conn /as sysdba
Connected.
SQL> alter system enable restricted session;


System altered.


SQL> ALTER DATABASE character set INTERNAL_USE ZHS16GBK;


Database altered


SQL> alter system disable restricted session;


System altered.


SQL> conn scott/system
Connected.




SQL> create table t_charset(a char(10));


Table created.


SQL> insert into t_charset values('夥伴');


1 row created.


SQL> commit;


Commit complete.
--可見佔用10個位元組
SQL> select a,dump(a) from t_charset;


A                    DUMP(A)
-------------------- --------------------------------------------------
夥伴                 Typ=96 Len=10: 228,188,153,228,188,180,32,32,32,32


SQL> conn /as sysdba
Connected.
SQL> alter system enable restricted session;


System altered.


SQL> alter database character set INTERNAL_USE AL32UTF8;


Database altered.




SQL> alter system disable restricted session;


System altered.


--看到沒,轉化AL32UTF8,中文顯示不出來
SQL> conn scott/system
Connected.
SQL> select a,dump(a) from t_charset;


A                    DUMP(A)
-------------------- --------------------------------------------------
           Typ=96 Len=10: 228,188,153,228,188,180,32,32,32,32




SQL> insert into t_charset values('夥伴');


1 row created.


SQL> commit;


Commit complete.


--可見基於同樣的中文,ZHS16GBK與AL32UTF8,編碼規則發生了變化,雖然佔用的位元組數相同,當然顯示不出來了
SQL> select a,dump(a) from t_charset;


A                    DUMP(A)
-------------------- ----------------------------------------------------------------------------------------------------
           Typ=96 Len=10: 228,188,153,228,188,180,32,32,32,32
夥伴                 Typ=96 Len=10: 230,181,188,230,172,142,229,141,179,32










----在al32utf8匯出測試表
可見匯出失敗了(這裡失敗是因為表中同時儲存2種不同字符集的表資料)
[oracle@seconary ~]$ exp file=exp_t_charset.dmp userid=scott  tables=t_charset


Export: Release 11.2.0.1.0 - Production on Sun Nov 29 16:27:59 2015


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


Password: 


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)


About to export specified tables via Conventional Path ...
. . exporting table                      T_CHARSET
EXP-00008: ORACLE error 6552 encountered
ORA-06552: PL/SQL: Compilation unit analysis terminated
ORA-06553: PLS-553: character set name is not recognized
Export terminated successfully with warnings.
[oracle@seconary ~]$ 






---刪除測試表,然後調整資料庫字元,再匯入測試表
SQL> drop table t_charset purge;


Table dropped.


SQL> conn /as sysdba
Connected.
SQL> alter system enable restricted session;


System altered.


SQL> alter database character set INTERNAL_USE  ZHS16GBK;


Database altered.


SQL> alter system disable restricted session;


System altered.


[oracle@seconary ~]$ imp file=exp_t_charset.dmp userid=scott  tables=t_charset


Import: Release 11.2.0.1.0 - Production on Sun Nov 29 16:31:43 2015


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


Password: 


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


Export file created by EXPORT:V11.02.00 via conventional path
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
. importing SCOTT's objects into SCOTT
Import terminated successfully without warnings.




---我們重新測試上述的例子,清空測試表
---al32utf8
SQL> conn scott/system
Connected.
SQL> create table t_charset(a char(10));


Table created.


SQL> insert into t_charset values('夥伴');


1 row created.


SQL> commit;


Commit complete.


SQL> select a,dump(a) from t_charset;


A                    DUMP(A)
-------------------- ----------------------------------------------------------------------
夥伴                 Typ=96 Len=10: 230,181,188,230,172,142,229,141,179,32


可見現在表中1種資料庫字符集,EXP還是報錯,說明用alter database character set internal_use這種方式改資料庫字符集,其實具備很大的風險,很可能會把資料庫搞壞
[oracle@seconary ~]$ exp file=exp_t_charset_pure.dmp userid=scott  tables=t_charset


Export: Release 11.2.0.1.0 - Production on Sun Nov 29 16:38:37 2015


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


Password: 


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set  --可見匯出是以ZHS16GBK,這個提示與OS的環境變數NLS_LANG有關
server uses AL32UTF8 character set (possible charset conversion)


About to export specified tables via Conventional Path ...
. . exporting table                      T_CHARSET
EXP-00008: ORACLE error 6552 encountered
ORA-06552: PL/SQL: Compilation unit analysis terminated
ORA-06553: PLS-553: character set name is not recognized
Export terminated successfully with warnings.
[oracle@seconary ~]$ 


---檢視NLS_LANG
[oracle@seconary ~]$ env|grep LANG
NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
LANG=en_US.utf8


--調整NLS_LANG
[oracle@seconary ~]$ export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
[oracle@seconary ~]$ env|grep LANG
NLS_LANG=AMERICAN_AMERICA.AL32UTF8


---調整NLS_LANG中的資料庫字符集與資料庫中的字符集相同,匯出還是報錯
[oracle@seconary ~]$ exp file=exp_t_charset_pure.dmp userid=scott  tables=t_charset


Export: Release 11.2.0.1.0 - Production on Sun Nov 29 16:42:41 2015


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


Password: 


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in AL32UTF8 character set and AL16UTF16 NCHAR character set


About to export specified tables via Conventional Path ...
. . exporting table                      T_CHARSET
EXP-00008: ORACLE error 6552 encountered
ORA-06552: PL/SQL: Compilation unit analysis terminated
ORA-06553: PLS-553: character set name is not recognized
Export terminated successfully with warnings.


藉此學習下nls_lang中的字符集是否配置不同,會影響EXP匯出的字符集匯出的結果
可見確實會影響EXP匯出的效果,且會影響SQLPLUS的執行
[oracle@seconary ~]$ export NLS_LANG=AMERICAN_AMERICA.ZHT16DBCSFIXED
[oracle@seconary ~]$ exp file=exp_t_charset_pure_other.dmp userid=scott  tables=t_charset


BoBoBoBoBoBo: Release 11.2.0.1.0 - Production on 


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.




EXP-00092: BoBoBoBoBoBo@@BoBo@@BoBoBo@@BoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBo@@BoBo@@BoBoBoBoBoBoBoBo@@BoBoBoBoBoBoBoBoBo
EXP-00000: BoBoBoBoBoBo@@BoBoBoBoBoBoBoBoBoBo@@BoBoBoBoBoBoBoBoBoBoBoBoBoBoBo[oracle@seconary ~]$ 
[oracle@seconary ~]$ 
[oracle@seconary ~]$ 
[oracle@seconary ~]$ 
[oracle@seconary ~]$ sqlplus '/as sysdba'
Error 19 initializing SQL*Plus
Invalid NLS character set for this OS environment


---可見NLS_LANG確實會影響EXP資料庫字符集的選擇,所以一定要理解此引數的含義,否則EXP IMP以及EXPDP和IMPDP會產生資料損失以及產生亂碼
[oracle@seconary ~]$ export NLS_LANG=AMERICAN_AMERICA.UTF8
[oracle@seconary ~]$ exp file=exp_t_charset_pure_other.dmp userid=scott  tables=t_charset


Export: Release 11.2.0.1.0 - Production on Sun Nov 29 17:24:25 2015


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


Password: 


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in UTF8 character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)


About to export specified tables via Conventional Path ...
. . exporting table                      T_CHARSET          1 rows exported
Export terminated successfully without warnings.






來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/9240380/viewspace-1849953/,如需轉載,請註明出處,否則將追究法律責任。

相關文章