[轉載]Oracle資料庫字符集問題解析2

lastwinner發表於2005-11-11

實驗結果分析五

quote:


SQL> INSERT INTO TEST VALUES('東北');

1 row created.

SQL> EXIT
更改客戶端字符集為ZHS16GBK
D:>SET NLS_LANG=AMERICAN_AMERICA.ZHS16GBK


D:>SQLPLUS "/ AS SYSDBA"

無法顯示用US7ASCII插入的字符集,但可以顯示用ZHS16GBK插入的字符集
SQL> SELECT * FROM TEST;

R1
--------------------
東北
??
東北
6+11

SQL>
疑問3:US7ASCII為ZHS16GBK的子集,為何在US7ASCII環境下插入的資料無法顯示? [/B]
[@more@]

在客戶端字符集設定為US7ASCII時,向字符集為ZHS16GBK的資料庫中插入“東北”,需要進行字元轉換,“東北”的ZHS16GBK編碼為182(10110110)、171(10101011)與177(10110001)、177(10110001),由於US7ASCII為7bit編碼,Oracle將這兩個漢字當作四個字元,並忽略各位元組的最高位,從而存入資料庫的編碼就變成了54(00110110)、43(00101011)與49(00110001)、49(00110001),也就是“6+11”,原始資訊被改變了。這時,將客戶端字符集設定為ZHS16GBK再進行SELECT,資料庫中的資訊不需要改變傳到客戶端,第一、三行由於存入的資訊沒有改變能顯示“東北”,而第二、四行由於插入資料時資訊改變,所以不能顯示原有資訊了。

分析了這麼多的內容,但實際上總結起來也很簡單,要想在字符集方面少些錯誤與麻煩,需要堅持兩條基本原則:
在資料庫端:選擇需要的字符集(透過create database中的CHARACTER SET與NATIONAL CHARACTER SET子句指定);
在客戶端:設定作業系統實際使用的字符集(透過環境變數NLS_LANG設定)。

問題解答:
wyq21973:你是指大部分漢字在這兩個標準中編碼不一樣,還是指ZHS16GBK所包含的漢字更多些?
如果是指前者那直接更改就會產生錯誤的結果。
如果我從ZHS16GBK的庫匯出,再匯入UTF8的庫中,會出現什麼情形呢?(能不能成功,會不會出現部分漢字亂碼?)
jeffli73: 兩者編碼完全不一樣,但所能支援的漢字(專業的說法是字彙)基本相當,所以理論上如果設定得當EXP/IMP可能成功

wyq21973:下面是我剛測試的結果,匯入時字符集轉換不成功。
D:>imp userid=sys/wyq fromuser=gsm touser=gsm rows=n fil
e=c:f1.dmp,f2.dmp,f3.dmp

Import: Release 8.1.7.0.0 - Production on Tue Dec 28 09:25:12 2004

(c) Copyright 2000 Oracle Corporation. All rights reserved.


Connected to: Oracle8i Enterprise Edition Release 8.1.7.0.0 - Production
With the Partitioning option
JServer Release 8.1.7.0.0 - Production

IMP-00016: required character set conversion (type 852 to 871) not supported
IMP-00000: Import terminated unsuccessfully

sys@ORCL>Select Nls_Charset_Id(Value) Nls_Charset_Id, Value Nls_Charset_Name
2 From V$nls_Valid_Values
3 Where Parameter = 'CHARACTERSET'
4 And Nls_Charset_Id(Value) In (852, 871);

NLS_CHARSET_ID NLS_CHARSET_NAME
-------------- ----------------------------------------------------------------
852 ZHS16GBK
871 UTF8

這種情形該如何處理?

jeffli73:
有可能是Oracle 8 還不支援

看了wyq21973朋友的帖子,當時就有些疑問,因為上面我也提到從理論上講漢字由ZHS16GBK到UTF8的轉換是可能的,是不是因為wyq21973朋友用的Oracle8.1.7,還沒提供這種轉換呢,但由於當時也沒條件試,就放了下來,最近看到論壇裡關於字符集的問題又不少,於是決定做一下進一步的實驗;
我的PC上本來已經裝了一個資料庫test1,字符集為ZHS16GBK,今天利用DBCA又建了一個test2,字符集選用UTF8。
實驗所用的資料庫版本為:Oracle9i Enterprise Edition Release 9.2.0.1.0

1.確認源資料庫的字符集:
SCOTT@test1> select * from nls_database_parameters;

PARAMETER VALUE
------------------------------ ---------------------------------------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET ZHS16GBK
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM

PARAMETER VALUE
------------------------------ ---------------------------------------------------------------------
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 9.2.0.1.0

20 rows selected.

注意:是ZHS16GBK。

2.確認轉換前的資料:
SCOTT@test1> select deptno,dname,loc,dump(loc) from dept where deptno=50;

DEPTNO DNAME LOC
---------- -------------- -------------
DUMP(LOC)
----------------------------------------------------------------------------------------------------
50 1 1中文
Typ=1 Len=5: 49,214,208,206,196


可以看到“中文”兩個字佔用4個位元組(每個漢字2個位元組),是ZHS16GBK編碼;

3.匯出源資料
D:oracleora92bin>set ORACLE_SID=test1

D:oracleora92bin>exp

Export: Release 9.2.0.1.0 - Production on Wed Mar 30 16:56:15 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.


Username: scott/tiger

Connected to: Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
Enter array fetch buffer size: 4096 >

Export file: EXPDAT.DMP > dept1

(2)U(sers), or (3)T(ables): (2)U > t

Export table data (yes/no): yes >

Compress extents (yes/no): yes >

Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
Table(T) or Partition(T:P) to be exported: (RETURN to quit) > dept

. . exporting table DEPT 8 rows exported
Table(T) or Partition(T:P) to be exported: (RETURN to quit) >

Export terminated successfully without warnings.

4.確認目標資料庫的字符集
SCOTT1@test2> select * from nls_database_parameters;

PARAMETER VALUE
------------------------------------------------------------ ---------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET UTF8
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM

PARAMETER VALUE
------------------------------------------------------------ ---------------------------------------
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 9.2.0.1.0

20 rows selected.

注意:是UTF8。

5.匯入目標資料庫
D:oracleora92bin>set ORACLE_SID=test2

D:oracleora92bin>imp

Import: Release 9.2.0.1.0 - Production on Wed Mar 30 16:58:16 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Username: scott1
Password:

Connected to: Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production

Import file: EXPDAT.DMP > dept1

Enter insert buffer size (minimum is 8192) 30720>

Export file created by EXPORT:V09.02.00 via conventional path

Warning: the objects were exported by SCOTT, not by you

import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
import server uses UTF8 character set (possible charset conversion)

List contents of import file only (yes/no): no >

Ignore create error due to object existence (yes/no): no >

Import grants (yes/no): yes >

Import table data (yes/no): yes >

Import entire export file (yes/no): no >
Username: scott

Enter table(T) or partition(T:P) names. Null list means all tables for user
Enter table(T) or partition(T:P) name or . if done: dept

Enter table(T) or partition(T:P) name or . if done: .

. importing SCOTT's objects into SCOTT1
. . importing table "DEPT" 8 rows imported
Import terminated successfully without warnings.

D:oracleora92bin>

注意上面的提示:
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
import server uses UTF8 character set (possible charset conversion)

6.檢查匯入資料是否正確
SCOTT1@test2> select deptno,dname,loc,dump(loc) from dept where deptno=50;

DEPTNO DNAME LOC
---------- ---------------------------- --------------------------
DUMP(LOC)
----------------------------------------------------------------------------------------------------
50 1 1中文
Typ=1 Len=7: 49,228,184,173,230,150,135


可以看到“中文”兩個字佔用6個位元組(每個漢字3個位元組),是UTF8編碼;

7.小結
在Oracle 9.2環境下,漢字可以由ZHS16GBK轉換為UTF8;
資料庫伺服器選用的字符集雖然可以各不相同,但只要各種相關設定正確,Oracle總是儘可能地將正確的轉換結果呈現給客戶端的使用者。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29867/viewspace-809870/,如需轉載,請註明出處,否則將追究法律責任。

相關文章