[Character Sets] The client terminal character set, the NLS_LANG environment variable, and the database character set

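Posted by secooler on 2009-09-22
Ever since I chose AL32UTF8 as the character set for our production database, I have been shuttling back and forth between garbled text and character set conversion.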

After a long fight with garbled text, I wrote this short note.

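To understand how Oracle handles characters, keep a firm grip on three factors:
1. The client terminal character set
2. The NLS_LANG environment variable
3. The database character set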

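If the character set in NLS_LANG equals the database character set, no conversion takes place and the bytes are inserted into the database as-is.
If the character set in NLS_LANG differs from the database character set, a conversion is required; this is where garbled text comes from.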
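
The two rules above can be modelled in a few lines of Python. This is a simplified sketch, not Oracle's actual implementation: the function name `stored_bytes` is hypothetical, Python codec names stand in for Oracle character set names (`gbk` for ZHS16GBK, `latin-1` for WE8ISO8859P1, `utf-8` for AL32UTF8), and the test character is assumed to be the simplified form 圣 (U+5723) typed at a GBK terminal.

```python
def stored_bytes(terminal_bytes: bytes, nls_lang_charset: str, db_charset: str) -> bytes:
    """Simplified model of what ends up in the database (hypothetical helper).

    terminal_bytes   -- raw bytes the terminal sends for the typed text
    nls_lang_charset -- charset the client declares through NLS_LANG
    db_charset       -- database character set
    """
    if nls_lang_charset == db_charset:
        # Same charset declared on both sides: the bytes are stored unchanged,
        # whether or not they are really encoded in that charset.
        return terminal_bytes
    # Different charsets: interpret the bytes as NLS_LANG says, then re-encode.
    text = terminal_bytes.decode(nls_lang_charset, errors="replace")
    return text.encode(db_charset, errors="replace")


typed = "圣".encode("gbk")  # what a code page 936 terminal sends

print(list(stored_bytes(typed, "utf-8", "utf-8")))    # NLS_LANG matches the database
print(list(stored_bytes(typed, "latin-1", "utf-8")))  # NLS_LANG matches neither side
print(list(stored_bytes(typed, "gbk", "utf-8")))      # NLS_LANG matches the terminal
```

Under these assumptions only the last call produces the real UTF-8 encoding of the character; the first two store bytes that merely look plausible from one particular client.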


1. The database character set is AL32UTF8. Details:
sys@ora10g> col VALUE for a30
sys@ora10g> select * from nls_database_parameters;

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               AL32UTF8
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NCHAR_CHARACTERSET         UTF8
NLS_RDBMS_VERSION              10.2.0.3.0

20 rows selected.

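2. The client terminal character sets are as follows:
The experiments below use two main clients (not counting PL/SQL Developer and Toad, which appear later): the Windows XP cmd command-line tool and PuTTY.
1) The XP character set is: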
C:\>chcp
Active code page: 936
Code page 936 is the Chinese character set GBK; see the MSDN reference “Windows Codepage 936”:
http://www.microsoft.com/globaldev/reference/dbcs/936.htm

2) PuTTY character set, as configured on my machine: UTF-8
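
To make the difference between the two terminals concrete, here is a small Python sketch. It assumes the test character is the simplified form 圣 (U+5723) and uses Python's own codecs as stand-ins for the terminal encodings; it says nothing about Oracle itself.

```python
import codecs

# Python registers "cp936" as an alias of its GBK codec.
print(codecs.lookup("cp936").name)

# The same character yields different bytes depending on the terminal encoding.
cmd_bytes = "圣".encode("cp936")    # Windows cmd, code page 936
putty_bytes = "圣".encode("utf-8")  # PuTTY configured for UTF-8
print(list(cmd_bytes), list(putty_bytes))
```

These two byte sequences are what each terminal hands to SQL*Plus; NLS_LANG is the only thing that tells Oracle which of the two interpretations applies.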

3. Testing with the client set to AL32UTF8
C:\>set NLS_LANG=AMERICAN_AMERICA.AL32UTF8

C:\>sqlplus sec/sec@DB_AL32UTF8

SQL*Plus: Release 10.2.0.1.0 - Production on Tue Sep 22 16:12:14 2009

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options

sec@ora10g> drop table t purge;

Table dropped.

sec@ora10g> create table t (x varchar2(20), y varchar2(20));

Table created.

sec@ora10g> insert into t values ('聖','AL32UTF8');

1 row created.

sec@ora10g> commit;

Commit complete.

sec@ora10g> col x for a10
sec@ora10g> col dump(x) for a30
sec@ora10g> select x, y, dump(x) from t;

X          Y                    DUMP(X)
---------- -------------------- ------------------------------
聖          AL32UTF8             Typ=1 Len=2: 202,165
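
The Len=2 result can be reproduced outside Oracle. A minimal sketch, assuming the character typed was the simplified form 圣 (U+5723) and using Python's `gbk` codec as a stand-in for the cmd code page: because NLS_LANG and the database both declare AL32UTF8, the GBK bytes are stored unchanged, and a reader that really interprets them as UTF-8 sees a different character.

```python
gbk_bytes = "圣".encode("gbk")
print(list(gbk_bytes))  # compare with DUMP(X) above

# The same two bytes happen to be well-formed UTF-8, but for another character.
misread = gbk_bytes.decode("utf-8")
print(f"U+{ord(misread):04X}")

# The real UTF-8 encoding of the character is three bytes long.
print(list("圣".encode("utf-8")))
```

The row only looks right on this client because the same wrong assumption is applied again on the way out.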

4. Testing with the client set to WE8ISO8859P1
C:\>set NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1

C:\>sqlplus sec/sec@DB_AL32UTF8

SQL*Plus: Release 10.2.0.1.0 - Production on Tue Sep 22 16:14:55 2009

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options

sec@ora10g> insert into t values ('聖','WE8ISO8859P1');

1 row created.

sec@ora10g> commit;

Commit complete.

sec@ora10g>
sec@ora10g> col x for a10
sec@ora10g> col dump(x) for a30
sec@ora10g> select x, y, dump(x) from t;

X          Y                    DUMP(X)
---------- -------------------- ------------------------------
           AL32UTF8             Typ=1 Len=2: 202,165
聖         WE8ISO8859P1         Typ=1 Len=4: 195,138,194,165
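
The four stored bytes are what one would expect if each GBK byte is treated as a separate ISO 8859-1 character and then encoded as UTF-8. A sketch of that reading, assuming the typed character was 圣 (U+5723) and using Python's `latin-1` codec as a stand-in for WE8ISO8859P1:

```python
gbk_bytes = "圣".encode("gbk")  # bytes sent by the cmd terminal

# Client -> server: each byte is taken as one ISO 8859-1 character,
# then encoded as UTF-8 for the database.
as_latin1 = gbk_bytes.decode("latin-1")  # 'Ê¥'
stored = as_latin1.encode("utf-8")
print(list(stored))

# Server -> same client: the reverse conversion restores the original bytes,
# which the GBK terminal renders as the expected character.
round_trip = stored.decode("utf-8").encode("latin-1")
print(round_trip.decode("gbk"))
```

This is why the row displays correctly in this session even though the database holds the UTF-8 encoding of two unrelated Latin characters.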

5. Testing with the client set to ZHS16GBK
C:\>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

C:\>sqlplus sec/sec@DB_AL32UTF8

SQL*Plus: Release 10.2.0.1.0 - Production on Tue Sep 22 16:17:13 2009

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options

sec@ora10g> insert into t values ('聖','ZHS16GBK');

1 row created.

sec@ora10g> commit;

Commit complete.

sec@ora10g> col x for a10
sec@ora10g> col dump(x) for a30
sec@ora10g> select x, y, dump(x) from t;

X          Y                    DUMP(X)
---------- -------------------- ------------------------------
?         AL32UTF8             Typ=1 Len=2: 202,165
ꥠ       WE8ISO8859P1         Typ=1 Len=4: 195,138,194,165
聖         ZHS16GBK             Typ=1 Len=3: 229,156,163
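
With NLS_LANG matching the terminal, the conversion is a genuine GBK to UTF-8 conversion. The sketch below reproduces the three stored bytes and shows why the first row now displays as “?”. It assumes the typed character was 圣 (U+5723); `shown_on_gbk_client` is a hypothetical helper, and Python's `errors="replace"` only approximates Oracle's replacement-character behaviour.

```python
# Correct path: GBK bytes are decoded as GBK, then stored as UTF-8.
stored_ok = "圣".encode("gbk").decode("gbk").encode("utf-8")
print(list(stored_ok))


def shown_on_gbk_client(stored: bytes) -> bytes:
    """Bytes a client declaring GBK would receive for UTF-8 data (hypothetical helper)."""
    return stored.decode("utf-8", errors="replace").encode("gbk", errors="replace")


# Row inserted under NLS_LANG=AL32UTF8: its bytes decode to a character
# that GBK cannot represent, so a replacement is sent instead.
print(shown_on_gbk_client(bytes([202, 165])))

# Row inserted under NLS_LANG=ZHS16GBK: it converts back cleanly.
print(shown_on_gbk_client(stored_ok).decode("gbk"))
```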

6. Testing from PuTTY, connected to the database server over SSH
ora10g@secDB /home/oracle$ sqlplus sec/sec

SQL*Plus: Release 10.2.0.3.0 - Production on Tue Sep 22 16:21:17 2009

Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options

sec@ora10g> insert into t values ('聖','PuTTY AL32UTF8');

1 row created.

sec@ora10g> commit;

Commit complete.

sec@ora10g> col x for a10
sec@ora10g> col dump(x) for a30
sec@ora10g> select x, y, dump(x) from t;

X          Y                    DUMP(X)
---------- -------------------- ------------------------------
?          AL32UTF8             Typ=1 Len=2: 202,165
ꥠ        WE8ISO8859P1         Typ=1 Len=4: 195,138,194,165
聖         ZHS16GBK             Typ=1 Len=3: 229,156,163
聖         PuTTY AL32UTF8       Typ=1 Len=3: 229,156,163

7. Finally, one last test with NLS_LANG cleared
C:\>set NLS_LANG=

C:\>sqlplus sec/sec@DB_AL32UTF8

SQL*Plus: Release 10.2.0.1.0 - Production on Tue Sep 22 16:24:57 2009

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options

sec@ora10g>  insert into t values ('聖','unset NLS_LANG');

1 row created.

sec@ora10g> commit;

Commit complete.

sec@ora10g> col x for a10
sec@ora10g> col dump(x) for a30
sec@ora10g> select x, y, dump(x) from t;

X          Y                    DUMP(X)
---------- -------------------- ------------------------------
聖         AL32UTF8             Typ=1 Len=2: 202,165
脢樓       WE8ISO8859P1         Typ=1 Len=4: 195,138,194,165
鍦?        ZHS16GBK             Typ=1 Len=3: 229,156,163
鍦?        PuTTY AL32UTF8       Typ=1 Len=3: 229,156,163
聖         unset NLS_LANG       Typ=1 Len=2: 202,165
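
The display in this last session is consistent with the stored bytes reaching the code page 936 terminal without any conversion. A sketch of that reading, using Python's `gbk` codec as a stand-in for the terminal (the helper name `render_raw_as_gbk` is hypothetical, and the expected characters are given in simplified form):

```python
rows = {
    "AL32UTF8": bytes([202, 165]),
    "WE8ISO8859P1": bytes([195, 138, 194, 165]),
    "ZHS16GBK": bytes([229, 156, 163]),
}


def render_raw_as_gbk(raw: bytes) -> str:
    """Interpret stored bytes directly as GBK, as an unconverted terminal would."""
    return raw.decode("gbk", errors="replace")


for label, raw in rows.items():
    print(label, render_raw_as_gbk(raw))
```

The two-byte row comes out as the intended character, the four-byte row as two unrelated Chinese characters, and the genuine UTF-8 row as one wrong character followed by a leftover byte, which matches the pattern in the output above.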

8. Result of “Execute as script” in Toad:
X          Y                    DUMP(X)
---------- -------------------- ------------------------------
?          AL32UTF8             Typ=1 Len=2: 202,165
ꥠ       WE8ISO8859P1         Typ=1 Len=4: 195,138,194,165
聖         ZHS16GBK             Typ=1 Len=3: 229,156,163
聖         PuTTY AL32UTF8       Typ=1 Len=3: 229,156,163
?          unset NLS_LANG       Typ=1 Len=2: 202,165

5 rows selected.

9. Result of “F9” in Toad:
X          Y                    DUMP(X)
---------- -------------------- ------------------------------
聖         AL32UTF8             Typ=1 Len=2: 202,165
脢樓         WE8ISO8859P1         Typ=1 Len=4: 195,138,194,165
鍦         ZHS16GBK             Typ=1 Len=3: 229,156,163
鍦         PuTTY AL32UTF8       Typ=1 Len=3: 229,156,163
聖         unset NLS_LANG       Typ=1 Len=2: 202,165

10. Result of the “Command Window” in PL/SQL Developer:
X          Y                    DUMP(X)
---------- -------------------- ------------------------------
?          AL32UTF8             Typ=1 Len=2: 202,165
ꥠ        WE8ISO8859P1         Typ=1 Len=4: 195,138,194,165
聖         ZHS16GBK             Typ=1 Len=3: 229,156,163
聖         PuTTY AL32UTF8       Typ=1 Len=3: 229,156,163
?          unset NLS_LANG       Typ=1 Len=2: 202,165

11. Result of “F8” in PL/SQL Developer:
X          Y                    DUMP(X)
---------- -------------------- ------------------------------
?          AL32UTF8             Typ=1 Len=2: 202,165
ꥠ       WE8ISO8859P1         Typ=1 Len=4: 195,138,194,165
聖         ZHS16GBK             Typ=1 Len=3: 229,156,163
聖         PuTTY AL32UTF8       Typ=1 Len=3: 229,156,163
?          unset NLS_LANG       Typ=1 Len=2: 202,165


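12. Conclusions
1) Where possible, keep the client encoding (on Windows XP, confirm it with the chcp command in cmd), the NLS_LANG parameter, and the database character set identical. This is the best setup both for performance and for avoiding encoding conversion;
2) If the goal is to support Chinese, choose ZHS16GBK or AL32UTF8 as the server-side database character set where possible. This reduces garbled-character failures caused by improper conversion;
3) (Recommended) Set NLS_LANG to match the character encoding of the terminal. The database then knows which encoding the terminal actually uses. A conversion does take place, but it is a correct conversion, and it keeps wrongly encoded data out of the database;
4) (Not recommended) Alternatively, set NLS_LANG to match the encoding of the database server. No conversion then happens in either direction, so characters display as expected on that client. Be careful, though: the bytes stored on the database server are then very likely wrong;
5) PL/SQL Developer appears to display the data consistently against an AL32UTF8 database, whereas Toad appears less “stable”: its two execution modes render the same rows differently.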
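
Recommendation 3 can be turned into a quick sanity check. The sketch below is only an illustration: the function names are hypothetical, and the code page table is a deliberately tiny subset that would need extending for other environments. It relies on NLS_LANG having the form LANGUAGE_TERRITORY.CHARSET, as in the settings used throughout this post.

```python
# Windows code page -> Oracle character set name (small illustrative subset).
CODEPAGE_TO_ORACLE = {
    936: "ZHS16GBK",
    1252: "WE8MSWIN1252",
}


def nls_lang_charset(nls_lang: str) -> str:
    """Return the CHARSET part of LANGUAGE_TERRITORY.CHARSET, or '' if absent."""
    _, sep, charset = nls_lang.rpartition(".")
    return charset.upper() if sep else ""


def matches_terminal(nls_lang: str, codepage: int) -> bool:
    """True when NLS_LANG declares the charset that the code page implies."""
    expected = CODEPAGE_TO_ORACLE.get(codepage)
    return expected is not None and nls_lang_charset(nls_lang) == expected


# The code page would come from `chcp`; 936 is the value seen in section 2.
print(matches_terminal("AMERICAN_AMERICA.ZHS16GBK", 936))
print(matches_terminal("AMERICAN_AMERICA.AL32UTF8", 936))
```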

secooler
09.09.22

-- The End --

From the ITPUB blog. Link: http://blog.itpub.net/519536/viewspace-615345/. Please credit the source when reposting; otherwise legal action may be taken.