【ORDER BY】在ZHS16GBK字符集下Oracle漢字排序依據亦是二進位制編碼非拼音

secooler發表於2010-09-12
曾經討論並驗證過這個問題,記錄在此,供大家參考。
在ZHS16GBK字符集下Oracle漢字排序依據亦是二進位制編碼非拼音!

1.確認系統的版本和字符集資訊
1)資料庫版本為Oracle 10.2.0.3
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Prod
PL/SQL Release 10.2.0.3.0 - Production
CORE    10.2.0.3.0      Production
TNS for 32-bit Windows: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production

2)確認系統的字符集
SQL> col name for a30
SQL> col value$ for a30
SQL> col comment$ for a40
SQL> set lines 131
SQL> set pages 200
SQL> select * from sys.props$;

NAME                           VALUE$                         COMMENT$
------------------------------ ------------------------------ ----------------------------------------
DICT.BASE                      2                              dictionary base tables version #
DEFAULT_TEMP_TABLESPACE        TEMP                           Name of default temporary tablespace
DEFAULT_PERMANENT_TABLESPACE   USERS                          Name of default permanent tablespace
DEFAULT_TBS_TYPE               SMALLFILE                      Default tablespace type
NLS_LANGUAGE                   AMERICAN                       Language
NLS_TERRITORY                  AMERICA                        Territory
NLS_CURRENCY                   $                              Local currency
NLS_ISO_CURRENCY               AMERICA                        ISO currency
NLS_NUMERIC_CHARACTERS         .,                             Numeric characters
NLS_CHARACTERSET               ZHS16GBK                       Character set
NLS_CALENDAR                   GREGORIAN                      Calendar system
NLS_DATE_FORMAT                DD-MON-RR                      Date format
NLS_DATE_LANGUAGE              AMERICAN                       Date language
NLS_SORT                       BINARY                         Linguistic definition
NLS_TIME_FORMAT                HH.MI.SSXFF AM                 Time format
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM       Time stamp format
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR             Time with timezone format
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR   Timestamp with timezone format
NLS_DUAL_CURRENCY              $                              Dual currency symbol
NLS_COMP                       BINARY                         NLS comparison
NLS_LENGTH_SEMANTICS           BYTE                           NLS length semantics
NLS_NCHAR_CONV_EXCP            FALSE                          NLS conversion exception
NLS_NCHAR_CHARACTERSET         AL16UTF16                      NCHAR Character set
NLS_RDBMS_VERSION              10.2.0.3.0                     RDBMS version for NLS parameters
GLOBAL_DB_NAME                 ORCL.REGRESS.RDBMS.DEV.US.ORAC Global database name
                               LE.COM

EXPORT_VIEWS_VERSION           8                              Export views revision #
DBTIMEZONE                     00:00                          DB time zone

已選擇27行。


可見資料庫字符集為ZHS16GBK,其中重要的資訊是“NLS_SORT BINARY”,就是這條資訊指明瞭資料庫的排序方式是按照二進位制編碼來完成的。

2.驗證這個結論
1)建立測試表T並初始化四條記錄
SQL> drop table t purge;
SQL> create table t (x varchar2(10), y varchar2(10), z number);
SQL> insert into t values ('甲','Jia',ascii('甲'));
SQL> insert into t values ('骨','Gu',ascii('骨'));
SQL> insert into t values ('文','Wen',ascii('文'));
SQL> insert into t values ('睿','Rui',ascii('睿'));
SQL> commit;
SQL> select * from t;

X          Y                   Z
---------- ---------- ----------
甲         Jia             48343
骨         Gu              47559
文         Wen             52932
睿         Rui             61091

2)按照x列的內容進行排序
SQL> select * from t order by x;

X          Y                   Z
---------- ---------- ----------
骨         Gu              47559
甲         Jia             48343
文         Wen             52932
睿         Rui             61091

結果很有說明性,這個結果是根據漢字的二進位制ascii碼完成的排序。此時漢字“睿”沒有按照拼音的順序進行排序。

3)使用nlssort強制漢字按照拼音順序排序
SQL> select * from t order by nlssort(x,'NLS_SORT=SCHINESE_PINYIN_M');

X          Y                   Z
---------- ---------- ----------
骨         Gu              47559
甲         Jia             48343
睿         Rui             61091
文         Wen             52932

此時漢字即按照拼音的順序完成了排序。
有關nlssort的用法請參考文章《【NLSSORT】改變Oralce 對簡體漢字的排序規則(拼音、部首、筆畫)》(http://space.itpub.net/519536/viewspace-627797)。

3.小結
希望透過本文澄清有關Oracle的漢字排序規則。不可想當然的認為在中文字符集下的漢字排序規則是按照拼音順序進行排序。

Good luck.

secooler
10.09.12

-- The End --

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/519536/viewspace-673816/,如需轉載,請註明出處,否則將追究法律責任。

相關文章