SQL SERVER和ORACLE的排序問題

Author:Rainny Zhong

Date: 2007-9-18

一,SQL SERVER的排序問題

SQL SERVER的排序由COLLATION指定.首先我們明白什麼是COLLATION?

COLLATION: The physical storage of character strings in Microsoft® SQL Server™ 2000 is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared.

也就是說COLLATION指定SQL SERVER字元的編碼和排序,比較規則.

1.安裝SQL SERVER時指定例項的COLLATION:

當我們在安裝SQL SERVER時,會要我們指定SQL SERVER例項的COLLATION.有兩個選項:

l WINDOWS COLLATION

l SQL COLLATION

SQL COLLATION是為了相容以前的版本而保留的.所以一般是選擇WINDOWS COLLATION,其預設值是來自於WINDOWS作系統的LOCAL SETTING:

舉例:如果安裝SQL SERVER的作業系統其語言的英語,則WINDOWS COLLATION的預設值是適用於大部分歐洲及美洲字元的拉丁字元,稱為: LATIN1_GENERAL.

在選定了WINDOWS COLLATION以後,還可以選定排序規則,分別有:

(1) 二進位制排序

(2) 字典排序,其又細分為:

l 大小寫敏感

l 口音敏感

l <日> 假名敏感

l 寬度敏感

2.新建DATABASE時指定DB的COLLATION:

在新建一個DB時,我們可以指定DB級的COLLATION,如果沒有指定,預設是繼承自例項的COLLATION.

3.新建表時指定欄位的COLLATION:

除了在新建DB時可以指定COLLATION,我們還可以在新建TABLE時,指定COLUMN的COLLATION:

4.在SQL語句中指定COLLATION

You can override the server, database, or column collation by specifying a collation in the ORDER BY clause of a SELECT statement. For more information，for example:

Select * from test order by c collate SQL_Latin1_General_CP1_CS_AS

5.排序規則的最佳化級

排序規則的優先從高到低是：

n SQL語句級

n 欄位級

n DATABASE級

n SERVER級

二,ORACLE的排序

1.Linguistic Sorting and String Searching

Oracle provides the following types of sorts:

■ Binary sort

■ Monolingual linguistic sort

■ Multilingual linguistic sort

2.Using Linguistic Sorts

n Monolingual Linguistic Sorts

n Multilingual Linguistic Sorts

Oracle provides multilingual linguistic sorts so that you can sort data in more than one language in one sort. This is useful for regions or languages that have complex sorting rules and for multilingual databases. Oracle Database 10g supports all of the sort orders defined by previous releases.

For Asian language data or multilingual data, Oracle provides a sorting mechanism based on the ISO 14651 standard and the Unicode 4.0 standard. Chinese characters are ordered by the number of strokes, PinYin, or radicals.

In addition, multilingual sorts can handle canonical equivalence and supplementary characters. Canonical equivalence is a basic equivalence between characters or sequences of characters. For example, ç is equivalent to the combination of c and ,.Supplementary characters are user-defined characters or predefined characters in Unicode 4.0 that require two code points within a specific code range. You can define up to 1.1 million code points in one multilingual sort.

For example, Oracle supports a monolingual French sort (FRENCH), but you can specify a multilingual French sort (FRENCH_M). _M represents the ISO 14651 standard for multilingual sorting. The sorting order is based on the GENERIC_M sorting order and can sort diacritical marks from right to left. Oracle Corporation recommends using

a multilingual linguistic sort if the tables contain multilingual data. If the tables contain only French, then a monolingual French sort may have better performance because it uses less memory. It uses less memory because fewer characters are defined in a monolingual French sort than in a multilingual French sort. There is a tradeoff between the scope and the performance of a sort.

3.Monolingual Linguistic Sorts

(1)monolingual linguistic sort name

Pls refer to 369 page: Table A–14 (Cont.) Monolingual Linguistic Sorts

(2).example of Monolingual Linguistic Sorts

SELECT * FROM test3 ORDER BY NLSSORT(name, 'NLS_SORT=german');

4.Multilingual Linguistic Sorts

(1).multilingual linguistic sort name

Pls refer to 370 page: Table A–15 Multilingual LInguistic Sorts

(2).example of Multilingual Linguistic Sorts

SELECT * FROM test3 ORDER BY NLSSORT(name, 'NLS_SORT= FRENCH_M');

(3).Multilingual Sorting Levels

■ Primary Level Sorts

A primary level sort distinguishes between base letters, such as the difference between characters a and b. It is up to individual locales to define whether a is beforeb, b is before a, or if they are equal. The binary representation of the characters is completely irrelevant. If a character is an ignorable character, then it is assigned a primary level order (or weight) of zero, which means it is ignored at the primary level.Characters that are ignorable on other levels are given an order of zero at those levels.

■ Secondary Level Sorts

A secondary level sort distinguishes between base letters (the primary level sort) before distinguishing between diacritics on a given base letter. For example, the character Ä differs from the character A only because it has a diacritic. Thus, Ä and A are the same on the primary level because they have the same base letter (A) but differ on the secondary level.

■ Tertiary Level Sorts

A tertiary level sort distinguishes between base letters (primary level sort), diacritics (secondary level sort), and case (upper case and lower case). It can also include special characters such as +, -, and *.

The following are examples of tertiary level sorts:

■ Characters a and A are equal on the primary and secondary levels but different on the tertiary level because they have different cases.

■ Characters ä and A are equal on the primary level and different on the secondary and tertiary levels.

■ The primary and secondary level orders for the dash character - is 0. That is, it is ignored on the primary and secondary levels. If a dash is compared with another character whose primary level order is nonzero, for example, u, then no result for

the primary level is available because u is not compared with anything. In this case, Oracle finds a difference between - and u only at the tertiary level.

5.Linguistic Sort Parameters

n NLS_SORT

設定級別：Initialization parameter, environment variable, ALTER SESSION, and SQL functions

n NLS_COMP

設定級別：Initialization parameter, environment variable, and ALTER SESSION

事實上,這兩個引數在9I和10G都有,區別在於NLS_COMP,我們先來說說這兩個引數的作用:

Ø NLS_SORT這個引數是用於設定ORDER BY子句是否按字串的二進位制值(大小寫敏感)還是按某種語言的拼音字母表排序,如英語的英文字母表(大小寫不敏感).

Ø NLS_COMP這個引數是用於設定ORACLE中有關字串比較是否大小寫敏感的,它應用於WHERE 條件的=, LIKE 和IN 和ORDER BY 子句

NLS_SORT引數的設值,9I和10G都是差不多的.區別在於NLS_COMP的取值:

(1)在9I中,NLS_COMP只可設定兩個值:

Ø BINARY:按字元的二進位制值比較

Ø ANSI:按字元的美國標準編碼值進行比較

所以,在9I中沒辦法做到在WHERE條件的=,LIKE,IN謂詞比較中大小寫不敏感,只可以做得到在ORDER BY子句中大小寫不敏感

(2)在10G中,NLS_COMP多了一個引數值:

Ø BINARY:按字元的二進位制值比較

Ø ANSI:按字元的美國標準編碼值進行比較

Ø LINGUISTIC:當設定此值後,WHERE條件的=,LIKE,IN的比較取決於NLS_SORT引數值,舉例:當我們設定NLS_COMP=LINGUISTIC,NLS_SORT=BINARY_CI,則表示WHERE條件中字串的比較是按BINARY_CI來進行比較的(大小寫不敏感),所以在10G中,做到了真正的大小寫敏感.

怎麼來設定這兩個引數?

可以透過兩種方法來設定這兩個引數:

在WINDOWS的命令列或LINUX的終端中:

Ø WINDOWS: SET NLS_SORT=, SET NLS_COMP=

Ø LINUX: export NLS_SORT=, export NLS_COMP=

但是,不管是9I還是10G,我們都只可以在會話級(SESSION)更改這兩個引數,也就是說作用域是當前會話

6.Case-Insensitive and Accent-Insensitive Linguistic Sorts

預設地，ORACLE都是大小寫繁感的，要做到大小寫不繁感，有下列方案：

n 使用upper(),lower()函式

舉例：select * from test order by upper(c);

Select * from test where upper(c)=’A’;

缺點：必須建函式索引，否則會導致排序欄位的索引不可用

n 使用NLS_UPPER()和NLS_LOWER()函式

舉例：SELECT word FROM test1

WHERE NLS_UPPER(word, 'NLS_SORT = XGERMAN') = 'GROSSE';

優點：The NLS_UPPER()and NLS_LOWER() change the case of strings based on a specific linguistic sort definition. This enables you to perform case-insensitive searches regardless of the language being used.

缺點：如果不建指定語言的索引，將不會使用欄位的索引

n 使用linguistic sorts

In Oracle Database 10g, Oracle provides case-insensitive and accent-insensitive options for linguistic sorts. Oracle provides the following types of monolingual and multilingual linguistic sorts:

■ Linguistic sorts that use information about base letters, diacritics, punctuation, and case. These are the standard monolingual and multilingual linguistic sorts that are described in "Using Linguistic Sorts" on page 5-2.

■ Linguistic sorts that use information about base letters, diacritics, and punctuation. This type of sort is called case-insensitive.

■ Linguistic sorts that use information about base letters only. This type of sort is called accent-insensitive. (Accent is another word for diacritic.) An accent-insensitive sort is always case-insensitive as well.

7.Examples of Case-Insensitive and Accent-Insensitive Sorts

(1)透過修改NLS_SORT引數實現：

SQL> ALTER SESSION SET NLS_COMP=LINGUISTIC;

SQL> ALTER SESSION SET NLS_SORT=BINARY_CI;

SELECT * FROM test2 ORDER BY letter;

(2)透過NLSSORT函式在SQL語句級實現：

SELECT * FROM test3 ORDER BY NLSSORT(name, 'NLS_SORT= BINARY_CI');

8.Performing Linguistic Comparisons

透過設定NLS_COMP 和NLS_SORT引數來實現。

(1)Linguistic Comparison Examples

Example 8–1 Binary Comparison Binary Sort

SQL> ALTER SESSION SET NLS_COMP=BINARY;

SQL> ALTER SESSION SET NLS_SORT=BINARY;

SQL> SELECT ename FROM emp1 WHERE ename LIKE 'McC%e';

Example 8–2 Linguistic Comparison Binary Case-Insensitive Sort

SQL> ALTER SESSION SET NLS_COMP=LINGUISTIC;

SQL> ALTER SESSION SET NLS_SORT=BINARY_CI;

SQL> SELECT ename FROM emp1 WHERE ename LIKE 'McC%e';

Example 8–3 Linguistic Comparison Binary Accent-Insensitive Sort

SQL> ALTER SESSION SET NLS_COMP=LINGUISTIC;

SQL> ALTER SESSION SET NLS_SORT=BINARY_AI;

SQL> SELECT ename FROM emp1 WHERE ename LIKE 'McC%e';

Example 8–4 Linguistic Comparisons Using XSPANISH

SQL> ALTER SESSION SET NLS_COMP=LINGUISTIC;

SQL> ALTER SESSION SET NLS_SORT=XSPANISH;

SQL> SELECT ename FROM emp3 WHERE ename LIKE 'C%';

9.Using Linguistic Indexes

建立基於語方的索引。

舉例：

CREATE INDEX nls_index ON my_table (NLSSORT(name, 'NLS_SORT = German'));

(1)Linguistic Indexes for Multiple Languages

■ Build a linguistic index for each language that the application supports.

CREATE INDEX french_index ON employees (NLSSORT(employee_id, 'NLS_

SORT=FRENCH'));

CREATE INDEX german_index ON employees (NLSSORT(employee_id, 'NLS_

SORT=GERMAN'));

■ Build a single linguistic index for all languages.

在表中建立一個欄位，儲存你所需要的語言:LANG_COL

This requires a language column

(LANG_COL in "Example: Setting Up a French Linguistic Index" on page 5-19) to be used as a parameter of the NLSSORT function. The language column contains NLS_LANGUAGE values for the data in the column on which the index is built. The following example builds a single linguistic index for multiple languages. With this index, the rows with the same values for NLS_LANGUAGE are sorted together:

CREATE INDEX i ON t (NLSSORT(col, 'NLS_SORT=' || LANG_COL));

Queries choose an index based on the argument of the NLSSORT function specified in the ORDER BY clause.

■ Build a single linguistic index for all languages using one of the multilingual linguistic sorts such as GENERIC_M or FRENCH_M.

CREATE INDEX i on t (NLSSORT(col,'NLS_SORT=GENERIC_M');

(2)Requirements for Using Linguistic Indexes

n 適當地設定NLS_SORT

n 如果索引欄位沒有設定NOT NULL約束，則在WHERE子句中指定NOT NULL過濾條件

舉例：WHERE NLSSORT(column_name) IS NOT NULL

[@more@]

SQL SERVER和ORACLE的排序問題

一,SQL SERVER的排序問題

1.安裝SQL SERVER時指定例項的COLLATION:

2.新建DATABASE時指定DB的COLLATION:

3.新建表時指定欄位的COLLATION:

4.在SQL語句中指定COLLATION

5.排序規則的最佳化級

二,ORACLE的排序

1.Linguistic Sorting and String Searching

2.Using Linguistic Sorts

3.Monolingual Linguistic Sorts

(1)monolingual linguistic sort name

(2).example of Monolingual Linguistic Sorts

4.Multilingual Linguistic Sorts

(1).multilingual linguistic sort name

(2).example of Multilingual Linguistic Sorts

(3).Multilingual Sorting Levels

5.Linguistic Sort Parameters

6.Case-Insensitive and Accent-Insensitive Linguistic Sorts

7.Examples of Case-Insensitive and Accent-Insensitive Sorts

(1)透過修改NLS_SORT引數實現：

(2)透過NLSSORT函式在SQL語句級實現：

8.Performing Linguistic Comparisons

(1)Linguistic Comparison Examples

9.Using Linguistic Indexes

(1)Linguistic Indexes for Multiple Languages

(2)Requirements for Using Linguistic Indexes

相關文章