mysql的字符集校對規則

shiri512003發表於2010-03-08

作者: 絃樂之花 | 可以轉載, 但必須以超連結形式標明文章原始出處和作者資訊及版權宣告

連結:http://shiri512003.itpub.net/post/37713/497267

[@more@]

看到logzgh大俠的一篇關於mysql的校對規則引起的問題http://logzgh.itpub.net/post/3185/467401.

目前mysql預設的校對規則除binary charset為binary外,都為ci,即大小寫不敏感,既然這種校驗規則容易出問題,那麼為什麼還要預設為這樣的規則呢?有句古話叫做存在即合理,不知道為什麼mysql採用了這樣的預設規則呢?

查了下oracle中的校驗規則,預設為binary,順便查了下文件關於該引數的解釋:

Note:

Setting NLS_SORT to anything other than BINARY causes a sort to use a full table scan, regardless of the path chosen by the optimizer. BINARY is the exception because indexes are built according to a binary order of keys. Thus the optimizer can use an index to satisfy the ORDER BY clause when NLS_SORT is set to BINARY. If NLS_SORT is set to any linguistic sort, the optimizer must include a full table scan and a full sort in the execution plan.

ps:題外--

前些日子還感嘆,mysql的文件結構不說怎麼樣吧,但是mysql有一個亮點就是註冊使用者可以在線上文件上新增comments,這些comments的用途想想都該知道的吧。

不經意間發現oracle 11g文件也增加了相應的模組,oracle總是在不斷吸收別人的長處不斷改進,贊一個!

回到正題,看了oracle關於該引數的說明,那不由得想到,mysql可能預設是按照linguistic語義來進行儲存的吧,而這或許也是其多數使用者需求?暫時沒有答案了。

附1:mysql中預設字符集校驗規則:

) 10:11:56>show charset;
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| binary | Binary pseudo charset | binary | 1 |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
+----------+-----------------------------+---------------------+--------+
27 rows in set (0.00 sec)

附2:oracle nls_sort

* from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi

* from v$nls_parameters where parameter ='NLS_SORT';

PARAMETER VALUE
---------------------------------------------------------------- ----------------------------------------------------------------
NLS_SORT BINARY

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/7591490/viewspace-1031660/,如需轉載,請註明出處,否則將追究法律責任。

相關文章