On MySQL character set and collation settings

Posted by darren__chan on 2020-09-03
How Can I Change the Default Character Set and Collation for a Database in MySQL? (Doc ID 1023320.1)


Query:
To determine the default character set and collation for a database, query INFORMATION_SCHEMA (available beginning with MySQL 5.0), for example:
SELECT SCHEMA_NAME,
DEFAULT_CHARACTER_SET_NAME,
DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME='database1';

To list the server and current-session character set and collation variables:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
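The SHOW VARIABLES output can also be checked in a script. The following is a minimal Python sketch, assuming the (Variable_name, Value) rows have already been fetched with any MySQL client library. The helper name non_utf8mb4_settings and the sample rows are hypothetical, not real server output. It skips character_set_filesystem, character_set_system and character_sets_dir, which are not expected to be utf8mb4.

```python
# Hypothetical helper: flag charset/collation variables that are not utf8mb4.
# `rows` stands in for the (Variable_name, Value) pairs returned by
# SHOW VARIABLES; fetching them from a real server is left out.

EXPECTED_CHARSET = "utf8mb4"

# Variables that are not expected to be utf8mb4: character_set_filesystem
# defaults to 'binary', character_set_system is fixed by the server, and
# character_sets_dir is a directory path.
IGNORED = {"character_set_filesystem", "character_set_system", "character_sets_dir"}


def non_utf8mb4_settings(rows):
    """Return {name: value} for charset/collation variables not using utf8mb4."""
    problems = {}
    for name, value in rows:
        if name in IGNORED:
            continue
        if name.startswith("character_set_"):
            ok = value == EXPECTED_CHARSET
        elif name.startswith("collation_"):
            # Collation names begin with their character set, e.g. utf8mb4_unicode_ci.
            ok = value.startswith(EXPECTED_CHARSET + "_")
        else:
            continue
        if not ok:
            problems[name] = value
    return problems


if __name__ == "__main__":
    sample = [  # made-up sample rows, not from a real server
        ("character_set_client", "latin1"),
        ("character_set_connection", "utf8mb4"),
        ("character_set_server", "utf8mb4"),
        ("character_set_filesystem", "binary"),
        ("collation_connection", "latin1_swedish_ci"),
        ("collation_server", "utf8mb4_unicode_ci"),
    ]
    print(non_utf8mb4_settings(sample))
    # → {'character_set_client': 'latin1', 'collation_connection': 'latin1_swedish_ci'}
```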

To change the default character set and collation of a database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
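ALTER DATABASE only changes the default used for tables created afterwards. Existing tables and their columns keep their current character set until each one is converted with ALTER TABLE ... CONVERT TO CHARACTER SET. The Python sketch below only generates that SQL text. The helper names quote_ident and conversion_statements and the table names are hypothetical, and the collation check is a simplification.

```python
def quote_ident(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def conversion_statements(database, tables, charset="utf8mb4",
                          collation="utf8mb4_unicode_ci"):
    """Build ALTER statements that move a database and its tables to charset/collation."""
    # Simplified sanity check: collation names start with their character set's
    # name (utf8mb4_unicode_ci belongs to utf8mb4). Aliases such as utf8/utf8mb3
    # and the binary character set are not handled.
    if not collation.startswith(charset + "_"):
        raise ValueError(f"collation {collation!r} does not belong to {charset!r}")
    db = quote_ident(database)
    statements = [
        f"ALTER DATABASE {db} CHARACTER SET = {charset} COLLATE = {collation};"
    ]
    for table in tables:
        # CONVERT TO changes the table default and converts its existing
        # character columns; ALTER DATABASE alone does neither.
        statements.append(
            f"ALTER TABLE {db}.{quote_ident(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    # "orders" and "customers" are hypothetical table names.
    for stmt in conversion_statements("database_name", ["orders", "customers"]):
        print(stmt)
```

Converting a table rewrites its rows, so it is worth trying the generated statements on a copy of the data first.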


utf8mb4 settings:
[client]
default-character-set=utf8mb4
[mysqld]
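character-set-server = utf8mb4   # server character set
collation-server = utf8mb4_unicode_ci
# Set the client, connection and results character sets to utf8mb4 for each new connection.
# Note: init_connect is not executed for users with the CONNECTION_ADMIN (or SUPER) privilege.
init_connect='SET NAMES utf8mb4'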
skip-character-set-client-handshake = true
[mysql]
default-character-set = utf8mb4
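To confirm that an option file actually contains these settings, Python's standard configparser module can read simple files of this shape. This is a rough sketch, not MySQL's own option-file parser: it assumes '#' as the only mid-line comment marker and does not handle !include/!includedir directives or repeated options. The helper name check_option_file and the EXPECTED table are hypothetical.

```python
import configparser

# Sample option file mirroring the utf8mb4 settings above.
MY_CNF = """
[client]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4   # server character set
collation-server = utf8mb4_unicode_ci
init_connect = 'SET NAMES utf8mb4'
skip-character-set-client-handshake = true

[mysql]
default-character-set = utf8mb4
"""

# (section, option) -> value expected for a utf8mb4 setup.
EXPECTED = {
    ("client", "default-character-set"): "utf8mb4",
    ("mysqld", "character-set-server"): "utf8mb4",
    ("mysql", "default-character-set"): "utf8mb4",
}


def check_option_file(text):
    """Return a list of problems found in an option file's utf8mb4 settings."""
    # allow_no_value accepts bare flags; '#' may start a comment mid-line;
    # interpolation=None keeps '%' characters in values literal.
    parser = configparser.ConfigParser(allow_no_value=True,
                                       inline_comment_prefixes=("#",),
                                       interpolation=None)
    # MySQL treats '-' and '_' in option names as equivalent; normalise to '-'.
    parser.optionxform = lambda name: name.lower().replace("_", "-")
    parser.read_string(text)
    problems = []
    for (section, option), wanted in EXPECTED.items():
        value = parser.get(section, option, fallback=None)
        if value != wanted:
            problems.append(f"[{section}] {option} = {value!r}, expected {wanted!r}")
    collation = parser.get("mysqld", "collation-server", fallback=None) or ""
    if not collation.startswith("utf8mb4_"):
        problems.append(f"[mysqld] collation-server = {collation!r} is not a utf8mb4 collation")
    return problems


if __name__ == "__main__":
    print(check_option_file(MY_CNF))  # → []
```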



For collations, see:

What Is The Difference Between utf8mb4_unicode_ci And utf8mb4_general_ci collations? Choose One For Asian Languages With Particular Characters (Doc ID 2230559.1)


* What is the difference between utf8mb4_unicode_ci and utf8mb4_general_ci collations?

These two collations are both for the utf8mb4 character encoding. The differences are in how text is sorted and compared:
1) Accuracy
utf8mb4_unicode_ci is based on the Unicode standard for sorting and comparison, which sorts accurately in a very wide range of languages.
utf8mb4_general_ci does not implement all of the Unicode sorting rules, which results in undesirable sorting in some situations, such as when using particular languages or characters.
2) Performance
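utf8mb4_general_ci is faster at comparisons and sorting because it takes a number of performance shortcuts: it compares characters strictly one-to-one and does not support expansions, contractions or ignorable characters.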
utf8mb4_unicode_ci, which uses the Unicode rules for sorting and comparison, employs a fairly complex algorithm for correct sorting in a wide range of languages and when using a wide range of special characters. These rules need to take into account language-specific conventions; not everybody sorts their characters in what we would call 'alphabetical order'.
On modern servers, this performance cost is almost negligible.
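The one-to-one versus expansion difference can be illustrated with a toy Python model. The MySQL manual's example is that both collations treat Ä = A, but utf8mb4_general_ci treats ß = s while utf8mb4_unicode_ci treats ß = ss. This sketch does not reproduce MySQL's weight tables. The maps and function names are hypothetical and cover only these documented cases.

```python
import unicodedata

# Toy model only, NOT MySQL's real weight tables. It mirrors two documented
# behaviours: both collations ignore case and accents (Ä = A), but
# utf8mb4_general_ci compares one character to one character (ß = s), while
# utf8mb4_unicode_ci supports expansions (ß = ss).
GENERAL_MAP = {"ß": "s"}    # one-to-one: a character maps to one character
UNICODE_MAP = {"ß": "ss"}   # expansion: one character may map to several


def _fold(ch):
    # Drop combining accents (NFD splits "Ä" into "A" + U+0308) and ignore case.
    return unicodedata.normalize("NFD", ch)[0].lower()


def general_like_key(s):
    return "".join(GENERAL_MAP.get(ch, _fold(ch)) for ch in s)


def unicode_like_key(s):
    return "".join(UNICODE_MAP.get(ch, _fold(ch)) for ch in s)


if __name__ == "__main__":
    for a, b in [("Ärger", "arger"), ("Straße", "Strase"), ("Straße", "Strasse")]:
        print(a, b,
              general_like_key(a) == general_like_key(b),   # general_ci-like
              unicode_like_key(a) == unicode_like_key(b))   # unicode_ci-like
```

In this model "Straße" matches "Strase" under the general-like key and "Strasse" under the Unicode-like key, which is the kind of difference the note describes.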
* Which one should we use if we store lots of Asian Languages characters?
You should use utf8mb4_unicode_ci.
The reason is:
utf8mb4_unicode_ci sorts and compares Asian languages (e.g. Chinese) much more accurately than utf8mb4_general_ci does. utf8mb4_general_ci may be inadequate for Asian languages that use particular characters.
Be aware that when migrating from general_ci to unicode_ci, some query results are expected to differ.
General_ci is a simplified set of sorting rules which aims to do as well as it can while taking many short-cuts designed to improve speed. It does not follow the Unicode rules and will result in undesirable sorting or comparison in some situations, such as when using particular languages or characters.
One example: some characters from Asian languages, and modifiers such as accents, can be treated differently by general_ci and unicode_ci, so filters may return different results.
For this specific example, if you want to keep behaviour similar to general_ci, consider using utf8mb4_0900_as_ci in MySQL 8.0.
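Before switching a column's collation, it can help to find values whose equality changes. Such values can alter WHERE and GROUP BY results, and a UNIQUE index on the column can start rejecting them as duplicates. The Python sketch below is a hypothetical pre-migration check. The old and new collations are represented by key functions you supply. In the demo, str.lower and str.casefold are only stand-ins, chosen because casefold() expands "ß" to "ss" and lower() does not.

```python
from collections import defaultdict


def new_collisions(values, old_key, new_key):
    """Return groups of values that compare equal under new_key but
    contain values that were not all equal under old_key."""
    groups = defaultdict(set)
    for v in values:
        groups[new_key(v)].add(v)
    result = []
    for members in groups.values():
        if len({old_key(v) for v in members}) > 1:
            result.append(sorted(members))
    return result


if __name__ == "__main__":
    # Hypothetical column values; str.lower / str.casefold stand in for two collations.
    names = ["Straße", "STRASSE", "strasse", "Weg"]
    print(new_collisions(names, str.lower, str.casefold))
    # → [['STRASSE', 'Straße', 'strasse']]
```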


From the ITPUB blog, link: http://blog.itpub.net/29863023/viewspace-2716846/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
