MySQL 8.0 Reference Manual (Reading Notes, Section 53 -- Optimizing SQL Statements)

Published by 东山絮柳仔 on 2024-05-20

1. Optimizing INFORMATION_SCHEMA Queries

Applications that monitor databases may make frequent use of INFORMATION_SCHEMA tables. To write queries for these tables most efficiently, use the following general guidelines:

• Try to query only INFORMATION_SCHEMA tables that are views on data dictionary tables.

• Try to query only for static metadata. Selecting columns or using retrieval conditions for dynamic metadata along with static metadata adds overhead to process the dynamic metadata.

These INFORMATION_SCHEMA tables are implemented as views on data dictionary tables, so queries on them retrieve information from the data dictionary:

CHARACTER_SETS
CHECK_CONSTRAINTS
COLLATIONS
COLLATION_CHARACTER_SET_APPLICABILITY
COLUMNS
EVENTS
FILES
INNODB_COLUMNS
INNODB_DATAFILES
INNODB_FIELDS
INNODB_FOREIGN
INNODB_FOREIGN_COLS
INNODB_INDEXES
INNODB_TABLES
INNODB_TABLESPACES
INNODB_TABLESPACES_BRIEF
INNODB_TABLESTATS
KEY_COLUMN_USAGE
PARAMETERS
PARTITIONS
REFERENTIAL_CONSTRAINTS
RESOURCE_GROUPS
ROUTINES
SCHEMATA
STATISTICS
TABLES
TABLE_CONSTRAINTS
TRIGGERS
VIEWS
VIEW_ROUTINE_USAGE
VIEW_TABLE_USAGE

Some types of values, even for a non-view INFORMATION_SCHEMA table, are retrieved by lookups from the data dictionary. This includes values such as database and table names, table types, and storage engines.

Some INFORMATION_SCHEMA tables contain columns that provide table statistics:

STATISTICS.CARDINALITY
TABLES.AUTO_INCREMENT
TABLES.AVG_ROW_LENGTH
TABLES.CHECKSUM
TABLES.CHECK_TIME
TABLES.CREATE_TIME
TABLES.DATA_FREE
TABLES.DATA_LENGTH
TABLES.INDEX_LENGTH
TABLES.MAX_DATA_LENGTH
TABLES.TABLE_ROWS
TABLES.UPDATE_TIME

Those columns represent dynamic table metadata; that is, information that changes as table contents change.
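For example (a hedged illustration; the world schema name is only a placeholder), the first query below reads only static metadata served from the data dictionary, while the second also selects TABLE_ROWS, a dynamic statistic that involves the statistics handling described next:

-- Static metadata only: answered from data dictionary tables
SELECT TABLE_NAME, ENGINE, TABLE_COLLATION
  FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA = 'world';

-- Also selects TABLE_ROWS, a dynamic statistic, which adds statistics processing
SELECT TABLE_NAME, ENGINE, TABLE_ROWS
  FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA = 'world';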

By default, MySQL retrieves cached values for those columns from the mysql.index_stats and mysql.table_stats dictionary tables when the columns are queried, which is more efficient than retrieving statistics directly from the storage engine. If cached statistics are not available or have expired, MySQL retrieves the latest statistics from the storage engine and caches them in the mysql.index_stats and mysql.table_stats dictionary tables. Subsequent queries retrieve the cached statistics until the cached statistics expire. Cached statistics are not updated automatically by a server restart or by the first opening of the mysql.index_stats and mysql.table_stats tables.

The information_schema_stats_expiry session variable defines the period of time before cached statistics expire. The default is 86400 seconds (24 hours), but the time period can be extended to as much as one year.

To update cached values at any time for a given table, use ANALYZE TABLE.
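As a quick sketch (the table world.city is hypothetical, and the values shown are illustrations rather than recommendations):

-- Refresh the cached statistics for one table on demand
ANALYZE TABLE world.city;

-- Make this session always fetch the latest statistics from the storage engine
SET SESSION information_schema_stats_expiry = 0;

-- Or shorten the expiry window for this session only (in seconds)
SET SESSION information_schema_stats_expiry = 3600;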

Querying statistics columns does not store or update statistics in the mysql.index_stats and mysql.table_stats dictionary tables under these circumstances:

• When cached statistics have not expired.

• When information_schema_stats_expiry is set to 0.

• When the server is in read_only, super_read_only, transaction_read_only, or innodb_read_only mode.

• When the query also fetches Performance Schema data.

information_schema_stats_expiry is a session variable, and each client session can define its own expiration value. Statistics that are retrieved from the storage engine and cached by one session are available to other sessions.

Note: If the innodb_read_only system variable is enabled, ANALYZE TABLE may fail because it cannot update statistics tables in the data dictionary, which use InnoDB. For ANALYZE TABLE operations that update the key distribution, failure may occur even if the operation updates the table itself (for example, if it is a MyISAM table). To obtain the updated distribution statistics, set information_schema_stats_expiry=0.

For INFORMATION_SCHEMA tables implemented as views on data dictionary tables, indexes on the underlying data dictionary tables permit the optimizer to construct efficient query execution plans. To see the choices made by the optimizer, use EXPLAIN. To also see the query used by the server to execute an INFORMATION_SCHEMA query, use SHOW WARNINGS immediately following EXPLAIN.

Consider this statement, which identifies collations for the utf8mb4 character set:

mysql> SELECT COLLATION_NAME
 FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
 WHERE CHARACTER_SET_NAME = 'utf8mb4';
+----------------------------+
| COLLATION_NAME             |
+----------------------------+
| utf8mb4_general_ci         |
| utf8mb4_bin                |
| utf8mb4_unicode_ci         |
| utf8mb4_icelandic_ci       |
| utf8mb4_latvian_ci         |
| utf8mb4_romanian_ci        |
| utf8mb4_slovenian_ci       |
...

How does the server process that statement? To find out, use EXPLAIN:

mysql> EXPLAIN SELECT COLLATION_NAME
 FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
 WHERE CHARACTER_SET_NAME = 'utf8mb4'\G
*************************** 1. row ***************************
 id: 1
 select_type: SIMPLE
 table: cs
 partitions: NULL
 type: const
possible_keys: PRIMARY,name
 key: name
 key_len: 194
 ref: const
 rows: 1
 filtered: 100.00
 Extra: Using index
*************************** 2. row ***************************
 id: 1
 select_type: SIMPLE
 table: col
 partitions: NULL
 type: ref
possible_keys: character_set_id
 key: character_set_id
 key_len: 8
 ref: const
 rows: 68
 filtered: 100.00
 Extra: NULL
2 rows in set, 1 warning (0.01 sec)

To see the query used to satisfy that statement, use SHOW WARNINGS:

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
 Level: Note
 Code: 1003
Message: /* select#1 */ select `mysql`.`col`.`name` AS `COLLATION_NAME`
 from `mysql`.`character_sets` `cs`
 join `mysql`.`collations` `col`
 where ((`mysql`.`col`.`character_set_id` = '45')
 and ('utf8mb4' = 'utf8mb4'))

As indicated by SHOW WARNINGS, the server handles the query on COLLATION_CHARACTER_SET_APPLICABILITY as a query on the character_sets and collations data dictionary tables in the mysql system database.

2. Optimizing Performance Schema Queries

Applications that monitor databases may make frequent use of Performance Schema tables. To write queries for these tables most efficiently, take advantage of their indexes. For example, include a WHERE clause that restricts retrieved rows based on comparison to specific values in an indexed column.

Most Performance Schema tables have indexes. Tables that do not are those that normally contain few rows or are unlikely to be queried frequently. Performance Schema indexes give the optimizer access to execution plans other than full table scans. These indexes also improve performance for related objects, such as sys schema views that use those tables.

To see whether a given Performance Schema table has indexes and what they are, use SHOW INDEX or SHOW CREATE TABLE:

mysql> SHOW INDEX FROM performance_schema.accounts\G
*************************** 1. row ***************************
 Table: accounts
 Non_unique: 0
 Key_name: ACCOUNT
 Seq_in_index: 1
 Column_name: USER
 Collation: NULL
 Cardinality: NULL
 Sub_part: NULL
 Packed: NULL
 Null: YES
 Index_type: HASH
 Comment:
Index_comment:
 Visible: YES
*************************** 2. row ***************************
 Table: accounts
 Non_unique: 0
 Key_name: ACCOUNT
 Seq_in_index: 2
 Column_name: HOST
 Collation: NULL
 Cardinality: NULL
 Sub_part: NULL
 Packed: NULL
 Null: YES
 Index_type: HASH
 Comment:
Index_comment:
 Visible: YES
mysql> SHOW CREATE TABLE performance_schema.rwlock_instances\G
*************************** 1. row ***************************
 Table: rwlock_instances
Create Table: CREATE TABLE `rwlock_instances` (
 `NAME` varchar(128) NOT NULL,
 `OBJECT_INSTANCE_BEGIN` bigint(20) unsigned NOT NULL,
 `WRITE_LOCKED_BY_THREAD_ID` bigint(20) unsigned DEFAULT NULL,
 `READ_LOCKED_BY_COUNT` int(10) unsigned NOT NULL,
 PRIMARY KEY (`OBJECT_INSTANCE_BEGIN`),
 KEY `NAME` (`NAME`),
 KEY `WRITE_LOCKED_BY_THREAD_ID` (`WRITE_LOCKED_BY_THREAD_ID`)
) ENGINE=PERFORMANCE_SCHEMA DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

To see the execution plan for a Performance Schema query and whether it uses any indexes, use EXPLAIN:

mysql> EXPLAIN SELECT * FROM performance_schema.accounts
 WHERE (USER,HOST) = ('root','localhost')\G
*************************** 1. row ***************************
 id: 1
 select_type: SIMPLE
 table: accounts
 partitions: NULL
 type: const
possible_keys: ACCOUNT
 key: ACCOUNT
 key_len: 278
 ref: const,const
 rows: 1
 filtered: 100.00
 Extra: NULL

The EXPLAIN output indicates that the optimizer uses the accounts table ACCOUNT index that comprises the USER and HOST columns.

Performance Schema indexes are virtual: They are a construct of the Performance Schema storage engine and use no memory or disk storage. The Performance Schema reports index information to the optimizer so that it can construct efficient execution plans. The Performance Schema in turn uses optimizer information about what to look for (for example, a particular key value), so that it can perform efficient lookups without building actual index structures. This implementation provides two important benefits:

• It entirely avoids the maintenance cost normally incurred for tables that undergo frequent updates.

• It reduces at an early stage of query execution the amount of data retrieved. For conditions on the indexed columns, the Performance Schema efficiently returns only table rows that satisfy the query conditions. Without an index, the Performance Schema would return all rows in the table, requiring that the optimizer later evaluate the conditions against each row to produce the final result.

Performance Schema indexes are predefined and cannot be dropped, added, or altered.

Performance Schema indexes are similar to hash indexes. For example:

• They are used only for equality comparisons that use the = or <=> operators.

• They are unordered. If a query result must have specific row ordering characteristics, include an ORDER BY clause.
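As a small illustration of both points (not taken from the manual's own examples): the first query uses equality comparisons on both columns of the ACCOUNT index shown earlier, and the second adds ORDER BY because index order cannot be relied on:

-- Equality comparisons (=) on the indexed USER and HOST columns
SELECT USER, HOST, CURRENT_CONNECTIONS
  FROM performance_schema.accounts
 WHERE USER = 'root' AND HOST = 'localhost';

-- No usable index condition here, so ordering must be requested explicitly
SELECT USER, HOST, TOTAL_CONNECTIONS
  FROM performance_schema.accounts
 ORDER BY TOTAL_CONNECTIONS DESC;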

3. Optimizing Data Change Statements

This section explains how to speed up data change statements: INSERT, UPDATE, and DELETE. Traditional OLTP applications and modern web applications typically do many small data change operations, where concurrency is vital. Data analysis and reporting applications typically run data change operations that affect many rows at once, where the main consideration is the I/O to write large amounts of data and keep indexes up to date. For inserting and updating large volumes of data (known in the industry as ETL, for “extract-transform-load”), you sometimes use other SQL statements or external commands that mimic the effects of INSERT, UPDATE, and DELETE statements.

3.1 Optimizing INSERT Statements

To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.

The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:

• Connecting: (3)

• Sending query to server: (2)

• Parsing query: (2)

• Inserting row: (1 × size of row)

• Inserting indexes: (1 × number of indexes)

• Closing: (1)

This does not take into consideration the initial overhead to open tables, which is done once for each concurrently running query.

The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.

You can use the following methods to speed up inserts:

• If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster. (See the sketch after this list.)

• When loading a table from a text file, use LOAD DATA. This is usually 20 times faster than using INSERT statements.

• Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.
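A minimal sketch of the first two tips (the table t1, its columns, and the file path are hypothetical; LOAD DATA also depends on server settings such as secure_file_priv):

-- One multi-row INSERT instead of three single-row statements
INSERT INTO t1 (id, name) VALUES
  (1, 'a'),
  (2, 'b'),
  (3, 'c');

-- Bulk load from a text file, usually much faster than INSERT statements
LOAD DATA INFILE '/tmp/t1.txt'
  INTO TABLE t1
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n';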

3.2 Optimizing UPDATE Statements

An update statement is optimized like a SELECT query with the additional overhead of a write. The speed of the write depends on the amount of data being updated and the number of indexes that are updated. Indexes that are not changed do not get updated.

Another way to get fast updates is to delay updates and then do many updates in a row later. Performing multiple updates together is much quicker than doing one at a time if you lock the table.
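A sketch of that pattern, assuming a hypothetical MyISAM table t1 with an id key and a hits counter:

LOCK TABLES t1 WRITE;
UPDATE t1 SET hits = hits + 1 WHERE id = 1;
UPDATE t1 SET hits = hits + 1 WHERE id = 2;
UPDATE t1 SET hits = hits + 1 WHERE id = 3;
UNLOCK TABLES;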

For a MyISAM table that uses dynamic row format, updating a row to a longer total length may split the row. If you do this often, it is very important to use OPTIMIZE TABLE occasionally.

3.3 Optimizing DELETE Statements

The time required to delete individual rows in a MyISAM table is exactly proportional to the number of indexes. To delete rows more quickly, you can increase the size of the key cache by increasing the key_buffer_size system variable.

To delete all rows from a MyISAM table, TRUNCATE TABLE tbl_name is faster than DELETE FROM tbl_name. Truncate operations are not transaction-safe; an error occurs when attempting one in the course of an active transaction or active table lock.
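For example (the buffer size and table name below are illustrative only):

-- Enlarge the key cache used for MyISAM index blocks
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;

-- Emptying a table: faster than DELETE FROM t1, but not transaction-safe
TRUNCATE TABLE t1;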

4. Optimizing Database Privileges

The more complex your privilege setup, the more overhead applies to all SQL statements. Simplifying the privileges established by GRANT statements enables MySQL to reduce permission-checking overhead when clients execute statements. For example, if you do not grant any table-level or column-level privileges, the server need not ever check the contents of the tables_priv and columns_priv tables. Similarly, if you place no resource limits on any accounts, the server does not have to perform resource counting. If you have a very high statement-processing load, consider using a simplified grant structure to reduce permission-checking overhead.
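For instance, a single database-level grant keeps permission checking simple; the account, host, and database names below are hypothetical:

-- One database-level grant; no table- or column-level privilege rows to check
CREATE USER IF NOT EXISTS 'app'@'10.0.0.%' IDENTIFIED BY 'change_this_password';
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'app'@'10.0.0.%';

-- By contrast, grants like the following force the server to consult the
-- tables_priv and columns_priv grant tables when checking privileges:
-- GRANT SELECT ON appdb.orders TO 'app'@'10.0.0.%';
-- GRANT SELECT (price) ON appdb.products TO 'app'@'10.0.0.%';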

5. Other Optimization Tips

This section lists a number of miscellaneous tips for improving query processing speed:

• If your application makes several database requests to perform related updates, combining the statements into a stored routine can help performance. Similarly, if your application computes a single result based on several column values or large volumes of data, combining the computation into a loadable function can help performance. The resulting fast database operations are then available to be reused by other queries, applications, and even code written in different programming languages. (See the sketch at the end of this list.)

• To fix any compression issues that occur with ARCHIVE tables, use OPTIMIZE TABLE.

• If possible, classify reports as “live” or as “statistical”, where data needed for statistical reports is created only from summary tables that are generated periodically from the live data.

• If you have data that does not conform well to a rows-and-columns table structure, you can pack and store data into a BLOB column. In this case, you must provide code in your application to pack and unpack information, but this might save I/O operations to read and write the sets of related values.

• With Web servers, store images and other binary assets as files, with the path name stored in the database rather than the file itself. Most Web servers are better at caching files than database contents, so using files is generally faster. (Although you must handle backups and storage issues yourself in this case.)

• If you need really high speed, look at the low-level MySQL interfaces. For example, by accessing the MySQL InnoDB or MyISAM storage engine directly, you could get a substantial speed increase compared to using the SQL interface.

Similarly, for databases using the NDBCLUSTER storage engine, you may wish to investigate possible use of the NDB API.

• Replication can provide a performance benefit for some operations. You can distribute client retrievals among replicas to split up the load. To avoid slowing down the source while making backups, you can make backups using a replica.
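Returning to the first tip in this list, here is a minimal sketch of folding two related updates into one stored routine; the accounts table, its columns, and the routine name are hypothetical:

DELIMITER //
CREATE PROCEDURE transfer_credit(IN p_from INT, IN p_to INT, IN p_amount DECIMAL(10,2))
BEGIN
  -- Two related updates executed server-side within a single client request
  START TRANSACTION;
  UPDATE accounts SET balance = balance - p_amount WHERE id = p_from;
  UPDATE accounts SET balance = balance + p_amount WHERE id = p_to;
  COMMIT;
END //
DELIMITER ;

-- The application now issues one call instead of several round trips:
CALL transfer_credit(1, 2, 10.00);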
