What is the difference between a MySQL InnoDB B+ tree index and a hash index? Why does MongoDB use B-tree?

Posted by 透明飞起来了 on 2024-04-05

Original URL: https://www.cnblogs.com/lihan829/p/18116940

Original article: What is the difference between Mysql InnoDB B+ tree index and hash index? Why does MongoDB use B-tree? | by Mina Ayoub | Medium

The most important difference between a B-tree and a B+ tree is that a B+ tree stores data only in its leaf nodes and uses all other nodes purely for indexing, while a B-tree keeps a data field in every node.

B+ tree

The B+ tree is a balanced search tree (not a binary tree) designed for disks and other auxiliary storage devices. In a B+ tree, all records are stored in leaf nodes at the same level, in key order, and the leaf nodes are linked to one another by pointers.

B+ tree indexes in a database are divided into clustered indexes and secondary indexes. What the two have in common is that the underlying B+ tree is balanced and the leaf nodes hold all the index entries. The difference is whether a leaf node stores the entire row: in InnoDB, a clustered index leaf holds the full row, while a secondary index leaf holds the indexed columns plus the primary key value.
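The two-step lookup through a secondary index can be sketched roughly as follows. This is a toy model rather than InnoDB's implementation: plain dicts stand in for the leaf level of each B+ tree, it assumes the InnoDB layout in which a secondary index leaf carries the primary key value, and the column names (id, email, name) and the function find_by_email are hypothetical.

```python
# Toy model: each dict stands in for the leaf level of a B+ tree index.
# The columns (id, email, name) are hypothetical.

# Clustered index: primary key -> full row.
clustered_index = {
    1: {"id": 1, "email": "ann@example.com", "name": "Ann"},
    2: {"id": 2, "email": "bob@example.com", "name": "Bob"},
}

# Secondary index on email: indexed value -> primary key only.
secondary_index_email = {
    "ann@example.com": 1,
    "bob@example.com": 2,
}


def find_by_email(email):
    """Look up a row via the secondary index, then the clustered index."""
    pk = secondary_index_email.get(email)  # first lookup: secondary index
    if pk is None:
        return None
    return clustered_index[pk]             # second lookup: clustered index


row = find_by_email("bob@example.com")
print(row["name"])
```

The second lookup is the price of keeping full rows in only one place: the secondary index stays small, but a query that needs columns outside the index has to visit the clustered index as well.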

The B+ tree has the following characteristics:

Each node can have many children (a high fan-out), for two reasons. One is to reduce the height of the tree. The other is to divide the key range into more intervals; the more intervals there are, the faster a lookup narrows down.
Each node stores multiple keys rather than just one.
Non-leaf nodes store only keys, and leaf nodes store keys and data.
Adjacent leaf nodes are linked to each other by pointers, so sequential queries perform better.
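To get a feel for how fan-out affects tree height, here is a back-of-the-envelope sketch. It assumes an idealized, completely full tree in which every level multiplies the number of reachable entries by the fan-out; the function name min_height and the figure of ten million keys are illustrative choices, not numbers from any real database.

```python
def min_height(num_keys, fanout):
    """Smallest number of levels h such that fanout**h >= num_keys."""
    height, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


# Compare a binary tree with two wider trees for ten million keys.
for fanout in (2, 100, 1000):
    print(fanout, min_height(10_000_000, fanout))
# prints:
# 2 24
# 100 4
# 1000 3
```

If each level costs one disk read, the wide tree needs a handful of reads where the binary tree needs two dozen.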

In plain terms

The non-leaf nodes of a B+ tree store only keys and take up very little space, so each level of nodes can index a much wider range of data. In other words, each I/O operation covers more of the search space.
Adjacent leaf nodes are linked, which fits the read-ahead behavior of disks. For example, suppose one leaf node stores keys 50 and 55 and points to the next leaf node, which stores 60 and 62. When the data for 50 and 55 is read from disk, read-ahead also brings in the data for 60 and 62. That is a sequential read rather than another disk seek, so it is faster.
Range queries are supported and are efficient. Because each node can index a larger and more precise range, a single disk I/O on a B+ tree retrieves more index information than on a B-tree, so I/O efficiency is higher.

The reason is that all data is stored at the leaf level and each leaf points to the next, so a range query only needs to walk along the leaf level instead of traversing the whole tree.
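A leaf-level range scan might look something like the sketch below. It reuses the keys 50, 55, 60 and 62 from the example above. The names Leaf, build_leaves and range_scan are made up for illustration, and the root-to-leaf descent is replaced by a binary search over each leaf's last key, which is an assumption made to keep the example short.

```python
from bisect import bisect_left


class Leaf:
    """A leaf node: sorted keys, their values, and a pointer to the next leaf."""

    def __init__(self, keys, values):
        self.keys = keys
        self.values = values
        self.next = None


def build_leaves(items, leaf_size=2):
    """Pack sorted (key, value) pairs into a chain of linked leaves."""
    leaves = []
    for i in range(0, len(items), leaf_size):
        chunk = items[i:i + leaf_size]
        leaves.append(Leaf([k for k, _ in chunk], [v for _, v in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves, low, high):
    """Return the values whose keys satisfy low <= key <= high."""
    # Stand-in for the root-to-leaf descent: find the first leaf that can contain low.
    last_keys = [leaf.keys[-1] for leaf in leaves]
    start = bisect_left(last_keys, low)
    if start == len(leaves):
        return []
    result = []
    leaf = leaves[start]
    while leaf is not None:          # follow the next pointers, never go back up
        for key, value in zip(leaf.keys, leaf.values):
            if key > high:
                return result
            if key >= low:
                result.append(value)
        leaf = leaf.next
    return result


leaves = build_leaves([(50, "a"), (55, "b"), (60, "c"), (62, "d"), (70, "e")])
print(range_scan(leaves, 55, 62))  # ['b', 'c', 'd']
```

After the starting leaf is found, the scan touches only leaves, which is what makes the pattern friendly to sequential reads.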

Principle of locality and disk read-ahead

Because disk access is far slower than memory access, disk I/O should be minimized to improve efficiency. Disks usually do not read strictly on demand; they read ahead each time. After reading the requested data, the disk also reads a certain length of the data that follows it into memory. The theoretical basis for this is the well-known principle of locality in computer science:

When a piece of data is used, the data near it is usually used soon afterwards, and the data a program needs while running tends to be clustered together.

B-tree

In B-tree, the B is commonly taken to stand for "balanced". A B-tree is a multi-way self-balancing search tree. It is similar to an ordinary balanced binary tree; the difference is that a B-tree allows each node to have more child nodes.

The B-tree has the following characteristics:

Keys are distributed throughout the whole tree.
Every key appears in exactly one node.
A search may end at a non-leaf node.
A lookup behaves like a binary search over the full set of keys, with performance close to binary search.
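The point that a B-tree search may end at a non-leaf node can be illustrated with a small sketch. The class BTreeNode, the function btree_search and the hand-built three-way tree are all hypothetical; insertion and rebalancing are left out entirely.

```python
from bisect import bisect_left


class BTreeNode:
    """A B-tree node: every node stores data next to its keys."""

    def __init__(self, keys, values, children=None):
        self.keys = keys                 # sorted keys in this node
        self.values = values             # data stored alongside each key
        self.children = children or []   # empty for a leaf


def btree_search(node, key, depth=1):
    """Return (value, depth where found), or (None, depth reached) on a miss."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i], depth     # may happen at the root or any inner node
    if not node.children:
        return None, depth               # reached a leaf without finding the key
    return btree_search(node.children[i], key, depth + 1)


root = BTreeNode(
    [30, 60], ["r30", "r60"],
    [
        BTreeNode([10, 20], ["v10", "v20"]),
        BTreeNode([40, 50], ["v40", "v50"]),
        BTreeNode([70, 80], ["v70", "v80"]),
    ],
)
print(btree_search(root, 60))  # ('r60', 1): found at the root
print(btree_search(root, 40))  # ('v40', 2): found one level down
```

A B+ tree lookup for the same keys would always descend to the leaf level, because that is the only place the data lives.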

The difference between B-tree and B+ tree

The internal nodes of a B+ tree do not store data. All data is stored in the leaf nodes, so every query descends to a leaf and the query cost is a consistent O(log n).
The query cost of a B-tree is not fixed. It depends on where the key sits in the tree, and in the best case (the key is in the root) it is O(1).
The leaf nodes of a B+ tree are linked to their neighbors, which makes intervals much easier to reach and suits range queries.
In a B-tree, each node stores keys and data together and there are no leaf links, so a range query needs an in-order traversal of the tree rather than a simple walk along the leaves.
The B+ tree is better suited to external storage (data on disk). Since the inner nodes have no data fields, each node can index a larger and more precise range.
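A rough calculation shows why dropping data fields from inner nodes matters. The 16 KB page size is InnoDB's default; the key, pointer and record sizes below are assumptions picked for illustration, and page headers and other overhead are ignored.

```python
PAGE_SIZE = 16 * 1024  # bytes; InnoDB's default page size
KEY_SIZE = 8           # assumed size of one key, in bytes
POINTER_SIZE = 8       # assumed size of one child pointer, in bytes
DATA_SIZE = 200        # assumed size of one record's data, in bytes


def entries_per_page(entry_size):
    """How many fixed-size entries fit in one page (overhead ignored)."""
    return PAGE_SIZE // entry_size


# B+ tree inner node: key + child pointer only.
bplus_fanout = entries_per_page(KEY_SIZE + POINTER_SIZE)
# B-tree inner node: key + child pointer + the record's data.
btree_fanout = entries_per_page(KEY_SIZE + POINTER_SIZE + DATA_SIZE)

print(bplus_fanout, btree_fanout)  # 1024 75
```

Under these assumptions one page read on the B+ tree separates the key space into about a thousand intervals, against roughly 75 for the B-tree, which is where the shorter tree and lower I/O count come from.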

Why does MongoDB use B-tree?

As noted above, the internal nodes of a B+ tree do not store data, so every query has to reach a leaf and the cost is a consistent O(log n). The query cost of a B-tree is not fixed: it depends on where the key sits in the tree, and in the best case it is O(1).

We have said that minimizing disk I/O is an effective way to improve performance. MongoDB is an aggregate-oriented database, and a B-tree happens to keep the key and data fields together in the same node.

As for why MongoDB uses a B-tree instead of a B+ tree, it can be considered from the perspective of its design. MongoDB is not a traditional relational database but a NoSQL database that stores data in a JSON-like format. Its goals are high performance, high availability, and easy scaling. First, because it gets rid of the relational model, the need for the second advantage described above (linked leaf nodes for sequential and range access) is not as strong. Second, because MySQL uses a B+ tree, the data sits in the leaf nodes and every query has to reach a leaf. MongoDB uses a B-tree, in which every node has a data field, so a query can return as soon as the requested key is found. By this argument, a single-key lookup is faster on average than in MySQL.

Hash index

Simply put, a hash index uses a hash algorithm to convert the key value into a hash value. A lookup does not need to walk step by step from the root node to a leaf node as in a B+ tree. A single hash computation locates the corresponding position immediately, which is very fast.
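A hash index lookup can be sketched with a handful of buckets. The class HashIndex is hypothetical and uses Python's built-in hash() with simple chaining; it is meant to show the idea, not how any particular database engine lays out its hash indexes.

```python
class HashIndex:
    """A minimal chained hash index: key -> list of row ids."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # One hash computation picks the bucket; no tree descent is needed.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row_id):
        self._bucket(key).append((key, row_id))

    def lookup(self, key):
        # Colliding or duplicate entries share a bucket, so the chain is scanned.
        return [rid for k, rid in self._bucket(key) if k == key]


idx = HashIndex()
idx.insert("alice", 1)
idx.insert("bob", 2)
idx.insert("alice", 3)
print(idx.lookup("alice"))  # [1, 3]
```

The scan inside lookup is cheap while chains are short. It grows with the number of entries that land in the same bucket, which is why many duplicate keys hurt.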

The difference between B+ tree index and hash index

For an equality query, a hash index has a clear advantage, because a single hash computation locates the key. This assumes the key value is unique. If it is not unique, the lookup first finds the key's position and then scans along the linked list until it finds the matching data.
For a range query, a hash index is of no use: keys that were originally ordered may end up scattered after hashing, so the index cannot be used to complete a range query.
For the same reason, a hash index cannot be used for sorting, or for prefix pattern matching such as LIKE 'xxx%' (which is essentially a range query).
Hash indexes also do not support the leftmost-prefix matching rule of multi-column (composite) indexes.
The lookup cost of a B+ tree index is fairly uniform across keys and fluctuates less than that of a B-tree. With a large number of duplicate key values, a hash index becomes very inefficient because of hash collisions.
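The contrast between ordered and hashed lookups can be tried out with the standard library. In this sketch a sorted list stands in for the ordered leaf level of a B+ tree and a dict stands in for a hash index; the sample names and the functions range_query, prefix_query and hash_range_query are made up for illustration.

```python
from bisect import bisect_left, bisect_right

names = ["dave", "bob", "alice", "carol", "bobby", "adam"]

sorted_index = sorted(names)                            # stand-in for B+ tree leaf order
hash_index = {name: i for i, name in enumerate(names)}  # stand-in for a hash index


def range_query(low, high):
    """Ordered index: two binary searches, then one contiguous slice."""
    return sorted_index[bisect_left(sorted_index, low):bisect_right(sorted_index, high)]


def prefix_query(prefix):
    """LIKE 'prefix%' expressed as a range; assumes keys contain no chr(0x10FFFF)."""
    upper = prefix + chr(0x10FFFF)
    return sorted_index[bisect_left(sorted_index, prefix):bisect_left(sorted_index, upper)]


def hash_range_query(low, high):
    """Hash index: key order is lost, so every key has to be examined."""
    return sorted(name for name in hash_index if low <= name <= high)


print(hash_index["bob"])             # equality: a single hash lookup
print(range_query("alice", "bob"))   # ['alice', 'bob']
print(prefix_query("bo"))            # ['bob', 'bobby']
print(hash_range_query("alice", "bob"))
```

Both range functions return the same rows, but the hashed version gets there by visiting every key, which is the full scan a database falls back to when the index cannot help.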
