A Brief Introduction to Database Indexes

Posted by dykin on 2016-12-01

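This is a writing sample I wrote for a graduate school application. It is about database indexes and gives a brief analysis of indexes in relational databases, from their data structures to their uses to the objects they act on. In real work I have found that an index is a good thing, but many DBAs fail to pick a suitable one, so query performance improves only slightly or even gets worse. I hope this article helps readers understand indexes. I did not go deep into the algorithms, because many index algorithms are similar and expanding on them would distract from the main point. After a while I will add some experimental data and code to this article; lately I have been too lazy to install a database, and installing DB2 is really tedious.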

ABSTRACT

Indexes are an extremely important part of a database system. This article explores the concept and function of an index, explains how an index can significantly speed up data retrieval, and aims to help database administrators distinguish between index types and choose the right one in practice.

1. How does a DBMS store data on disk?

A relational database stores data on disk in pages, the minimum unit of storage. A page holds only three kinds of content: table data, indexes, and execution plans. Whenever the database receives a query, the RDBMS must first load the relevant pages into memory. When the work on a page is finished, the page is not discarded immediately: the DBMS keeps it in the buffer pool, which caches table and index data from disk for later requests.
As we know, data is stored on disk-based storage devices as blocks. A disk block is structured much like a node in a linked list: both contain a data section and a pointer (or link) to the next block. Because the records can be sorted on only one field, searching on any other field requires a linear search, which takes N/2 block accesses on average, where N is the number of blocks the table occupies. If the field does not contain unique entries, the search cannot stop at the first match, so all N blocks must be read [1].
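The cost of a linear search can be checked with a small simulation. This is only a sketch: the table size, the block size, and the function names are made up for illustration, and a Python list stands in for a disk block.

```python
import random

def reads_until_found(blocks, target):
    """Scan blocks in order; return how many are read before `target` turns up."""
    for reads, block in enumerate(blocks, start=1):
        if target in block:
            return reads
    return len(blocks)  # not present: every block was read

def reads_for_all_matches(blocks, target):
    """With duplicates allowed, the scan cannot stop early: all N blocks are read."""
    matches = sum(block.count(target) for block in blocks)
    return len(blocks), matches

# Hypothetical table: 1,000 unique keys spread over N = 200 blocks, in no useful order.
random.seed(0)
keys = list(range(1000))
random.shuffle(keys)
blocks = [keys[i:i + 5] for i in range(0, len(keys), 5)]

average_reads = sum(reads_until_found(blocks, k) for k in keys) / len(keys)
print(average_reads)  # 100.5, i.e. (N + 1) / 2, roughly N / 2
```

Averaged over every key, the scan reads about half of the blocks, while a search that has to return every match reads all of them.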

2. What is an index?

As mentioned earlier, the DBMS stores indexes in pages. An index is a data structure that can significantly improve the speed of data retrieval operations. Why does an index speed up retrieval so effectively? To answer that, we first have to look at the algorithms and data structures behind indexes [2].
Different DBMSs provide different types of index, but the majority of index designs exhibit logarithmic, O(log N), retrieval performance. Indexes can be implemented with several kinds of data structure, such as balanced trees, B+ trees, hash tables, and R+ trees.
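To give a feel for what logarithmic retrieval means, here is a minimal sketch in which a sorted Python list stands in for the index and a binary search stands in for the tree traversal. It is not how any particular DBMS implements its index; the key values and the function name are assumptions chosen for illustration.

```python
def index_lookup(sorted_keys, target):
    """Binary search over sorted keys; returns (position or None, number of probes)."""
    lo, hi, probes = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

# Hypothetical index over one million even keys: 0, 2, 4, ...
sorted_keys = list(range(0, 2_000_000, 2))

position, probes = index_lookup(sorted_keys, 1_234_568)
print(position)       # 617284
print(probes <= 20)   # True: log2(1,000,000) is just under 20
```

A linear scan of the same list would need about 500,000 comparisons on average, against at most 20 probes here.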

3. Index architecture

We can distinguish index architectures by the physical order of the rows:
1) Non-clustered index
In a non-clustered index, the physical order of the rows is not the same as the index order. The data sits on disk in an arbitrary order, and the index alone supplies the logical ordering.
2) Clustered index
A clustered index rearranges the data blocks so that the rows are physically stored in index order. Because rows can be stored in only one order, a table can have only one clustered index.
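The difference between the two architectures can be sketched with a few rows. Everything here is hypothetical: the rows, the names, and the use of list positions as a stand-in for physical locations on disk.

```python
# Hypothetical rows (id, name) in the order they were inserted (heap order).
heap_rows = [(42, "dora"), (7, "alan"), (19, "cleo"), (11, "bert")]

# Non-clustered index: sorted (key, row position) pairs; the rows stay where they are.
nonclustered_index = sorted((row[0], pos) for pos, row in enumerate(heap_rows))

# Clustered index: the rows themselves are stored in key order.
clustered_rows = sorted(heap_rows)

def positions_nonclustered(low, high):
    """Row positions visited for a key range, following the index order."""
    return [pos for key, pos in nonclustered_index if low <= key <= high]

def positions_clustered(low, high):
    """Row positions visited for a key range when the rows are stored in key order."""
    return [pos for pos, row in enumerate(clustered_rows) if low <= row[0] <= high]

print(positions_nonclustered(7, 19))  # [1, 3, 2]: scattered positions
print(positions_clustered(7, 19))     # [0, 1, 2]: adjacent positions
```

Both lookups return the same three rows, but the non-clustered index has to jump between scattered positions, whereas the clustered layout reads neighbouring ones.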

4. Index Implementations

1) Bitmap Index

A bitmap index represents the indexed column as a set of bit arrays rather than as a B-tree of keys and row pointers. Bitmap indexes are generally considered to work well for low-cardinality columns, that is, columns with a modest number of distinct values, either in absolute terms or relative to the number of records that contain the data [3]. The index keeps one bit array (bitmap) per distinct value, with one bit per row; each bit is either 1 or 0 (true or false) and records whether that row holds the value. Because the column has few distinct values, only a few bitmaps are needed, and conditions on the column can be evaluated with bitwise operations. For such data, a bitmap index therefore has a significant space and performance advantage over other index structures [4].
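The idea can be sketched with Python integers used as bit arrays. The columns, values, and function names below are hypothetical, and a real DBMS would also compress the bitmaps, which this sketch does not attempt.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is 1 when row i holds that value."""
    bitmaps = {}
    for row_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps

def matching_rows(bitmap):
    """Return the row ids whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical low-cardinality columns for six rows.
gender = ["F", "M", "F", "F", "M", "F"]
status = ["active", "active", "closed", "active", "closed", "active"]

gender_index = build_bitmap_index(gender)
status_index = build_bitmap_index(status)

# A condition like gender = 'F' AND status = 'active' becomes one bitwise AND.
rows = matching_rows(gender_index["F"] & status_index["active"])
print(rows)  # [0, 3, 5]
```

Each column needs only two bitmaps here, and combining conditions costs a single AND, which is where the space and speed advantage on low-cardinality data comes from.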

2) Dense index and sparse index

Dense and sparse indexes are very similar. A dense index contains a search key value and a pointer for every record in the data file, so it can become very large. A sparse index contains a search key value and a pointer only for each data block in the data file: because the data file is sorted, the index only needs to point to the first record of each block. As a result, a sparse index is smaller than a dense index over the same file [5].
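A small sketch of both variants over the same sorted data file follows. The block contents and function names are invented for illustration, and lists stand in for disk blocks.

```python
from bisect import bisect_right

# Hypothetical data file: sorted keys packed three per block.
blocks = [[3, 8, 12], [15, 21, 27], [30, 34, 41]]

# Dense index: one entry per record, pointing at (block number, slot).
dense_index = {key: (b, s) for b, block in enumerate(blocks) for s, key in enumerate(block)}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0] for block in blocks]

def dense_lookup(key):
    """Jump straight to the record if the key has an index entry."""
    location = dense_index.get(key)
    return None if location is None else blocks[location[0]][location[1]]

def sparse_lookup(key):
    """Find the last block whose first key is <= key, then scan inside that block."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    return key if key in blocks[b] else None

print(len(dense_index), len(sparse_keys))    # 9 3
print(dense_lookup(21), sparse_lookup(21))   # 21 21
print(dense_lookup(22), sparse_lookup(22))   # None None
```

The sparse index holds a third of the entries in this example, at the price of a short scan inside the block it points to.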

3) Reverse index

A reverse (reverse key) index uses a B-tree structure but reverses the key value before inserting it into the index. It is most useful for indexing sequence numbers, where each new key is only slightly greater than the previous one. For instance, the keys 29, 30, and 31 will most likely land in the same block. If we store them reversed, as 92, 03, and 13, they are likely to be distributed across different blocks. Because consecutive keys no longer pile up in the same leaf block of the tree, concurrent inserts contend less for that block; the trade-off is that the index no longer keeps the keys in their original order, so it cannot serve range scans on them [6].
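The effect on key placement can be sketched as follows. The placement rule (each leaf block covering 8 adjacent key values) and the function names are assumptions made up for the example; a real B-tree splits blocks dynamically.

```python
def reverse_key(key, width=2):
    """Reverse the zero-padded decimal digits of a key, e.g. 29 -> '92'."""
    return str(key).zfill(width)[::-1]

def leaf_block(index_key, keys_per_block=8):
    """Hypothetical placement rule: each leaf block covers 8 adjacent key values."""
    return int(index_key) // keys_per_block

sequence = [29, 30, 31]

plain_blocks = [leaf_block(k) for k in sequence]
reversed_keys = [reverse_key(k) for k in sequence]
reversed_blocks = [leaf_block(k) for k in reversed_keys]

print(plain_blocks)     # [3, 3, 3]: all three keys share one leaf block
print(reversed_keys)    # ['92', '03', '13']
print(reversed_blocks)  # [11, 0, 1]: spread over three leaf blocks
```

With the plain keys, every new insert goes to the same block; with the reversed keys, consecutive inserts go to different blocks.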
In addition, there are other index structures, such as the R+ tree, which suits spatial data of the kind handled by map services such as Google Maps, and which were developed to retrieve new kinds of data that have emerged over time. Finally, studying indexes in depth is a worthwhile way to improve database performance.

Works Cited:

[1] Garcia-Molina, Hector, Jeffrey D. Ullman, and Jennifer Widom. Database Systems: The Complete Book. Upper Saddle River, NJ: Prentice Hall, 2002. Print.

[2] Fritchey, Grant. “Statistics, Data Distribution, and Cardinality.” SQL Server Query Performance Tuning (2014): 193-235. Web.

[3] “Bitmap Index vs. B-tree Index: Which and When?” Bitmap Index vs. B-tree Index: Which and When? N.p., n.d. Web. 26 Nov. 2016.

[4] Fujioka, Kengo, Yukio Uematsu, and Makoto Onizuka. “Application of Bitmap Index to Information Retrieval.” Proceedings of the 17th International Conference on World Wide Web – WWW '08 (2008): n. pag. Web.

[5] “Dense and Sparse Indices.” Dense and Sparse Indices. http://www.cs.sfu.ca/CourseCe… Web. 30 Nov. 2016.

[6] “Introduction To Reverse Key Indexes: Part I.” Richard Foote's Oracle Blog. N.p., 2014. Web. 26 Nov. 2016.
