這是我寫的一篇研究生申請的writing sample,關於資料庫索引的,對關係型資料庫的索引從資料結構到用途再到作用物件簡單分析了一下,因為我發現在實際工作中,index是個好東西,但是很多DBA並不能找到合適的index使用,這樣會使查詢效率提高得不大,甚至影響查詢效率。所以這篇文章希望幫助大家理解一下index。我沒有在演算法上展開太多,因為很多index演算法相似,如果展開容易喧賓奪主。等過段時間我會加入一些實驗資料和程式碼進這篇文章,最近比較懶不想安裝資料庫.DB2安裝實在太煩了。
ABSTRACT
Index is an extremely important item for a database system. The purpose of this article is to explore the concept and function of the index, how the index is able to improve the speed of retrieve data significantly, and help database administrator distinct and choose the correct index in real life.
1. 1. How DBMS stores data on disk?
The relational database store data in the disk by pages, which is the minimum unit of storage. The page only stores three things: the data in the tables, indexes, and execution plans. No matter when the database receiving a query request, RDBMS must load the page to memory first. Once the page complete working, it won’t discard immediately. DBMS will move the page to buffer pool, which is cache table and index data from disk for next time.
As we know, data is stored on disk-based storage devices, and it is stored as blocks of data. The structure of disk block as much as linked list, both of them contain a data section and a pointer (or link) for the next block. The fact is multiple records can only be stored one field, if we use the linear search to retrieve a record will require N/2 block accesses on average. If the field doesn’t contain unique entries, then the entire time cost is Nm which means you have to search the entire disk [1].
2. What is index?
As we mentioned early, DBMS stores index in the page. Index is a data structure which could significantly improve the speed of data retrieval operations. Why the index improve the retrieve speed effectively? We have to talk about the index algorithm and data structure first [2].
Different DBMS provide different types of index, but majority database index designs exhibit logarithmic (O (log (N))) retrieval performance. Generally speaking, there are some different indexes based on the different data structures. Indexes could be implemented by different kinds of data structure such as balance trees, B+ trees hash table, R+ trees.
3. Index architecture
We can distinguish the index architecture by the rows’ order
1) Non-clustered index
In a non-clustered index, the physical order of the rows is not the same as the index order, which means the data located in the disk has the arbitrary order, but logical ordering is specified by the index order.
2) Clustered index
Clustered index will change the data block into a specific index order to match the index. As the result, the row data will be stored in the index order. Therefore, we can create only one clustered index in a specific database table.
4. Index Implementations
1) Bitmap Index
Bitmap index is designed by a B-tree data structure, a self-balancing binary search tree, to retrieve data from database. So many facts indicate bitmap index working perfect for low-cardinality columns, which include a meaningful quantity of distinct values, no matter absolutely or relative to the number of records which contain the data. [3] Bitmap index uses bit array, which segment only has two value, 0 and 1 (False or True). Due to the values of column are distinct, each value could be presented by 1 or 0, standing for the value included or not. Therefore, bit map index has a significant space and performance over other index structures for retrieving such data [4].
2) Dense index and sparse index
Dense index and sparse index are very similar. Dense index contains the search key value and pointer for each record in the data file, thus dense index record could be very large. While sparse index contains the search key value and pointer only for each data block in the data file, due to the data is sorted or ordered, so sparse index only need to point to the head of each data block. Relatively speaking, the sparse index record is smaller than dense index record [5].
3) Reverse index
Reverse index uses B-tree structure, which reverses the key value before inserting in the index. Reverse index is very effective for indexing sequence numbers data, because each key value is highly greater than the prior value. For instance, 29, 30, 31, all these three keys will be in the same block possible. If we use reverse index to query them as 92, 03, 13, they may be distributed at the different blocks. Due to the B+ tree structure, so we can query them faster than the sequence keys [6].
In addition, there are some other index methods such as R+ tree, which is used by Google map, to retrieve the new type data as times go by. Finally, it is good to research deeply about index to improve database performance.
Works Cited:
[1] Garcia-Molina, Hector, Jeffrey D. Ullman, and Jennifer Widom. Database Systems: The Complete Book. Upper Saddle River, NJ: Prentice Hall, 2002. Print.
[2] Fritchey, Grant. “Statistics, Data Distribution, and Cardinality.” SQL Server Query Performance Tuning (2014): 193-235. Web.
[3] “Bitmap Index vs. B-tree Index: Which and When?” Bitmap Index vs. B-tree Index: Which and When? N.p., n.d. Web. 26 Nov. 2016.
[4] Fujioka, Kengo, Yukio Uematsu, and Makoto Onizuka. “Application of Bitmap Index to Information Retrieval.” Proceeding of the 17th International Conference on World Wide Web – WWW `08 (2008): n. pag. Web.
[5] “Dense and Sparse Indices.” Dense and Sparse Indices. http://www.cs.sfu.ca/CourseCe… Web. 30 Nov. 2016.
[6] “Introduction To Reverse Key Indexes: Part I.” Richard Foote`s Oracle Blog. N.p., 2014. Web. 26 Nov. 2016.