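MongoDB Compound Indexes Explained

Published by Fundebug on 2018-03-23

Original article: https://juejin.im/post/5ab4ed41518825556f555eba

Summary: For MongoDB queries that match on multiple keys, a compound index can significantly improve performance.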

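What Is a Compound Index?

A compound index combines multiple keys into a single index, which speeds up queries that match on several keys at once. A simple example makes this concrete.

The students collection looks like this: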

db.students.find().pretty()
{
	"_id" : ObjectId("5aa7390ca5be7272a99b042a"),
	"name" : "zhang",
	"age" : "15"
}
{
	"_id" : ObjectId("5aa7393ba5be7272a99b042b"),
	"name" : "wang",
	"age" : "15"
}
{
	"_id" : ObjectId("5aa7393ba5be7272a99b042c"),
	"name" : "zhang",
	"age" : "14"
}

Separate indexes have been created on the name and age keys (_id is indexed automatically):

db.students.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"name" : 1
		},
		"name" : "name_1",
		"ns" : "test.students"
	},
	{
		"v" : 1,
		"key" : {
			"age" : 1
		},
		"name" : "age_1",
		"ns" : "test.students"
	}
]

For a query on multiple keys, explain() shows how it is executed (only winningPlan is kept here):

db.students.find({name:"zhang",age:"14"}).explain()
"winningPlan":
{
    "stage": "FETCH",
    "filter":
    {
        "name":
        {
            "$eq": "zhang"
        }
    },
    "inputStage":
    {
        "stage": "IXSCAN",
        "keyPattern":
        {
            "age": 1
        },
        "indexName": "age_1",
        "isMultiKey": false,
        "isUnique": false,
        "isSparse": false,
        "isPartial": false,
        "indexVersion": 1,
        "direction": "forward",
        "indexBounds":
        {
            "age": [
                "[\"14\", \"14\"]"
            ]
        }
    }
}

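The winningPlan shows that the query runs in two stages, IXSCAN followed by FETCH. IXSCAN is an index scan, here using the age index. FETCH retrieves the documents that the index points to and has to filter them by name.

Now create a compound index on name and age: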

db.students.createIndex({name:1,age:1})

db.students.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"name" : 1,
			"age" : 1
		},
		"name" : "name_1_age_1",
		"ns" : "test.students"
	}
]

With the compound index in place, the same query is executed differently:

db.students.find({name:"zhang",age:"14"}).explain()
"winningPlan":
{
    "stage": "FETCH",
    "inputStage":
    {
        "stage": "IXSCAN",
        "keyPattern":
        {
            "name": 1,
            "age": 1
        },
        "indexName": "name_1_age_1",
        "isMultiKey": false,
        "isUnique": false,
        "isSparse": false,
        "isPartial": false,
        "indexVersion": 1,
        "direction": "forward",
        "indexBounds":
        {
            "name": [
                "[\"zhang\", \"zhang\"]"
            ],
            "age": [
                "[\"14\", \"14\"]"
            ]
        }
    }
}

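The winningPlan shows that the stage order is unchanged: IXSCAN, then FETCH. However, IXSCAN now uses the compound index on name and age, so FETCH only retrieves the documents the index points to and no longer needs a filter.

The data set in this example is too small to reveal any problem. In practice, when the collection is large and IXSCAN returns many index entries, filtering during FETCH becomes very expensive. The next section walks through a real case.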
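
The difference between the two plans can be modeled in plain JavaScript. This is a minimal sketch and not how MongoDB actually stores or scans indexes: the indexes are simulated with array filters, and the helper names scanSingleFieldIndex and scanCompoundIndex are hypothetical. It only counts how many documents the FETCH stage would have to load under each plan. The query uses age "15" rather than "14" so that the two plans differ even with three documents.

```javascript
// Toy model of the two query plans; NOT MongoDB's real index implementation.
// The documents mirror the students collection shown earlier.
const students = [
  { _id: 1, name: "zhang", age: "15" },
  { _id: 2, name: "wang", age: "15" },
  { _id: 3, name: "zhang", age: "14" },
];

// Plan 1: IXSCAN on a single-field index { age: 1 }, then FETCH with a filter on name.
function scanSingleFieldIndex(docs, query) {
  // The index can only narrow the candidates by age.
  const candidates = docs.filter((d) => d.age === query.age);
  // Every candidate has to be fetched so that name can be checked.
  const matches = candidates.filter((d) => d.name === query.name);
  return { docsExamined: candidates.length, nReturned: matches.length };
}

// Plan 2: IXSCAN on a compound index { name: 1, age: 1 }, then FETCH without a filter.
function scanCompoundIndex(docs, query) {
  // Both predicates are resolved inside the index, so only true matches are fetched.
  const matches = docs.filter(
    (d) => d.name === query.name && d.age === query.age
  );
  return { docsExamined: matches.length, nReturned: matches.length };
}

const query = { name: "zhang", age: "15" };
const singleFieldPlan = scanSingleFieldIndex(students, query);
const compoundPlan = scanCompoundIndex(students, query);
console.log(singleFieldPlan, compoundPlan);
```

In this model the single-field plan examines two documents to return one, while the compound plan examines only the one it returns. The gap grows with the number of documents that share the indexed value but fail the filter.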

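Locating a MongoDB Performance Problem

As incoming error data keeps growing, Fundebug has processed a cumulative 350 million error events. This puts constant performance pressure on our services, and on the MongoDB cluster in particular.

On a production database, enabling the profiler records MongoDB performance data. After running the following command, every database read or write that takes longer than 1 s is recorded.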

db.setProfilingLevel(1,1000)

Querying the data recorded by the profiler reveals a very slow query on the events collection:

db.system.profile.find().pretty()
{
	"op" : "command",
	"ns" : "fundebug.events",
	"command" : {
		"count" : "events",
		"query" : {
			"createAt" : {
				"$lt" : ISODate("2018-02-05T20:30:00.073Z")
			},
			"projectId" : ObjectId("58211791ea2640000c7a3fe6")
		}
	},
	"keyUpdates" : 0,
	"writeConflicts" : 0,
	"numYield" : 1414,
	"locks" : {
		"Global" : {
			"acquireCount" : {
				"r" : NumberLong(2830)
			}
		},
		"Database" : {
			"acquireCount" : {
				"r" : NumberLong(1415)
			}
		},
		"Collection" : {
			"acquireCount" : {
				"r" : NumberLong(1415)
			}
		}
	},
	"responseLength" : 62,
	"protocol" : "op_query",
	"millis" : 28521,
	"execStats" : {

	},
	"ts" : ISODate("2018-03-07T20:30:59.440Z"),
	"client" : "192.168.59.226",
	"allUsers" : [ ],
	"user" : ""
}

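The events collection holds several hundred million documents, so a slow count is not entirely surprising. Still, according to the profile data this query took 28.5 s, which is excessive. In addition, numYield is as high as 1414, which is probably the direct cause of the slowness. The MongoDB documentation defines numYield as follows: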

The number of times the operation yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete while MongoDB reads in data for the yielding operation.

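This means that a large amount of time was spent reading from disk, across many separate reads. The likely culprit is an indexing problem.

Use explain() to analyze the query (only executionStats is kept here):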

db.events.explain("executionStats").count({"projectId" : ObjectId("58211791ea2640000c7a3fe6"),createAt:{"$lt" : ISODate("2018-02-05T20:30:00.073Z")}})
"executionStats":
{
    "executionSuccess": true,
    "nReturned": 20853,
    "executionTimeMillis": 28055,
    "totalKeysExamined": 28338,
    "totalDocsExamined": 28338,
    "executionStages":
    {
        "stage": "FETCH",
        "filter":
        {
            "createAt":
            {
                "$lt": ISODate("2018-02-05T20:30:00.073Z")
            }
        },
        "nReturned": 20853,
        "executionTimeMillisEstimate": 27815,
        "works": 28339,
        "advanced": 20853,
        "needTime": 7485,
        "needYield": 0,
        "saveState": 1387,
        "restoreState": 1387,
        "isEOF": 1,
        "invalidates": 0,
        "docsExamined": 28338,
        "alreadyHasObj": 0,
        "inputStage":
        {
            "stage": "IXSCAN",
            "nReturned": 28338,
            "executionTimeMillisEstimate": 30,
            "works": 28339,
            "advanced": 28338,
            "needTime": 0,
            "needYield": 0,
            "saveState": 1387,
            "restoreState": 1387,
            "isEOF": 1,
            "invalidates": 0,
            "keyPattern":
            {
                "projectId": 1
            },
            "indexName": "projectId_1",
            "isMultiKey": false,
            "isUnique": false,
            "isSparse": false,
            "isPartial": false,
            "indexVersion": 1,
            "direction": "forward",
            "indexBounds":
            {
                "projectId": [
                    "[ObjectId('58211791ea2640000c7a3fe6'), ObjectId('58211791ea2640000c7a3fe6')]"
                ]
            },
            "keysExamined": 28338,
            "dupsTested": 0,
            "dupsDropped": 0,
            "seenInvalidated": 0
        }
    }
}

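The output shows that the events collection has no compound index on projectId and createAt. The IXSCAN stage therefore uses the projectId index, and its nReturned is 28338. The FETCH stage then has to filter by createAt, and its nReturned is 20853, so 7485 documents are discarded. The executionTimeMillisEstimate values for IXSCAN and FETCH are 30 ms and 27815 ms respectively, so almost all of the time is spent in the FETCH stage, most likely reading documents from disk.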
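
The arithmetic in this reading of the output can be checked with a few lines of JavaScript. This is only a sketch: the object below copies the relevant numbers from the executionStats output above, and summarizeExecutionStats is a hypothetical helper, not part of MongoDB or its shell.

```javascript
// Values copied from the executionStats output above; only the fields used here are kept.
const executionStats = {
  nReturned: 20853,
  executionTimeMillis: 28055,
  totalKeysExamined: 28338,
  totalDocsExamined: 28338,
  executionStages: {
    stage: "FETCH",
    executionTimeMillisEstimate: 27815,
    inputStage: { stage: "IXSCAN", executionTimeMillisEstimate: 30 },
  },
};

// Hypothetical helper: derives a few ratios that make the bottleneck easier to see.
function summarizeExecutionStats(stats) {
  const fetchStage = stats.executionStages;
  return {
    // Documents that were loaded and then rejected by the FETCH filter.
    docsFilteredOut: stats.totalDocsExamined - stats.nReturned,
    // How many documents were loaded for each document that matched.
    docsExaminedPerReturned: stats.totalDocsExamined / stats.nReturned,
    // Share of the total time covered by the FETCH stage's estimate.
    fetchTimeShare:
      fetchStage.executionTimeMillisEstimate / stats.executionTimeMillis,
  };
}

const summary = summarizeExecutionStats(executionStats);
console.log(summary); // docsFilteredOut is 7485
```

A docsExaminedPerReturned value above 1 combined with a FETCH stage that dominates the run time is a hint that part of the filtering could be moved into an index.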

Creating the Compound Index

Not having a compound index on projectId and createAt was an embarrassing oversight, so let's fix it right away:

db.events.createIndex({projectId:1,createAt:-1},{background: true})

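Building an index in production is best done at night; this command took roughly 7 hours in total. Setting background to true keeps the build from blocking other database operations, so the database stays available. However, the command occupies the terminal until it finishes, and pressing CTRL + C during that time would abort the index build.

Once the compound index had been built, the earlier query became much faster (only executionStats is kept here):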

db.events.explain("executionStats").count({"projectId" : ObjectId("58211791ea2640000c7a3fe6"),createAt:{"$lt" : ISODate("2018-02-05T20:30:00.073Z")}})
"executionStats":
{
    "executionSuccess": true,
    "nReturned": 0,
    "executionTimeMillis": 47,
    "totalKeysExamined": 20854,
    "totalDocsExamined": 0,
    "executionStages":
    {
        "stage": "COUNT",
        "nReturned": 0,
        "executionTimeMillisEstimate": 50,
        "works": 20854,
        "advanced": 0,
        "needTime": 20853,
        "needYield": 0,
        "saveState": 162,
        "restoreState": 162,
        "isEOF": 1,
        "invalidates": 0,
        "nCounted": 20853,
        "nSkipped": 0,
        "inputStage":
        {
            "stage": "COUNT_SCAN",
            "nReturned": 20853,
            "executionTimeMillisEstimate": 50,
            "works": 20854,
            "advanced": 20853,
            "needTime": 0,
            "needYield": 0,
            "saveState": 162,
            "restoreState": 162,
            "isEOF": 1,
            "invalidates": 0,
            "keysExamined": 20854,
            "keyPattern":
            {
                "projectId": 1,
                "createAt": -1
            },
            "indexName": "projectId_1_createAt_-1",
            "isMultiKey": false,
            "isUnique": false,
            "isSparse": false,
            "isPartial": false,
            "indexVersion": 1
        }
    }
}

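The count operation now uses the compound index on projectId and createAt and takes only 47 ms, nearly 600 times faster than before. Comparing the results before and after, totalDocsExamined dropped from 28338 to 0. With the compound index, the query no longer has to fetch any documents and only scans the index, so it avoids the disk reads for those documents and is naturally much faster.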
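
Why the count needs no documents can be illustrated with a toy model. This is a sketch under simplifying assumptions, not MongoDB's COUNT_SCAN implementation: the index is a plain array sorted by projectId ascending and createAt descending, the sample entries are made up, and countFromIndex is a hypothetical helper. The last lines recompute the speedup from the two executionTimeMillis values reported above.

```javascript
// Toy model of a compound index { projectId: 1, createAt: -1 }.
// Each entry holds only the key values; full documents are never touched.
// Entries are sorted by projectId ascending, then createAt descending.
const indexEntries = [
  { projectId: "A", createAt: 300 },
  { projectId: "A", createAt: 200 },
  { projectId: "A", createAt: 100 },
  { projectId: "B", createAt: 250 },
];

// Hypothetical helper: counts index entries for one project created before a cutoff.
function countFromIndex(entries, projectId, createdBefore) {
  // Locate the first entry inside the range. A real B-tree would seek here
  // instead of scanning linearly.
  const start = entries.findIndex(
    (e) => e.projectId === projectId && e.createAt < createdBefore
  );
  if (start === -1) return 0;
  // Because createAt is sorted in descending order within a project, every
  // later entry of the same project also lies before the cutoff.
  let end = start;
  while (end < entries.length && entries[end].projectId === projectId) {
    end += 1;
  }
  return end - start;
}

const countForA = countFromIndex(indexEntries, "A", 250);
console.log(countForA);

// Speedup computed from the executionTimeMillis values before and after the index.
const speedup = 28055 / 47;
console.log(Math.round(speedup)); // 597
```

Because the matching keys form one contiguous range in the index, the count is just the length of that range, which is why totalDocsExamined can be 0.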

References

Copyright notice: when reposting, please credit the author Fundebug and include the address of this article: https://blog.fundebug.com/2018/03/15/mongdb_compound_index_detail/
