SQLServer利用HashKey計算列解決寬欄位查詢的效能問題

風移發表於2016-10-18

SQL Server利用HashKey計算列解決寬欄位查詢的效能問題

主人翁

本文主人翁：MSSQL菜鳥和MSSQL老鳥。

問題提出

某年某月某日，某MSSQL菜鳥滿臉愁容的跑到老鳥跟前，心灰意懶的對老鳥說“我最近遇到一個問題，很大的問題，對，非常大的問題”。老鳥不急不慢的推了推2000度超級近視眼鏡框，慢吞吞的說：“說來聽聽”。

“我有一個100萬資料量的表，有一個寬度為7500欄位，不幸的是現在我需要根據這個欄位的值來查詢表資料，而且最為可恨的是MSSQL Server不允許我在這個欄位上建立Index，所以，我的查詢語句爆慢，應用程式直接超時，腫麼辦呀，腫麼辦？”。

問題分析

老鳥一聽，捋了捋一身上老毛，頭頭是道的分析說：“查詢慢，是正常的，快起來才不正常呢。你想想啊，欄位寬度為7500，顯然這個欄位不能建立索引了，因為MSSQL限制建立索引的條件是鍵值寬度不超過900byte，100萬的資料量沒有索引的查詢跑起來IO立馬上起來了，效能瓶頸是理所應當的。”

“那要怎麼解決啊？”，菜鳥已經心急如焚了。

老鳥接著問：“你知道Hash Join的原理嗎？Hash Join就是將兩個表的連線欄位先算出Hash值，然後再利用Hash值來做連線操作的，對吧？”

“我知道Hash Join的原理啊，和解決這個問題有什麼關係？”，菜鳥已經迫不及待了。

“我們完全可以借用這個思想嘛，我們可以先建立一個計算列，這個計算列儲存著寬欄位的Hash值，然後在這個Hash值上面建立索引。在查詢的時候，我們直接使用Hash來檢索滿足條件的記錄，換句話講，只要Hash值滿足條件，能夠匹配上，對應的寬欄位也就滿足條件了嘛。”，老鳥像教育孩子似的教育著菜鳥。

“喔~~？哦~？”，菜鳥還是似懂非懂。老鳥看出了菜鳥的心思，於是得意洋洋的說：“來來來，讓我們一起來看看Demo吧”。

解決問題

於是老鳥洋洋灑灑的寫了一段測試Demo:

建立測試表

use tempdb
go

--Create Test table
if OBJECT_ID(`dbo.test_for_hashkey`,`U`) is not null
    drop table dbo.test_for_hashkey
GO
create table dbo.test_for_hashkey
(
    id int identity(1,1) primary key
    ,SearchKeyword varchar(7500) null
);
/*
We can`t create index on the column SearchKeyword since the maximum key length has 900 bytes limitation.

create index ix_DBA_SearchKeyword
ON dbo.test_for_hashkey(SearchKeyword);
GO
*/

初始化100萬條資料

--1 million records data init
SET NOCOUNT ON
declare
    @loop int
    ,@do int
    ,@SearchKeyword varchar(7500)
;

select
    @loop = 1000000
    ,@do = 0
;

while @do < @loop
begin
    set
        @SearchKeyword = REPLICATE(newid(),220)
    ;
    insert into dbo.test_for_hashkey
    select @SearchKeyword
    ;
    set @do = @do + 1
end
go

菜鳥的查詢方法效能

--performance testing at the very first time for the regular query
declare
    @SearchKeyword varchar(7500)
;
select TOP 1
    @SearchKeyword = SearchKeyword
FROM dbo.test_for_hashkey WITH(NOLOCK)
where id = 59987;

SET STATISTICS TIME ON
SET STATISTICS IO ON
select *
FROM dbo.test_for_hashkey WITH(NOLOCK) 
where SearchKeyword = @SearchKeyword
;

/* cold cache
Table `test_for_hashkey`. Scan count 5, logical reads 1003732, physical reads 6792, read-ahead reads 987055, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2870 ms,  elapsed time = 6213 ms.
*/

從註釋部分的效能指標來看，菜鳥的查詢方法效能的確如老鳥所說，IO消耗非常嚴重，邏輯讀達到了100萬，物理讀達到了6792；時間CPU 2870毫秒和時間消耗6213毫秒還不算太嚴重（因為我的測試環境是SSD的儲存介質）。
老鳥的優化方案：先新增計算列，記得為計算列使用PERSISTED關鍵字，然後在計算列上建立索引。

--and now, it`s time for us to do something for booting the query
ALTER TABLE dbo.test_for_hashkey
ADD SearchKeyword_hashkey AS checksum(SearchKeyword) PERSISTED
;
GO
CREATE INDEX IX_SearchKeyword_hashkey ON dbo.test_for_hashkey(SearchKeyword_hashkey);
GO

檢驗老鳥優化方案

--test again to observe the performance metrics
declare
    @SearchKeyword varchar(7500)
    , @SearchKeyword_hashkey int
    ;
select TOP 1
    @SearchKeyword_hashkey = CHECKSUM(SearchKeyword)
    , @SearchKeyword = SearchKeyword
FROM dbo.test_for_hashkey WITH(NOLOCK)
where id = 59987;

select *
FROM dbo.test_for_hashkey WITH(NOLOCK) 
where SearchKeyword_hashkey = @SearchKeyword_hashkey
--to avoid hash key collisions, we`d better add this condition statement
and SearchKeyword = @SearchKeyword
;
/*
Table `test_for_hashkey`. Scan count 1, logical reads 7, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

*/

從註釋部分的效能指標來看，老鳥的優化方案的確棒棒的，邏輯讀降低到7，物理讀降低都1；CPU和執行時間消耗均為0毫秒，也就是秒殺，效能取得了質的飛躍。

同時，從老鳥優化方案的執行計劃來看，的確走到了這個有效的索引上來:
Hash01

注意事項

看完優化效果後，菜鳥已經激動得不能自已：“牛X，老鳥就是老鳥，請收下我的膝蓋吧，今生今世為你做牛做馬”。

老鳥摸了摸菜鳥腦袋，語重心長的說：“千萬不要高興得太早，這個方法雖然效果很棒，但是有兩個需要注意的點”。

一、為了防止Hash碰撞，我們最好在WHERE語句中加上防止Hash碰撞的程式碼

--to avoid hash key collisions, we`d better add this condition statement
and SearchKeyword = @SearchKeyword

二、這個方法只適合於字串全部匹配的情況，對應字串部分模糊和全部模糊匹配並不適合。

[20210208]lob欄位與查詢的問題.txt
2021-02-08
Native for sql 查詢列欄位非命令列形式
2018-03-21
SQL命令列
巧用 Base62 解決欄位太短的問題
2022-01-23
利用PCT解決快速重新整理效能問題
2019-04-03
同一欄位多個查詢條件時遇到的一個問題
2019-11-29
SQLServer查詢哪些索引利用率低
2022-06-20
SQLServer索引
MyBatis 解決欄位名不一致的問題
2020-06-19
MyBatis
Laravel 查詢資料庫欄位內容是 Json 陣列時的查詢語句
2021-09-26
Laravel資料庫JSON陣列
在 with 查詢中只查詢個別欄位
2018-09-19
MySql 查詢某一天日期格式欄位走索引問題
2024-03-28
MySql索引
Elasticsearch 複合查詢——多字串多欄位查詢
2021-03-14
Elasticsearch字串
一種hive的模型設計思路，解決頻繁增加指標欄位的問題
2020-12-08
Hive模型指標
解決mybatis用Map返回的欄位全變大寫的問題
2024-10-18
MyBatis
leetcode題解(查詢表問題)
2019-02-28
LeetCode
ArcGIS對欄位分割查詢操作
2020-10-19
查詢oracle欄位預設值
2024-08-21
Oracle
Elasticsearch 單字串多欄位查詢
2021-03-15
Elasticsearch字串
位運算解決多標籤問題【原創】
2019-05-19
MySQL：查詢欄位數量多少對查詢效率的影響
2019-12-13
MySql
MongoDB（13）- 查詢操作返回指定的欄位
2021-06-02
MongoDB
優化-mysql子查詢索引失效問題解決
2020-11-05
優化MySql索引
arcgis欄位值計算（擷取A欄位前8位+按照順序計算8位）
2024-06-28
如何解決報表關聯計算中的效能問題
2019-12-17
如何巧妙著運用「位運算」來解決問題？
2019-03-06
模型聯合查詢返回指定欄位
2021-06-04
模型
JavaScript中解決計算精度丟失的問題
2018-08-26
JavaScript
關於日期及時間欄位的查詢
2020-06-09
Tableau的計算欄位、粒度、聚合、比率、表計算
2020-12-30
解決 mybatis一對多分頁問題聯級查詢
2020-09-25
MyBatis
Oracle 計算欄位選擇性判別列的索引潛力
2019-04-11
Oracle索引
MySQL 大欄位問題
2022-05-31
MySql
Oracle:sqlplus查詢出的中文是亂碼問題的解決(轉)
2020-04-04
OracleSQL
[20231020]增加欄位的問題.txt
2023-10-23
[Elasticsearch] 多欄位搜尋 (二) - 最佳欄位查詢及其調優（轉）
2018-04-08
Elasticsearch
Dynamics CRM使用計算欄位自動計算兩個時間欄位的天數差
2018-03-19
解決吞吐效能問題時的思路
2021-07-05
Redis SortedSet結構score欄位丟失精度問題解決辦法
2018-10-30
Redis
MySQL-建立計算欄位
2022-06-13
MySql
解決SqlServer執行指令碼，檔案過大，記憶體溢位問題
2020-11-26
SQLServer指令碼記憶體溢位