Understanding Database Scan Methods: Optimizing Data Storage Around the Scan Method

Posted by 德哥 on 2018-09-15

Original URL: https://flycode.co/archives/185366

Database optimization

Background

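Suppose a black box holds three kinds of fruit (apples, bananas, and pineapples), some number of fruits in total.

If you need to take out 10 apples, how many draws will it take?

In the worst case, you may have to take out every fruit. (Full table scan: the 10th apple only turns up at the very end, or there are fewer than 10 apples.)

In the best case, you are done in 10 draws. (Full table scan: the first 10 rows scanned are all apples.)

PS: Index scans are left out here, because the point is to optimize storage around the scan method itself.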
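The fruit analogy can be sketched in Python: a sequential scan with LIMIT behaves like a loop that stops at the k-th match, so its cost depends only on where the matches sit in the box. The function name `draws_needed` and the box contents are illustrative, not taken from the original.

```python
from typing import Sequence


def draws_needed(box: Sequence[str], target: str = "apple", k: int = 10) -> int:
    """Draw fruits in order until k of `target` have been taken.

    Returns the number of draws; if fewer than k targets exist, the whole
    box has to be emptied, so the result is len(box).
    """
    found = 0
    for i, fruit in enumerate(box, start=1):
        if fruit == target:
            found += 1
            if found == k:
                return i
    return len(box)


# Best case: the 10 apples sit at the front of the box.
best = ["apple"] * 10 + ["banana"] * 45 + ["pineapple"] * 45
# Worst case: the same fruits, but the apples are at the very end.
worst = ["banana"] * 45 + ["pineapple"] * 45 + ["apple"] * 10

print(draws_needed(best))   # 10
print(draws_needed(worst))  # 100
```

Both boxes contain exactly the same fruit; only the order differs, which is the lever the rest of this article pulls on.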

Optimizing for the best case of a full table scan

create table tbl (gid int, info text, crt_time timestamp);  
  
insert into tbl select random()*10000, 'test', now() from generate_series(1,10000000);  
  
select * from tbl where gid=1 limit 10;  
  
explain (analyze,verbose,timing,costs,buffers) select * from tbl where gid=1 limit 10;  
                                                     QUERY PLAN                                                       
--------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.00..1917.62 rows=10 width=17) (actual time=0.050..11.165 rows=10 loops=1)  
   Output: gid, info, crt_time  
   Buffers: shared hit=3 read=667 dirtied=354 written=340  
   ->  Seq Scan on public.tbl  (cost=0.00..188693.39 rows=984 width=17) (actual time=0.048..11.160 rows=10 loops=1)  
         Output: gid, info, crt_time  
         Filter: (tbl.gid = 1)  
         Rows Removed by Filter: 105132  
         Buffers: shared hit=3 read=667 dirtied=354 written=340  
 Planning time: 0.078 ms  
 Execution time: 11.184 ms  
(10 rows)
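A rough sanity check of the plan above, assuming gid values are spread uniformly so that each row has gid = 1 with probability p of about 1/10000 (random()*10000 is rounded when cast to int). If each row matches independently, a sequential scan reads on average k/p rows to collect k matches (the mean of a negative binomial distribution). This is a back-of-the-envelope model, not how the planner computes its estimate.

```python
def expected_rows_scanned(k: int, p: float) -> float:
    """Mean rows a sequential scan reads before it has k matches, when each
    row matches independently with probability p (negative binomial mean)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return k / p


expected = expected_rows_scanned(10, 1 / 10000)
observed = 105132 + 10  # "Rows Removed by Filter" plus the 10 rows returned

print(round(expected))                # 100000
print(round(observed / expected, 2))  # 1.05
```

The observed 105,142 rows is within about 5% of the model, so the 11 ms in the plan is simply the price of wading through roughly 100,000 non-matching rows.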

Storage optimization

postgres=# begin;  
BEGIN  
postgres=# create temp table tmp_tbl1 as select * from tbl where gid<>1 or gid is null;  
SELECT 9998987  
postgres=# delete from tbl where gid<>1 or gid is null;  
DELETE 9998987  
postgres=# end;  
COMMIT  
postgres=# vacuum full tbl;  
VACUUM  
postgres=# insert into tbl select * from tmp_tbl1 ;  
INSERT 0 9998987  
postgres=#   
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where gid=1 limit 10;  
                                                    QUERY PLAN                                                       
-------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.00..1972.60 rows=10 width=17) (actual time=0.018..0.022 rows=10 loops=1)  
   Output: gid, info, crt_time  
   Buffers: shared read=1  
   ->  Seq Scan on public.tbl  (cost=0.00..178914.70 rows=907 width=17) (actual time=0.017..0.019 rows=10 loops=1)  
         Output: gid, info, crt_time  
         Filter: (tbl.gid = 1)  
         Buffers: shared read=1  
 Planning time: 0.129 ms  
 Execution time: 0.041 ms  
(9 rows)
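Why a single buffer is enough after the rewrite: the heap is read page by page, and a LIMIT query stops at the page that holds the k-th match. The Python sketch below models a table as a list of gid values packed into fixed-size pages. The table contents and `rows_per_page=157` are assumptions for illustration; real PostgreSQL pages hold a number of rows that depends on tuple width and fill factor.

```python
from typing import Callable, Sequence


def pages_read(rows: Sequence[int], pred: Callable[[int], bool],
               limit: int, rows_per_page: int) -> int:
    """Pages a sequential scan touches before it has `limit` matching rows
    (or reaches the end of the table)."""
    found = 0
    for i, row in enumerate(rows):
        if pred(row):
            found += 1
            if found == limit:
                return i // rows_per_page + 1
    # Not enough matches: every page of the table was read.
    return -(-len(rows) // rows_per_page)  # ceiling division


def is_one(gid: int) -> bool:
    return gid == 1


# Hypothetical table: 200,000 rows, one row with gid = 1 in every 10,000.
table = [1 if i % 10000 == 9999 else 2 for i in range(200_000)]
# Same rows after the "storage optimization": gid = 1 rows moved to the front
# (sorted() is stable, so the other rows keep their relative order).
reordered = sorted(table, key=lambda gid: gid != 1)

print(pages_read(table, is_one, 10, 157))      # 637
print(pages_read(reordered, is_one, 10, 157))  # 1
```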

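Taking it further: multi-table JOIN + LIMIT optimization

The JOIN + LIMIT scenario:

When a query has a LIMIT, a NESTLOOP JOIN usually performs well, because it produces rows one at a time and can stop as soon as the limit is reached:

1. Scan the outer table.

2. Loop over the inner table N times (one probe per outer row).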
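The two steps above are why LIMIT favors a nested loop: the join is pipelined, so the Limit node can stop pulling rows once it has enough. A Python sketch using a generator; `outer`, `inner_index`, and the `stats` counter are illustrative stand-ins for the outer scan and the inner index probe, not PostgreSQL internals.

```python
from itertools import islice
from typing import Dict, Iterator, List, Sequence, Tuple


def nested_loop(outer: Sequence[int], inner_index: Dict[int, List[str]],
                stats: Dict[str, int]) -> Iterator[Tuple[int, str]]:
    """Lazily join each outer key with its inner rows.

    Each outer row triggers one probe of the inner index (one loop
    iteration); joined rows are yielded as soon as they are found."""
    for key in outer:
        stats["outer_rows"] += 1
        for inner_row in inner_index.get(key, []):
            yield key, inner_row


outer = [7, 3, 9, 3, 5, 1]                    # outer scan order (join keys)
inner_index = {3: ["x"], 5: ["y"], 1: ["z"]}  # key -> matching inner rows
stats = {"outer_rows": 0}

# LIMIT 2: islice stops pulling after two results, so the
# remaining outer rows (5 and 1) are never read.
result = list(islice(nested_loop(outer, inner_index, stats), 2))
print(result)  # [(3, 'x'), (3, 'x')]
print(stats)   # {'outer_rows': 4}
```

The number of outer rows read depends entirely on how early the matching keys appear, which is what the storage optimization below exploits.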

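Storage optimization approach

1. Arrange the outer table so that the rows scanned first are exactly those that satisfy the conditions against the inner table.

2. Reorganize the data along these lines.

3. Check the resulting cost.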
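Steps 1 and 2 amount to a stable partition of the outer table: rows that will find a match go first, the rest keep their order behind them. A Python sketch under that reading; the names `reorder_outer` and `outer_rows_needed` and the sample keys are hypothetical.

```python
from typing import Dict, List, Sequence


def reorder_outer(outer: Sequence[int], inner_counts: Dict[int, int]) -> List[int]:
    """Stable partition: outer rows that find a match in the inner table
    first, everything else after, each group in its original order."""
    hits = [k for k in outer if inner_counts.get(k, 0) > 0]
    misses = [k for k in outer if inner_counts.get(k, 0) == 0]
    return hits + misses


def outer_rows_needed(outer: Sequence[int], inner_counts: Dict[int, int],
                      limit: int) -> int:
    """How many outer rows a nested loop must read to emit `limit` rows."""
    produced = 0
    for n, key in enumerate(outer, start=1):
        produced += inner_counts.get(key, 0)
        if produced >= limit:
            return n
    return len(outer)  # ran out of outer rows: full scan


outer = [0, 0, 4, 0, 0, 2, 0, 0, 3, 0]  # hypothetical join keys, scan order
inner_counts = {2: 1, 3: 1, 4: 1}       # matching inner rows per key

print(outer_rows_needed(outer, inner_counts, 3))                               # 9
print(outer_rows_needed(reorder_outer(outer, inner_counts), inner_counts, 3))  # 3
```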

Example

create table a(id int, c1 int, c2 int, c3 int);  
  
create table b(id int, c1 int, c2 int, c3 int);  
  
insert into a select generate_series(1,10000000),1,1,1;  
  
insert into b select random()*100, random()*100, random()*100, random()*100 from generate_series(1,10000000);  
  
create index idx_a_1 on a(id,c1,c2,c3);  
  
create index idx_b_1 on b(c1,c2);  
  
vacuum analyze a;  
vacuum analyze b;  
  
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a join b on (a.id=b.id and a.c1=1 and a.c2=1 and a.c3=1 and b.c1=1 and b.c2=1) limit 1000;  
                                                              QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.87..2669.74 rows=1000 width=32) (actual time=0.081..8.266 rows=991 loops=1)  
   Output: a.id, a.c1, a.c2, a.c3, b.id, b.c1, b.c2, b.c3  
   Buffers: shared hit=3984  
   ->  Nested Loop  (cost=0.87..2723.11 rows=1020 width=32) (actual time=0.080..7.996 rows=991 loops=1)  
         Output: a.id, a.c1, a.c2, a.c3, b.id, b.c1, b.c2, b.c3  
         Buffers: shared hit=3984  
         ->  Index Scan using idx_b_1 on public.b  (cost=0.43..1136.01 rows=1020 width=16) (actual time=0.053..2.569 rows=996 loops=1)  
               Output: b.id, b.c1, b.c2, b.c3  
               Index Cond: ((b.c1 = 1) AND (b.c2 = 1))  
               Buffers: shared hit=995  
         ->  Index Only Scan using idx_a_1 on public.a  (cost=0.43..1.55 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=996)  
               Output: a.id, a.c1, a.c2, a.c3  
               Index Cond: ((a.id = b.id) AND (a.c1 = 1) AND (a.c2 = 1) AND (a.c3 = 1))  
               Heap Fetches: 0  
               Buffers: shared hit=2989  
 Planning time: 0.603 ms  
 Execution time: 8.509 ms  
(17 rows)

Storage optimization

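Case 1: if a single pass of the loop can return 1,000 rows, optimize as follows.

Use a plain SEQ SCAN on the outer table, but move the rows that satisfy the conditions to the front.

1. Find the IDs that together produce 1,000 or more join results, and move their rows to the front.

2. Find the rows corresponding to those IDs on the other side of the join, and move them to the front as well.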
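Step 1 can be read as: rank the IDs by how many join results each produces, and keep adding IDs until their total reaches the LIMIT. Below is a Python sketch of that greedy variant (the name `ids_to_front` is hypothetical). The SQL that follows takes a simpler route, `order by count(*) desc limit 1000` over the IDs; since b.id comes from random()*100 it has at most 101 distinct values, and only 1 to 100 can match a.id, so that LIMIT keeps every matching ID here.

```python
from typing import Dict, List


def ids_to_front(match_counts: Dict[int, int], target: int) -> List[int]:
    """Pick top IDs by join-match count until their counts reach `target`.

    Ties are broken by ID for a deterministic order. Returns every ID if
    the total stays below target (the scan then cannot stop early anyway)."""
    ranked = sorted(match_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    chosen: List[int] = []
    total = 0
    for id_, count in ranked:
        if total >= target:
            break
        chosen.append(id_)
        total += count
    return chosen


# First three rows of the GROUP BY output below.
counts = {26: 18, 68: 18, 52: 16}
print(ids_to_front(counts, 30))   # [26, 68]
print(ids_to_front(counts, 100))  # [26, 68, 52]
```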

postgres=# select b.id,count(*) from a join b on (a.id=b.id and a.c1=1 and a.c2=1 and a.c3=1 and b.c1=1 and b.c2=1) group by 1 order by count(*) desc limit 10;  
 id | count   
----+-------  
 26 |    18  
 68 |    18  
 52 |    16  
 94 |    16  
 35 |    16  
 80 |    15  
 77 |    15  
 96 |    15  
 73 |    15  
 74 |    15  
(10 rows)  
  
  
postgres=# create table b1 as select * from b where id in (select b.id from a join b on (a.id=b.id and a.c1=1 and a.c2=1 and a.c3=1 and b.c1=1 and b.c2=1) group by 1 order by count(*) desc limit 1000) and b.c1=1 and b.c2=1;  
SELECT 991  
postgres=# insert into b1 select * from b where not (id in (select b.id from a join b on (a.id=b.id and a.c1=1 and a.c2=1 and a.c3=1 and b.c1=1 and b.c2=1) group by 1 order by count(*) desc limit 1000) and b.c1=1 and b.c2=1)  
postgres-# ;  
INSERT 0 9999009  
postgres=# alter table b rename to b2;  
ALTER TABLE  
postgres=# alter table b1 rename to b;  
ALTER TABLE

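The outer table now needs to scan only 6 data blocks.

(Note, however, that with this method, if fewer than 1,000 rows match in total, the outer table is scanned in full. That is why the query below uses LIMIT 991, the exact number of matching rows.)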

  
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a join b on (a.id=b.id and a.c1=1 and a.c2=1 and a.c3=1 and b.c1=1 and b.c2=1) limit 991;  
                                                              QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..205423.04 rows=876 width=32) (actual time=0.071..7.845 rows=991 loops=1)  
   Output: a.id, a.c1, a.c2, a.c3, b.id, b.c1, b.c2, b.c3  
   Buffers: shared hit=2980  
   ->  Nested Loop  (cost=0.43..205423.04 rows=876 width=32) (actual time=0.069..7.577 rows=991 loops=1)  
         Output: a.id, a.c1, a.c2, a.c3, b.id, b.c1, b.c2, b.c3  
         Buffers: shared hit=2980  
         ->  Seq Scan on public.b  (cost=0.00..204057.62 rows=876 width=16) (actual time=0.019..0.384 rows=991 loops=1)  
               Output: b.id, b.c1, b.c2, b.c3  
               Filter: ((b.c1 = 1) AND (b.c2 = 1))  
               Buffers: shared hit=6  
         ->  Index Only Scan using idx_a_1 on public.a  (cost=0.43..1.55 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=991)  
               Output: a.c1, a.c2, a.c3, a.id  
               Index Cond: ((a.c1 = 1) AND (a.c2 = 1) AND (a.c3 = 1) AND (a.id = b.id))  
               Heap Fetches: 0  
               Buffers: shared hit=2974  
 Planning time: 0.513 ms  
 Execution time: 8.079 ms  
(17 rows)
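The 6-block figure and the caveat above both follow from simple arithmetic once the matching rows sit at the front of the table. A Python sketch, assuming roughly 185 four-int rows per 8 KB page (an estimate consistent with 991 rows fitting in the 6 blocks the plan reports, not a value read from the database) and 10,000,000 rows in b.

```python
import math


def outer_pages_scanned(matching_rows: int, limit: int,
                        total_pages: int, rows_per_page: int) -> int:
    """Heap pages a LIMIT seq scan reads when all matching rows are stored
    at the front of the table."""
    if matching_rows >= limit:
        # The scan stops on the page holding the limit-th matching row.
        return min(math.ceil(limit / rows_per_page), total_pages)
    # Fewer matches than LIMIT: the scan never reaches its limit and
    # keeps reading to the end of the table looking for more rows.
    return total_pages


# Assumptions: ~185 rows per page, 10,000,000 rows in b.
total_pages = math.ceil(10_000_000 / 185)
print(outer_pages_scanned(991, 991, total_pages, 185))   # 6
print(outer_pages_scanned(991, 1000, total_pages, 185))  # 54055
```

Asking for just 9 more rows than exist turns a 6-page read into a scan of the entire table.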
