理解資料庫掃描方法-利用掃描方法對資料儲存進行優化
標籤
PostgreSQL , 掃描方法 , 資料儲存
背景
假設一個黑盒中有三種水果:蘋果,香蕉、菠蘿。一共有若干個水果。
假設你需要拿10個蘋果,你需要拿多少次呢?
最差的情況,你可能需要把所有的水果都拿完。(全表掃描,掃到最後才拿到10個或者不足10個)
最好的情況,你可能10次就拿完。(全表掃描,掃10行全都是蘋果。)
PS:索引掃描這裡就不說了,因為要說的就是根據掃描方法來進行的優化。
全表掃描最好的情況優化
create table tbl (gid int, info text, crt_time timestamp);
insert into tbl select random()*10000 , `test`, now() from generate_series(1,10000000);
select * from tbl where gid=1 limit 10;
explain (analyze,verbose,timing,costs,buffers) select * from tbl where gid=1 limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1917.62 rows=10 width=17) (actual time=0.050..11.165 rows=10 loops=1)
Output: gid, info, crt_time
Buffers: shared hit=3 read=667 dirtied=354 written=340
-> Seq Scan on public.tbl (cost=0.00..188693.39 rows=984 width=17) (actual time=0.048..11.160 rows=10 loops=1)
Output: gid, info, crt_time
Filter: (tbl.gid = 1)
Rows Removed by Filter: 105132
Buffers: shared hit=3 read=667 dirtied=354 written=340
Planning time: 0.078 ms
Execution time: 11.184 ms
(10 rows)
儲存優化
postgres=# begin;
BEGIN
postgres=# create temp table tmp_tbl1 as select * from tbl where gid<>1 or gid is null;
SELECT 9998987
postgres=# delete from tbl where gid<>1;
DELETE 9998987
postgres=# end;
COMMIT
postgres=# vacuum full tbl;
VACUUM
postgres=# insert into tbl select * from tmp_tbl1 ;
INSERT 0 9998987
postgres=#
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where gid=1 limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1972.60 rows=10 width=17) (actual time=0.018..0.022 rows=10 loops=1)
Output: gid, info, crt_time
Buffers: shared read=1
-> Seq Scan on public.tbl (cost=0.00..178914.70 rows=907 width=17) (actual time=0.017..0.019 rows=10 loops=1)
Output: gid, info, crt_time
Filter: (tbl.gid = 1)
Buffers: shared read=1
Planning time: 0.129 ms
Execution time: 0.041 ms
(9 rows)
場景昇華 – 多表JOIN LIMIT優化
JOIN + LIMIT的場景:
通常有LIMIT的場景使用NESTLOOP JOIN效能可以比較好。
1、從外表開始掃
2、內表迴圈N次
儲存優化方法
1、外表,一開始掃描到的就是內表符合條件的資料
2、根據這種思路重新整理資料
3、檢視能耗
例子
create table a(id int, c1 int, c2 int, c3 int);
create table b(id int, c1 int, c2 int, c3 int);
insert into a select generate_series(1,10000000),1,1,1;
insert into b select random()*100, random()*100, random()*100, random()*100 from generate_series(1,10000000);
create index idx_a_1 on a(id,c1,c2,c3);
create index idx_b_1 on b(c1,c2);
vacuum analyze a;
vacuum analyze b;
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a join b on (a.id=b.id and a.c1=1 and a.c2=1 and a.c3=1 and b.c1=1 and b.c2=1) limit 1000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.87..2669.74 rows=1000 width=32) (actual time=0.081..8.266 rows=991 loops=1)
Output: a.id, a.c1, a.c2, a.c3, b.id, b.c1, b.c2, b.c3
Buffers: shared hit=3984
-> Nested Loop (cost=0.87..2723.11 rows=1020 width=32) (actual time=0.080..7.996 rows=991 loops=1)
Output: a.id, a.c1, a.c2, a.c3, b.id, b.c1, b.c2, b.c3
Buffers: shared hit=3984
-> Index Scan using idx_b_1 on public.b (cost=0.43..1136.01 rows=1020 width=16) (actual time=0.053..2.569 rows=996 loops=1)
Output: b.id, b.c1, b.c2, b.c3
Index Cond: ((b.c1 = 1) AND (b.c2 = 1))
Buffers: shared hit=995
-> Index Only Scan using idx_a_1 on public.a (cost=0.43..1.55 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=996)
Output: a.id, a.c1, a.c2, a.c3
Index Cond: ((a.id = b.id) AND (a.c1 = 1) AND (a.c2 = 1) AND (a.c3 = 1))
Heap Fetches: 0
Buffers: shared hit=2989
Planning time: 0.603 ms
Execution time: 8.509 ms
(17 rows)
儲存優化
第一種可能,如果一次LOOP就可以返回1000條,那麼可以這樣優化
都使用SEQ SCAN
但是把複合條件的資料提到前面。
1、找到內表能滿足1000條以上的ID,資料提前。
2、找到與內表ID對應的資料,資料提前。
postgres=# select b.id,count(*) from a join b on (a.id=b.id and a.c1=1 and a.c2=1 and a.c3=1 and b.c1=1 and b.c2=1) group by 1 order by count(*) desc limit 10;
id | count
----+-------
26 | 18
68 | 18
52 | 16
94 | 16
35 | 16
80 | 15
77 | 15
96 | 15
73 | 15
74 | 15
(10 rows)
postgres=# create table b1 as select * from b where id in (select b.id from a join b on (a.id=b.id and a.c1=1 and a.c2=1 and a.c3=1 and b.c1=1 and b.c2=1) group by 1 order by count(*) desc limit 1000) and b.c1=1 and b.c2=1;
SELECT 991
postgres=# insert into b1 select * from b where not (id in (select b.id from a join b on (a.id=b.id and a.c1=1 and a.c2=1 and a.c3=1 and b.c1=1 and b.c2=1) group by 1 order by count(*) desc limit 1000) and b.c1=1 and b.c2=1)
postgres-# ;
INSERT 0 9999009
postgres=# alter table b rename to b2;
ALTER TABLE
postgres=# alter table b1 rename to b;
ALTER TABLE
外表只需要掃描6個資料塊。
(但是注意這個方法,如果總共資料不滿足1000條,那麼會導致外表全掃)
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a join b on (a.id=b.id and a.c1=1 and a.c2=1 and a.c3=1 and b.c1=1 and b.c2=1) limit 991;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..205423.04 rows=876 width=32) (actual time=0.071..7.845 rows=991 loops=1)
Output: a.id, a.c1, a.c2, a.c3, b.id, b.c1, b.c2, b.c3
Buffers: shared hit=2980
-> Nested Loop (cost=0.43..205423.04 rows=876 width=32) (actual time=0.069..7.577 rows=991 loops=1)
Output: a.id, a.c1, a.c2, a.c3, b.id, b.c1, b.c2, b.c3
Buffers: shared hit=2980
-> Seq Scan on public.b (cost=0.00..204057.62 rows=876 width=16) (actual time=0.019..0.384 rows=991 loops=1)
Output: b.id, b.c1, b.c2, b.c3
Filter: ((b.c1 = 1) AND (b.c2 = 1))
Buffers: shared hit=6
-> Index Only Scan using idx_a_1 on public.a (cost=0.43..1.55 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=991)
Output: a.c1, a.c2, a.c3, a.id
Index Cond: ((a.c1 = 1) AND (a.c2 = 1) AND (a.c3 = 1) AND (a.id = b.id))
Heap Fetches: 0
Buffers: shared hit=2974
Planning time: 0.513 ms
Execution time: 8.079 ms
(17 rows)
參考
《PostgreSQL OUTER JOIN 優化的幾個知識點 – 語義轉換、記憶體頻寬、JOIN演算法、FILTER親和力、TSP、HINT、命中率、儲存順序、掃描順序、索引深度》
相關文章
- 多執行緒掃描資料夾耗時方法分析執行緒
- PostgreSQL掃描方法綜述SQL
- 電腦掃描檔案怎麼掃描 win10電腦掃描檔案方法介紹Win10
- 【技術乾貨】Oracle資料庫漏洞掃描指南Oracle資料庫
- 24_Oracle資料庫全表掃描詳解(四)_全表掃描生產最佳化案例三則Oracle資料庫
- 對上次的自動掃描進行改造
- yii框架,掃描目錄下檔案入資料庫框架資料庫
- 關係型資料庫全表掃描分片詳解資料庫
- 23_Oracle資料庫全表掃描詳解(三)Oracle資料庫
- 22_Oracle資料庫全表掃描詳解(二)Oracle資料庫
- 21_Oracle資料庫全表掃描詳解(一)Oracle資料庫
- AWVS掃描器掃描web漏洞操作Web
- 全表掃描和全索引掃描索引
- 掃描行為分析
- Win10系統下掃描器程式無法掃描的解決方法Win10
- 掃描器的存在、奧普 掃描器
- linux online掃描共享儲存磁碟(無需reboot)Linuxboot
- 微軟自帶病毒怎麼掃描_win10掃描自帶病毒的方法微軟Win10
- Win10怎麼使用掃描器功能 win10使用掃描功能的方法Win10
- win10系統掃描器提示掃描不到掃描器如何解決Win10
- 掃描器
- PostgreSQL模擬兩個update語句死鎖-利用掃描方法SQL
- AppBoxFuture: 二級索引及索引掃描查詢資料APP索引
- 掃描王 for Mac專業圖片掃描工具Mac
- 在Linux中,什麼是埠掃描?如何使用工具如nmap進行埠掃描?Linux
- Harbor倉庫映象掃描原理
- 什麼是漏洞掃描?漏洞掃描功能有哪些?
- MySQL中的全表掃描和索引樹掃描MySql索引
- oracle是如何進行全表掃描的Oracle
- win10 安全中心關閉定期掃描方法 如何關閉win10自動掃描Win10
- BDA:Hadoop生態大資料工具的漏洞掃描器Hadoop大資料
- python掃描埠Python
- 目錄掃描
- 埠掃描器
- DAST 黑盒漏洞掃描器 第四篇:掃描效能AST
- 索引掃描可能不如全表掃描的場景的理解__純粹資料量而言,不涉及CLUSTERING_FACTOR索引
- 論Springboot下如何進行包掃描Spring Boot
- Vue 微信端掃描二維碼,蘋果端只能儲存圖片解決方法Vue蘋果