PostgreSQL大表掃描策略-BAS_BULKREAD,synchronize_seqscans
標籤
PostgreSQL , 大表掃描 , synchronize_seqscans
背景
資料庫BUFFER是一個對物理IO對衝非常有效的元件,所以BUFFER的穩態對IO波動影響很大。
在一些操作時,可能會導致一些IO的波動,比如刷髒頁,檢查點。但是PostgreSQL在這方面都有很好的排程策略,避免IO的波動。具體見PG文件的介紹,或如下
《PostgreSQL 9.6 平滑fsync, write原理淺析》
還有一些操作也可能打破BUFFER的穩態,例如全表掃描訪問大表,不過還好PG也有對應的AM策略,當表大小超過了四分之一的shared_buffers時,並且對其進行全表掃描時,會使用大表訪問策略:
包括每個ring訪問多少個BLOCK,以及這些BUFFER的標記,是否優先清除。(從而避免大表全表掃描時,導致一些較熱的BUFFER被清出。而是優先清出全表掃描的大表BUFFER)
另一個大表全表掃訪問可能導致IO紊亂的原因,如果全表掃描大表的併發比較多,比如有5個併發,如果每個會話都需要掃描一遍全表,實際IO會是5乘以表的大小。
PostgreSQL對此也做了優化。
PostgreSQL 並行會話大表同步掃描,共享IO。
設定synchronize_seqscans=true時,如果有並行的會話全表掃描大表,那麼PG會使用共享IO,協調多個會話的掃描步調,使得並行的會話儘量的同時訪問同一個BLOCK,最後大家都掃描完整表。
這樣的話從磁碟讀取到SHARED BUFFER的IO可能就不是表大小 乘以 會話數的N備,而是更小,甚至1.
synchronize_seqscans (boolean)
This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload. When this is enabled, a scan might start in the middle of the table and then “wrap around” the end to cover all rows, so as to synchronize with the activity of scans already in progress. This can result in unpredictable changes in the row ordering returned by queries that have no ORDER BY clause. Setting this parameter to off ensures the pre-8.3 behavior in which a sequential scan always starts from the beginning of the table. The default is on.
如果將其設定為OFF,那麼全表掃描都是從0號資料塊開始掃。那麼全表掃描的記錄返回順序完全會與行號一致。
而設定為ON時,由於要協調並行訪問的步調,全表掃描可能從中間開始掃,所以返回資料的順序可能會亂序。
社群全表掃描的大表訪問策略
/* ----------------
* initscan - scan code common to heap_beginscan and heap_rescan
* ----------------
*/
static void
initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
{
/*
* If the table is large relative to NBuffers, use a bulk-read access
* strategy and enable synchronized scanning (see syncscan.c). Although
* the thresholds for these features could be different, we make them the
* same so that there are only two behaviors to tune rather than four.
* (However, some callers need to be able to disable one or both of these
* behaviors, independently of the size of the table; also there is a GUC
* variable that can disable synchronized scanning.)
*
* Note that heap_parallelscan_initialize has a very similar test; if you
* change this, consider changing that one, too.
*/
if (!RelationUsesLocalBuffers(scan->rs_rd) &&
scan->rs_nblocks > NBuffers / 4)
{
allow_strat = scan->rs_allow_strat;
allow_sync = scan->rs_allow_sync;
}
else
allow_strat = allow_sync = false;
if (allow_strat)
{
/* During a rescan, keep the previous strategy object. */
if (scan->rs_strategy == NULL)
scan->rs_strategy = GetAccessStrategy(BAS_BULKREAD);
}
....
else if (keep_startblock)
{
/*
* When rescanning, we want to keep the previous startblock setting,
* so that rewinding a cursor doesn`t generate surprising results.
* Reset the active syncscan setting, though.
*/
scan->rs_syncscan = (allow_sync && synchronize_seqscans);
}
else if (allow_sync && synchronize_seqscans)
{
scan->rs_syncscan = true;
scan->rs_startblock = ss_get_location(scan->rs_rd, scan->rs_nblocks);
}
NBuffers變數指的是SHARED BUFFER
/*
* We sometimes multiply the number of shared buffers by two without
* checking for overflow, so we mustn`t allow more than INT_MAX / 2.
*/
{
{"shared_buffers", PGC_POSTMASTER, RESOURCES_MEM,
gettext_noop("Sets the number of shared memory buffers used by the server."),
NULL,
GUC_UNIT_BLOCKS
},
&NBuffers,
1024, 16, INT_MAX / 2,
NULL, NULL, NULL
},
目前PG設計的幾種訪問策略
/* Possible arguments for GetAccessStrategy() */
typedef enum BufferAccessStrategyType
{
BAS_NORMAL, /* Normal random access */
BAS_BULKREAD, /* Large read-only scan (hint bit updates are
* ok) */
BAS_BULKWRITE, /* Large multi-block write (e.g. COPY IN) */
BAS_VACUUM /* VACUUM */
} BufferAccessStrategyType;
獲取訪問策略
/*
* GetAccessStrategy -- create a BufferAccessStrategy object
*
* The object is allocated in the current memory context.
*/
BufferAccessStrategy
GetAccessStrategy(BufferAccessStrategyType btype)
{
BufferAccessStrategy strategy;
int ring_size;
/*
* Select ring size to use. See buffer/README for rationales.
*
* Note: if you change the ring size for BAS_BULKREAD, see also
* SYNC_SCAN_REPORT_INTERVAL in access/heap/syncscan.c.
*/
switch (btype)
{
case BAS_NORMAL:
/* if someone asks for NORMAL, just give `em a "default" object */
return NULL;
case BAS_BULKREAD:
ring_size = 256 * 1024 / BLCKSZ;
break;
case BAS_BULKWRITE:
ring_size = 16 * 1024 * 1024 / BLCKSZ;
break;
case BAS_VACUUM:
ring_size = 256 * 1024 / BLCKSZ;
break;
default:
elog(ERROR, "unrecognized buffer access strategy: %d",
(int) btype);
return NULL; /* keep compiler quiet */
}
/* Make sure ring isn`t an undue fraction of shared buffers */
ring_size = Min(NBuffers / 8, ring_size);
/* Allocate the object and initialize all elements to zeroes */
strategy = (BufferAccessStrategy)
palloc0(offsetof(BufferAccessStrategyData, buffers) +
ring_size * sizeof(Buffer));
/* Set fields that don`t start out zero */
strategy->btype = btype;
strategy->ring_size = ring_size;
return strategy;
}
檢查BAS_BULKREAD標記
/*
* StrategyRejectBuffer -- consider rejecting a dirty buffer
*
* When a nondefault strategy is used, the buffer manager calls this function
* when it turns out that the buffer selected by StrategyGetBuffer needs to
* be written out and doing so would require flushing WAL too. This gives us
* a chance to choose a different victim.
*
* Returns true if buffer manager should ask for a new victim, and false
* if this buffer should be written and re-used.
*/
bool
StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
{
/* We only do this in bulkread mode */
if (strategy->btype != BAS_BULKREAD)
return false;
/* Don`t muck with behavior of normal buffer-replacement strategy */
if (!strategy->current_was_in_ring ||
strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
return false;
/*
* Remove the dirty buffer from the ring; necessary to prevent infinite
* loop if all ring members are dirty.
*/
strategy->buffers[strategy->current] = InvalidBuffer;
return true;
}
優先幹掉BAS_BULKREAD標記的BUFFER
if (XLogNeedsFlush(lsn) &&
StrategyRejectBuffer(strategy, buf))
{
/* Drop lock/pin and loop around for another buffer */
LWLockRelease(BufferDescriptorGetContentLock(buf));
UnpinBuffer(buf, true);
continue;
}
參考
src/backend/access/heap/heapam.c
src/backend/access/heap/syncscan.c
src/include/storage/bufmgr.h
src/backend/storage/buffer/freelist.c
src/backend/utils/misc/guc.c
src/backend/storage/buffer/bufmgr.c
相關文章
- PostgreSQL掃描方法綜述SQL
- MySQL中的全表掃描和索引樹掃描MySql索引
- PostgreSQL DBA(55) - MVCC#8(對全表掃描的影響)SQLMVCC#
- PostgreSQL技術內幕(七)索引掃描SQL索引
- 優化全表掃描優化
- delete 與全表掃描delete
- Oracle中存取資料掃描Table及索引的方式(全表掃描,索引掃描等)Oracle索引
- 使用全表掃描快取大表的相關問題快取
- oracle 全表掃描,索引範圍掃描與塊的理解Oracle索引
- (轉)索引掃描還是全表掃描(Index Scan Or Full Table Scan)索引Index
- 轉)索引掃描還是全表掃描(Index Scan Or Full Table Scan)索引Index
- ORACLE全表掃描查詢Oracle
- 查詢全表掃描語句
- oracle優化:避免全表掃描Oracle優化
- 查詢全表掃描的sqlSQL
- 掃描技術和掃描工具
- 使用索引快速全掃描(Index FFS)避免全表掃描的若干場景索引Index
- 視覺Mamba的多向掃描策略真的有效嗎?視覺
- delete 刪除資料 全表掃描還是掃描所有塊的測試delete
- 抓取全表掃描的表,篩選和分析
- 大表範圍掃描走SORT MERGE JOIN的SQL優化SQL優化
- AWVS掃描器掃描web漏洞操作Web
- Redis大key掃描Python指令碼RedisPython指令碼
- win10系統掃描器提示掃描不到掃描器如何解決Win10
- 掃描王 for Mac專業圖片掃描工具Mac
- Nmap掃描教程之基礎掃描詳解
- oracle是如何進行全表掃描的Oracle
- 設法減少表掃描次數(luise)UI
- 優化Oracle with全表掃描的問題優化Oracle
- 24_Oracle資料庫全表掃描詳解(四)_全表掃描生產最佳化案例三則Oracle資料庫
- PostgreSQL模擬兩個update語句死鎖-利用掃描方法SQL
- LightDB相容掃描 - 事前SQL相容遷移評估工具24.1支援MySQL --> TDSQL-PostgreSQL相容性掃描MySql
- 埠掃描器
- 安全掃描工具
- 綜合掃描工具
- 全表掃描的cost 與 索引掃描Cost的比較 – 無直方圖(10.1.0.3以後)索引直方圖
- oracle實驗記錄(分割槽全表掃描(全區掃描) FTS 時候的成本計算)Oracle
- 隱形轉換導致全表掃描案例