PostgreSQL大表掃描策略-BAS_BULKREAD,synchronize_seqscans

德哥發表於2018-05-06

SQL

背景

資料庫BUFFER是一個對物理IO對衝非常有效的元件，所以BUFFER的穩態對IO波動影響很大。

在一些操作時，可能會導致一些IO的波動，比如刷髒頁，檢查點。但是PostgreSQL在這方面都有很好的排程策略，避免IO的波動。具體見PG文件的介紹，或如下

《PostgreSQL 9.6 平滑fsync, write原理淺析》

還有一些操作也可能打破BUFFER的穩態，例如全表掃描訪問大表，不過還好PG也有對應的AM策略，當表大小超過了四分之一的shared_buffers時，並且對其進行全表掃描時，會使用大表訪問策略：

包括每個ring訪問多少個BLOCK，以及這些BUFFER的標記，是否優先清除。(從而避免大表全表掃描時，導致一些較熱的BUFFER被清出。而是優先清出全表掃描的大表BUFFER)

另一個大表全表掃訪問可能導致IO紊亂的原因，如果全表掃描大表的併發比較多，比如有5個併發，如果每個會話都需要掃描一遍全表，實際IO會是5乘以表的大小。

PostgreSQL對此也做了優化。

PostgreSQL 並行會話大表同步掃描，共享IO。

設定synchronize_seqscans=true時，如果有並行的會話全表掃描大表，那麼PG會使用共享IO，協調多個會話的掃描步調，使得並行的會話儘量的同時訪問同一個BLOCK，最後大家都掃描完整表。

這樣的話從磁碟讀取到SHARED BUFFER的IO可能就不是表大小乘以會話數的N備，而是更小，甚至1.

synchronize_seqscans (boolean)

This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload. When this is enabled, a scan might start in the middle of the table and then “wrap around” the end to cover all rows, so as to synchronize with the activity of scans already in progress. This can result in unpredictable changes in the row ordering returned by queries that have no ORDER BY clause. Setting this parameter to off ensures the pre-8.3 behavior in which a sequential scan always starts from the beginning of the table. The default is on.

如果將其設定為OFF，那麼全表掃描都是從0號資料塊開始掃。那麼全表掃描的記錄返回順序完全會與行號一致。

而設定為ON時，由於要協調並行訪問的步調，全表掃描可能從中間開始掃，所以返回資料的順序可能會亂序。

社群全表掃描的大表訪問策略

/* ----------------  
 *              initscan - scan code common to heap_beginscan and heap_rescan  
 * ----------------  
 */  
static void  
initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)  
{  
  
  
        /*  
         * If the table is large relative to NBuffers, use a bulk-read access  
         * strategy and enable synchronized scanning (see syncscan.c).  Although  
         * the thresholds for these features could be different, we make them the  
         * same so that there are only two behaviors to tune rather than four.  
         * (However, some callers need to be able to disable one or both of these  
         * behaviors, independently of the size of the table; also there is a GUC  
         * variable that can disable synchronized scanning.)  
         *  
         * Note that heap_parallelscan_initialize has a very similar test; if you  
         * change this, consider changing that one, too.  
         */  
        if (!RelationUsesLocalBuffers(scan->rs_rd) &&  
                scan->rs_nblocks > NBuffers / 4)  
        {  
                allow_strat = scan->rs_allow_strat;  
                allow_sync = scan->rs_allow_sync;  
        }  
        else  
                allow_strat = allow_sync = false;  
  
        if (allow_strat)  
        {  
                /* During a rescan, keep the previous strategy object. */  
                if (scan->rs_strategy == NULL)  
                        scan->rs_strategy = GetAccessStrategy(BAS_BULKREAD);  
        }  
  
....  
  
  
        else if (keep_startblock)  
        {  
                /*  
                 * When rescanning, we want to keep the previous startblock setting,  
                 * so that rewinding a cursor doesn`t generate surprising results.  
                 * Reset the active syncscan setting, though.  
                 */  
                scan->rs_syncscan = (allow_sync && synchronize_seqscans);  
        }  
        else if (allow_sync && synchronize_seqscans)  
        {  
                scan->rs_syncscan = true;  
                scan->rs_startblock = ss_get_location(scan->rs_rd, scan->rs_nblocks);  
        }

NBuffers變數指的是SHARED BUFFER

        /*  
         * We sometimes multiply the number of shared buffers by two without  
         * checking for overflow, so we mustn`t allow more than INT_MAX / 2.  
         */  
        {  
                {"shared_buffers", PGC_POSTMASTER, RESOURCES_MEM,  
                        gettext_noop("Sets the number of shared memory buffers used by the server."),  
                        NULL,  
                        GUC_UNIT_BLOCKS  
                },  
                &NBuffers,  
                1024, 16, INT_MAX / 2,  
                NULL, NULL, NULL  
        },

目前PG設計的幾種訪問策略

/* Possible arguments for GetAccessStrategy() */  
typedef enum BufferAccessStrategyType  
{  
        BAS_NORMAL,                                     /* Normal random access */  
        BAS_BULKREAD,                           /* Large read-only scan (hint bit updates are  
                                                                 * ok) */  
        BAS_BULKWRITE,                          /* Large multi-block write (e.g. COPY IN) */  
        BAS_VACUUM                                      /* VACUUM */  
} BufferAccessStrategyType;

獲取訪問策略

/*  
 * GetAccessStrategy -- create a BufferAccessStrategy object  
 *  
 * The object is allocated in the current memory context.  
 */  
BufferAccessStrategy  
GetAccessStrategy(BufferAccessStrategyType btype)  
{  
        BufferAccessStrategy strategy;  
        int                     ring_size;  
  
        /*  
         * Select ring size to use.  See buffer/README for rationales.  
         *  
         * Note: if you change the ring size for BAS_BULKREAD, see also  
         * SYNC_SCAN_REPORT_INTERVAL in access/heap/syncscan.c.  
         */  
        switch (btype)  
        {  
                case BAS_NORMAL:  
                        /* if someone asks for NORMAL, just give `em a "default" object */  
                        return NULL;  
  
                case BAS_BULKREAD:  
                        ring_size = 256 * 1024 / BLCKSZ;  
                        break;  
                case BAS_BULKWRITE:  
                        ring_size = 16 * 1024 * 1024 / BLCKSZ;  
                        break;  
                case BAS_VACUUM:  
                        ring_size = 256 * 1024 / BLCKSZ;  
                        break;  
  
                default:  
                        elog(ERROR, "unrecognized buffer access strategy: %d",  
                                 (int) btype);  
                        return NULL;            /* keep compiler quiet */  
        }  
  
        /* Make sure ring isn`t an undue fraction of shared buffers */  
        ring_size = Min(NBuffers / 8, ring_size);  
  
        /* Allocate the object and initialize all elements to zeroes */  
        strategy = (BufferAccessStrategy)  
                palloc0(offsetof(BufferAccessStrategyData, buffers) +  
                                ring_size * sizeof(Buffer));  
  
        /* Set fields that don`t start out zero */  
        strategy->btype = btype;  
        strategy->ring_size = ring_size;  
  
        return strategy;  
}

檢查BAS_BULKREAD標記

/*  
 * StrategyRejectBuffer -- consider rejecting a dirty buffer  
 *  
 * When a nondefault strategy is used, the buffer manager calls this function  
 * when it turns out that the buffer selected by StrategyGetBuffer needs to  
 * be written out and doing so would require flushing WAL too.  This gives us  
 * a chance to choose a different victim.  
 *  
 * Returns true if buffer manager should ask for a new victim, and false  
 * if this buffer should be written and re-used.  
 */  
bool  
StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)  
{  
        /* We only do this in bulkread mode */  
        if (strategy->btype != BAS_BULKREAD)  
                return false;  
  
        /* Don`t muck with behavior of normal buffer-replacement strategy */  
        if (!strategy->current_was_in_ring ||  
                strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))  
                return false;  
  
        /*  
         * Remove the dirty buffer from the ring; necessary to prevent infinite  
         * loop if all ring members are dirty.  
         */  
        strategy->buffers[strategy->current] = InvalidBuffer;  
  
        return true;  
}

優先幹掉BAS_BULKREAD標記的BUFFER

                                        if (XLogNeedsFlush(lsn) &&  
                                                StrategyRejectBuffer(strategy, buf))  
                                        {  
                                                /* Drop lock/pin and loop around for another buffer */  
                                                LWLockRelease(BufferDescriptorGetContentLock(buf));  
                                                UnpinBuffer(buf, true);  
                                                continue;  
                                        }

參考

src/backend/access/heap/heapam.c

src/backend/access/heap/syncscan.c

src/include/storage/bufmgr.h

src/backend/storage/buffer/freelist.c

src/backend/utils/misc/guc.c

src/backend/storage/buffer/bufmgr.c

PostgreSQL掃描方法綜述
2019-09-22
SQL
全表掃描和全索引掃描
2023-02-16
索引
MySQL中的全表掃描和索引樹掃描
2022-05-14
MySql索引
PostgreSQL DBA(55) - MVCC#8(對全表掃描的影響)
2019-07-16
SQLMVCC#
PostgreSQL技術內幕（七）索引掃描
2023-04-21
SQL索引
全表掃描和全索引掃描繼續（PG-TiDB）
2023-02-20
索引TiDB
視覺Mamba的多向掃描策略真的有效嗎？
2024-05-18
視覺
AWVS掃描器掃描web漏洞操作
2018-08-06
Web
Redis大key掃描Python指令碼
2019-03-27
RedisPython指令碼
掃描器的存在、奧普掃描器
2020-02-06
win10系統掃描器提示掃描不到掃描器如何解決
2019-01-07
Win10
掃描器
2019-12-27
oracle是如何進行全表掃描的
2019-05-30
Oracle
掃描王 for Mac專業圖片掃描工具
2020-12-04
Mac
LightDB相容掃描 - 事前SQL相容遷移評估工具24.1支援MySQL --> TDSQL-PostgreSQL相容性掃描
2024-04-19
MySql
24_Oracle資料庫全表掃描詳解(四)_全表掃描生產最佳化案例三則
2020-03-14
Oracle資料庫
PostgreSQL模擬兩個update語句死鎖-利用掃描方法
2018-10-05
SQL
什麼是漏洞掃描?漏洞掃描功能有哪些?
2023-09-28
[20210219]全表掃描邏輯讀問題.txt
2021-02-20
python掃描埠
2018-06-14
Python
目錄掃描
2020-10-07
埠掃描器
2020-11-19
DAST 黑盒漏洞掃描器第四篇：掃描效能
2022-06-22
AST
電腦掃描檔案怎麼掃描 win10電腦掃描檔案方法介紹
2022-10-18
Win10
奧普快票通表票掃描識別系統
2019-12-16
SELECT COUNT(*) 會造成全表掃描？回去等通知吧
2023-01-15
京東掃描平臺EOS—JS掃描落地與實踐
2021-02-03
JS
[20210220]全索引掃描快速索引掃描的邏輯讀.txt
2021-02-20
索引
Zenmap(埠掃描工具)
2019-01-15
P2032 掃描
2024-03-10
direasch目錄掃描
2024-05-04
淺談掃描線
2023-11-11
sonar（二）掃描配置
2024-05-09
掃描行為分析
2024-09-19
綜合掃描工具
2024-09-10
Nydus 映象掃描加速
2023-01-09
索引掃描可能不如全表掃描的場景的理解__純粹資料量而言，不涉及CLUSTERING_FACTOR
2018-05-18
索引
關係型資料庫全表掃描分片詳解
2019-08-02
資料庫