PostgreSQL大表掃描策略-BAS_BULKREAD,synchronize_seqscans

德哥發表於2018-05-06

標籤

PostgreSQL , 大表掃描 , synchronize_seqscans


背景

資料庫BUFFER是一個對物理IO對衝非常有效的元件,所以BUFFER的穩態對IO波動影響很大。

在一些操作時,可能會導致一些IO的波動,比如刷髒頁,檢查點。但是PostgreSQL在這方面都有很好的排程策略,避免IO的波動。具體見PG文件的介紹,或如下

《PostgreSQL 9.6 平滑fsync, write原理淺析》

還有一些操作也可能打破BUFFER的穩態,例如全表掃描訪問大表,不過還好PG也有對應的AM策略,當表大小超過了四分之一的shared_buffers時,並且對其進行全表掃描時,會使用大表訪問策略:

包括每個ring訪問多少個BLOCK,以及這些BUFFER的標記,是否優先清除。(從而避免大表全表掃描時,導致一些較熱的BUFFER被清出。而是優先清出全表掃描的大表BUFFER)

另一個大表全表掃訪問可能導致IO紊亂的原因,如果全表掃描大表的併發比較多,比如有5個併發,如果每個會話都需要掃描一遍全表,實際IO會是5乘以表的大小。

PostgreSQL對此也做了優化。

PostgreSQL 並行會話大表同步掃描,共享IO。

設定synchronize_seqscans=true時,如果有並行的會話全表掃描大表,那麼PG會使用共享IO,協調多個會話的掃描步調,使得並行的會話儘量的同時訪問同一個BLOCK,最後大家都掃描完整表。

這樣的話從磁碟讀取到SHARED BUFFER的IO可能就不是表大小 乘以 會話數的N備,而是更小,甚至1.

synchronize_seqscans (boolean)

This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload. When this is enabled, a scan might start in the middle of the table and then “wrap around” the end to cover all rows, so as to synchronize with the activity of scans already in progress. This can result in unpredictable changes in the row ordering returned by queries that have no ORDER BY clause. Setting this parameter to off ensures the pre-8.3 behavior in which a sequential scan always starts from the beginning of the table. The default is on.

如果將其設定為OFF,那麼全表掃描都是從0號資料塊開始掃。那麼全表掃描的記錄返回順序完全會與行號一致。

而設定為ON時,由於要協調並行訪問的步調,全表掃描可能從中間開始掃,所以返回資料的順序可能會亂序。

社群全表掃描的大表訪問策略

/* ----------------  
 *              initscan - scan code common to heap_beginscan and heap_rescan  
 * ----------------  
 */  
static void  
initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)  
{  
  
  
        /*  
         * If the table is large relative to NBuffers, use a bulk-read access  
         * strategy and enable synchronized scanning (see syncscan.c).  Although  
         * the thresholds for these features could be different, we make them the  
         * same so that there are only two behaviors to tune rather than four.  
         * (However, some callers need to be able to disable one or both of these  
         * behaviors, independently of the size of the table; also there is a GUC  
         * variable that can disable synchronized scanning.)  
         *  
         * Note that heap_parallelscan_initialize has a very similar test; if you  
         * change this, consider changing that one, too.  
         */  
        if (!RelationUsesLocalBuffers(scan->rs_rd) &&  
                scan->rs_nblocks > NBuffers / 4)  
        {  
                allow_strat = scan->rs_allow_strat;  
                allow_sync = scan->rs_allow_sync;  
        }  
        else  
                allow_strat = allow_sync = false;  
  
        if (allow_strat)  
        {  
                /* During a rescan, keep the previous strategy object. */  
                if (scan->rs_strategy == NULL)  
                        scan->rs_strategy = GetAccessStrategy(BAS_BULKREAD);  
        }  
  
....  
  
  
        else if (keep_startblock)  
        {  
                /*  
                 * When rescanning, we want to keep the previous startblock setting,  
                 * so that rewinding a cursor doesn`t generate surprising results.  
                 * Reset the active syncscan setting, though.  
                 */  
                scan->rs_syncscan = (allow_sync && synchronize_seqscans);  
        }  
        else if (allow_sync && synchronize_seqscans)  
        {  
                scan->rs_syncscan = true;  
                scan->rs_startblock = ss_get_location(scan->rs_rd, scan->rs_nblocks);  
        }  

NBuffers變數指的是SHARED BUFFER

        /*  
         * We sometimes multiply the number of shared buffers by two without  
         * checking for overflow, so we mustn`t allow more than INT_MAX / 2.  
         */  
        {  
                {"shared_buffers", PGC_POSTMASTER, RESOURCES_MEM,  
                        gettext_noop("Sets the number of shared memory buffers used by the server."),  
                        NULL,  
                        GUC_UNIT_BLOCKS  
                },  
                &NBuffers,  
                1024, 16, INT_MAX / 2,  
                NULL, NULL, NULL  
        },  

目前PG設計的幾種訪問策略

/* Possible arguments for GetAccessStrategy() */  
typedef enum BufferAccessStrategyType  
{  
        BAS_NORMAL,                                     /* Normal random access */  
        BAS_BULKREAD,                           /* Large read-only scan (hint bit updates are  
                                                                 * ok) */  
        BAS_BULKWRITE,                          /* Large multi-block write (e.g. COPY IN) */  
        BAS_VACUUM                                      /* VACUUM */  
} BufferAccessStrategyType;  

獲取訪問策略

/*  
 * GetAccessStrategy -- create a BufferAccessStrategy object  
 *  
 * The object is allocated in the current memory context.  
 */  
BufferAccessStrategy  
GetAccessStrategy(BufferAccessStrategyType btype)  
{  
        BufferAccessStrategy strategy;  
        int                     ring_size;  
  
        /*  
         * Select ring size to use.  See buffer/README for rationales.  
         *  
         * Note: if you change the ring size for BAS_BULKREAD, see also  
         * SYNC_SCAN_REPORT_INTERVAL in access/heap/syncscan.c.  
         */  
        switch (btype)  
        {  
                case BAS_NORMAL:  
                        /* if someone asks for NORMAL, just give `em a "default" object */  
                        return NULL;  
  
                case BAS_BULKREAD:  
                        ring_size = 256 * 1024 / BLCKSZ;  
                        break;  
                case BAS_BULKWRITE:  
                        ring_size = 16 * 1024 * 1024 / BLCKSZ;  
                        break;  
                case BAS_VACUUM:  
                        ring_size = 256 * 1024 / BLCKSZ;  
                        break;  
  
                default:  
                        elog(ERROR, "unrecognized buffer access strategy: %d",  
                                 (int) btype);  
                        return NULL;            /* keep compiler quiet */  
        }  
  
        /* Make sure ring isn`t an undue fraction of shared buffers */  
        ring_size = Min(NBuffers / 8, ring_size);  
  
        /* Allocate the object and initialize all elements to zeroes */  
        strategy = (BufferAccessStrategy)  
                palloc0(offsetof(BufferAccessStrategyData, buffers) +  
                                ring_size * sizeof(Buffer));  
  
        /* Set fields that don`t start out zero */  
        strategy->btype = btype;  
        strategy->ring_size = ring_size;  
  
        return strategy;  
}  

檢查BAS_BULKREAD標記

/*  
 * StrategyRejectBuffer -- consider rejecting a dirty buffer  
 *  
 * When a nondefault strategy is used, the buffer manager calls this function  
 * when it turns out that the buffer selected by StrategyGetBuffer needs to  
 * be written out and doing so would require flushing WAL too.  This gives us  
 * a chance to choose a different victim.  
 *  
 * Returns true if buffer manager should ask for a new victim, and false  
 * if this buffer should be written and re-used.  
 */  
bool  
StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)  
{  
        /* We only do this in bulkread mode */  
        if (strategy->btype != BAS_BULKREAD)  
                return false;  
  
        /* Don`t muck with behavior of normal buffer-replacement strategy */  
        if (!strategy->current_was_in_ring ||  
                strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))  
                return false;  
  
        /*  
         * Remove the dirty buffer from the ring; necessary to prevent infinite  
         * loop if all ring members are dirty.  
         */  
        strategy->buffers[strategy->current] = InvalidBuffer;  
  
        return true;  
}  

優先幹掉BAS_BULKREAD標記的BUFFER

                                        if (XLogNeedsFlush(lsn) &&  
                                                StrategyRejectBuffer(strategy, buf))  
                                        {  
                                                /* Drop lock/pin and loop around for another buffer */  
                                                LWLockRelease(BufferDescriptorGetContentLock(buf));  
                                                UnpinBuffer(buf, true);  
                                                continue;  
                                        }  

參考

src/backend/access/heap/heapam.c

src/backend/access/heap/syncscan.c

src/include/storage/bufmgr.h

src/backend/storage/buffer/freelist.c

src/backend/utils/misc/guc.c

src/backend/storage/buffer/bufmgr.c


相關文章