通過10104閱讀hash join工作機制 ---OnePass

yezhibin發表於2009-12-23
三、Onepass hash join

1、記憶體hash表總體資訊

kxhfWrite: hash-join is spilling to disk
-- 發生了dump disk的動作


*** RowSrcId: 1 HASH JOIN BUILD HASH TABLE (PHASE 1) ***


Total number of partitions: 8
--hash表分為8個分割槽

Number of partitions which could fit in memory: 1
--適合存放在記憶體中分割槽數量為1

Number of partitions left in memory: 1
--留在記憶體分割槽數量為1

Total number of slots in in-memory partitions: 4
--在記憶體中分割槽的slot數量為4

Total number of rows in in-memory partitions: 222
--記憶體分割槽中存放小表的行數為222.

Estimated max # of build rows that can fit in avail memory: 1872
--評估所分配記憶體,最大能存放小表的行數位1872,測試中,小表的行數為2000行,所以可用預估,需要onepass hash join

2、分割槽表具體存放行數的資訊

### Partition Distribution ###
Partition:0    rows:249        clusters:4      slots:1      kept=0
Partition:1    rows:272        clusters:4      slots:1      kept=0
Partition:2    rows:243        clusters:4      slots:1      kept=0
Partition:3    rows:269        clusters:4      slots:1      kept=0
Partition:4    rows:251        clusters:4      slots:1      kept=0
Partition:5    rows:249        clusters:4      slots:1      kept=0
Partition:6    rows:222        clusters:4      slots:4      kept=1
Partition:7    rows:245        clusters:4      slots:2      kept=0
--存放2000行資料小表分佈在8個分割槽,目前partition 6儲存在記憶體中(kept=1),其他的存放在磁碟中。

Number of buckets: 512
--hash 表有512個hash bucket管理

3、hash表的具體存放資訊:


352+114+36+5+4+1=512
### Hash table ###
# NOTE: The calculated number of rows in non-empty buckets may be smaller
#       than the true number.
Number of buckets with   0 rows:        352
Number of buckets with   1 rows:        114
Number of buckets with   2 rows:         36
Number of buckets with   3 rows:          5
Number of buckets with   4 rows:          4
Number of buckets with   5 rows:          1
Number of buckets with   6 rows:          0
Number of buckets with   7 rows:          0
Number of buckets with   8 rows:          0
Number of buckets with   9 rows:          0
Number of buckets with between  10 and  19 rows:          0
Number of buckets with between  20 and  29 rows:          0
Number of buckets with between  30 and  39 rows:          0
Number of buckets with between  40 and  49 rows:          0
Number of buckets with between  50 and  59 rows:          0
Number of buckets with between  60 and  69 rows:          0
Number of buckets with between  70 and  79 rows:          0
Number of buckets with between  80 and  89 rows:          0
Number of buckets with between  90 and  99 rows:          0
Number of buckets with 100 or more rows:          0

對應的行數量=114 + 26*2 + 5*3 +4*4 + 1*5=202,我們發現計算的行數,正如所註釋一樣,少於實際行數,但不清楚為什麼存在此現象?

4、partition 6的hash 表統計資訊
Total buckets: 512
Empty buckets: 352
Non-empty buckets: 160
Total number of rows: 222
Maximum number of rows in a bucket: 5
Average number of rows in non-empty buckets: 1.387500

5、大表對應小表partition 6進行過濾
Used bitmap filtering: filtered rows=23 minimum required=50 out f=1000
Used bitmap filtering: filtered rows=237 minimum required=50 out f=1000
Used bitmap filtering: filtered rows=448 minimum required=50 out f=1000
..............................

以上第一階段與optimal hash join機制基本相同。以下是onepass處理的第二階段

1、從磁碟中提取分割槽,進行hash join資訊:

*** RowSrcId: 1 HASH JOIN GET FLUSHED PARTITIONS (PHASE 2) ***
Getting a pair of flushed partions.
 BUILD PARTION: nrows:249 size=(4 slots, 288K)
 PROBE PARTION: nrows:210 size=(3 slots, 216K)
   ROLE REVERSAL OCCURRED

--從磁碟提取的另一個分割槽表,行數為249,我們可以從上面分割槽分配情況得知其為partition 0,對應進行處理的大表行數為210行。我們從早先的跟蹤檔案資訊知道multiblock IO=9,這就意味著4 slots對應著36個blocks。

2、hash join處理重要資訊

Number of blocks that may be used to build the hash hable 72
Number of rows left to be iterated over (start of function): 210
Number of rows iterated over this function call: 210
Number of rows left to be iterated over (end of function): 0

-- 最後一行為0,表示此次處理只需要一次onepass就能完成,如果非0,意味著需要multi-pass處理。

3、partition 0 hash表資訊
Number of buckets with   0 rows:        346
Number of buckets with   1 rows:        132
Number of buckets with   2 rows:         24
Number of buckets with   3 rows:         10
Number of buckets with   4 rows:          0
Number of buckets with   5 rows:          0
Number of buckets with   6 rows:          0
Number of buckets with   7 rows:          0
Number of buckets with   8 rows:          0
Number of buckets with   9 rows:          0
Number of buckets with between  10 and  19 rows:          0
Number of buckets with between  20 and  29 rows:          0
Number of buckets with between  30 and  39 rows:          0
Number of buckets with between  40 and  49 rows:          0
Number of buckets with between  50 and  59 rows:          0
Number of buckets with between  60 and  69 rows:          0
Number of buckets with between  70 and  79 rows:          0
Number of buckets with between  80 and  89 rows:          0
Number of buckets with between  90 and  99 rows:          0
Number of buckets with 100 or more rows:          0
### Hash table overall statistics ###
Total buckets: 512 Empty buckets: 346 Non-empty buckets: 166
Total number of rows: 210
Maximum number of rows in a bucket: 3
Average number of rows in non-empty buckets: 1.265060

其他分割槽處理與以上類似。

OnePass工作機制:
1、hash 記憶體區域中分配了8個分割槽,每個分割槽對應4K bitmap點陣圖,小錶行存放到hash表中,因為記憶體限制,在記憶體中只保留一個分割槽,其餘分割槽dump到磁碟中。其處理機制與optimal相同。

2、大表按照小表分割槽表進行整理分類,當從磁碟中提取其他分割槽時候,對應提取整理過大表分割槽。獲取所需要的行

3、從跟蹤檔案中,小表處理過程先讀取資料到磁碟,然後將部分資料dump到磁碟,然後再從磁碟讀取到記憶體中,而大表處理過程,也是先從磁碟讀到記憶體,然後dump到磁碟,再從磁碟讀取。對磁碟的寫和讀為1次。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/354732/viewspace-623427/,如需轉載,請註明出處,否則將追究法律責任。

相關文章