通過10104閱讀hash join工作機制 ---Multipass

yezhibin發表於2009-12-23
四、Multipass hash join

1、記憶體hash表資訊

Hash-join fanout: 8
Number of partitions: 8
Number of slots: 14
Multiblock IO: 1
Block size(KB): 8
Cluster (slot) size(KB): 8
Minimum number of bytes per block: 8160
Bit vector memory allocation(KB): 0
Per partition bit vector length(KB): 0
Maximum possible row length: 1077
Estimated build size (KB): 2046
Estimated Build Row Length (includes overhead): 1048
.....................
kxhfWrite: hash-join is spilling to disk

*** RowSrcId: 1 HASH JOIN BUILD HASH TABLE (PHASE 1) ***
Total number of partitions: 8
Number of partitions which could fit in memory: 0
--8個分割槽中沒有一個分割槽能全部放到記憶體中

Number of partitions left in memory: 0
--沒有一個分割槽能整個留在記憶體中

Total number of slots in in-memory partitions: 0
Total number of rows in in-memory partitions: 0
   (used as preliminary number of buckets in hash table)
Estimated max # of build rows that can fit in avail memory: 224

--從以上的初始化資訊,我們就很容易判斷hash join機制是optimal、onepass還是multipass。

2、分割槽表資訊
### Partition Distribution ###
Partition:0    rows:262        clusters:34     slots:1      kept=0
Partition:1    rows:269        clusters:35     slots:1      kept=0
Partition:2    rows:245        clusters:32     slots:1      kept=0
Partition:3    rows:231        clusters:30     slots:1      kept=0
Partition:4    rows:244        clusters:32     slots:1      kept=0
Partition:5    rows:254        clusters:33     slots:1      kept=0
Partition:6    rows:254        clusters:33     slots:1      kept=0
Partition:7    rows:241        clusters:31     slots:6      kept=0
--在multpass 中所有kept=0,說明每個分割槽都沒有資料完全能夠存放在hash 記憶體中。

3、第二階段處理過程

因為分割槽表無法單獨存放在記憶體中,所以需要將該分割槽再次分割成幾個子分割槽,以下就是各個子分割槽的處理跟蹤資訊:

分割槽1處理過程:

*** RowSrcId: 1 HASH JOIN GET FLUSHED PARTITIONS (PHASE 2) ***
Getting a pair of flushed partions.
 BUILD PARTION: nrows:231 size=(30 slots, 240K)
 PROBE PARTION: nrows:246 size=(32 slots, 256K)
*** RowSrcId: 1 HASH JOIN BUILD HASH TABLE (PHASE 2) ***

Number of blocks that may be used to build the hash table 9

--從資訊中可以看出,記憶體hash表中9個blocks來處理30blocks ( 30slots = 30 blocks)  資料,所以該記憶體需要分四次才能處理完成(round(30/9)=4),從側面上可以推斷至少需要提高4倍的hash area size才能實現onepass功能。

Number of rows left to be iterated over (start of function): 231

--231行處理中,從第一行進行處理

Number of rows iterated over this function call: 70
Number of rows left to be iterated over (end of function): 161
--還有161行未處理完成,需要繼續進行處理

### Hash table ###
# NOTE: The calculated number of rows in non-empty buckets may be smaller
#       than the true number.
Number of buckets with   0 rows:        448
Number of buckets with   1 rows:         58
Number of buckets with   2 rows:          6
Number of buckets with   3 rows:          0
Number of buckets with   4 rows:          0
Number of buckets with   5 rows:          0
Number of buckets with   6 rows:          0
Number of buckets with   7 rows:          0
Number of buckets with   8 rows:          0
Number of buckets with   9 rows:          0
Number of buckets with between  10 and  19 rows:          0
Number of buckets with between  20 and  29 rows:          0
Number of buckets with between  30 and  39 rows:          0
Number of buckets with between  40 and  49 rows:          0
Number of buckets with between  50 and  59 rows:          0
Number of buckets with between  60 and  69 rows:          0
Number of buckets with between  70 and  79 rows:          0
Number of buckets with between  80 and  89 rows:          0
Number of buckets with between  90 and  99 rows:         0
Number of buckets with 100 or more rows:          0
### Hash table overall statistics ###
Total buckets: 512 Empty buckets: 448 Non-empty buckets: 64
Total number of rows: 70
Maximum number of rows in a bucket: 2
Average number of rows in non-empty buckets: 1.093750
kxhfResetIter(1105ed460)
qerhjFetchPhase2(): building a hash table
*** RowSrcId: 1 HASH JOIN BUILD HASH TABLE (PHASE 2) ***
Number of blocks that may be used to build the hash hable 9
Number of rows left to be iterated over (start of function): 161
Number of rows iterated over this function call: 71
Number of rows left to be iterated over (end of function): 90

以下省略。

以上實驗Multipass工作機制

1、hash 記憶體中分配8個分割槽存放小表資料,但所提供的記憶體不足以支撐任何一個分割槽進行hash join,所以一個分割槽再分成多個子分割槽,然後分別從磁碟提取到hash記憶體與大表對應的分割槽表進行匹配,匹配的原理與前面所述一致。

2、小表的分割槽表總的被掃描匹配一次,但大表的分割槽表,每次都要與小表進行掃描匹配一次,這就意味著,如果一個分割槽表被分成四個子分割槽表,則對應的大表分割槽表,將被掃描匹配4次。所以大表的multipass=4。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/354732/viewspace-623433/,如需轉載,請註明出處,否則將追究法律責任。

相關文章