PostgreSQL10.0preview效能增強-(多維分析)更快,更省記憶體hashedaggregationwithgroupingsets

德哥發表於2017-03-30

標籤

PostgreSQL , 10.0 , hashed aggregation with grouping sets


背景

grouping sets 是多維分析語法,PostgreSQL 從9.5開始支援這種語法,常被用於OLAP系統,資料透視等應用場景。

《PostgreSQL 9.5 new feature – Support GROUPING SETS, CUBE and ROLLUP.》

由於多維分析的一個QUERY涉及多個GROUP,所以如果使用hash agg的話,需要多個HASH table,平行計算. 9.5, 9.6的時候,還不支援一個QUERY使用多個HASH TABLE平行計算。

10.0 擴充套件了聚合NODE,支援hashAggregate並行開多個hashtable,以及MixedAggregate策略用於sort grouping時雜湊表的資料倒騰。

使用時對使用者完全透明,同時優化器在使用hash agg, multi hashtable,時,會盡量的減少重複SORT。

總而言之,grouping set多維分析會更快(即使包含排序),更省記憶體。

Support hashed aggregation with grouping sets.    
    
This extends the Aggregate node with two new features:     
HashAggregate can now run multiple hashtables concurrently,     
and a new strategy MixedAggregate populates hashtables while doing sorted grouping.    
    
The planner will now attempt to save as many sorts as possible when    
planning grouping sets queries, while not exceeding work_mem for the    
estimated combined sizes of all hashtables used.  No SQL-level changes    
are required.  There should be no user-visible impact other than the    
new EXPLAIN output and possible changes to result ordering when ORDER    
BY was not used (which affected a few regression tests).  The    
enable_hashagg option is respected.    
    
Author: Andrew Gierth    
Reviewers: Mark Dilger, Andres Freund    
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk    
    

例子

+explain (costs off) select a, b, grouping(a,b), sum(v), count(*), max(v)        
+  from gstest1 group by grouping sets ((a),(b)) order by 3,1,2;        
+                                               QUERY PLAN                                                       
+--------------------------------------------------------------------------------------------------------        
+ Sort        
+   Sort Key: (GROUPING("*VALUES*".column1, "*VALUES*".column2)), "*VALUES*".column1, "*VALUES*".column2        
+   ->  HashAggregate        
+         Hash Key: "*VALUES*".column1        
+         Hash Key: "*VALUES*".column2        
+         ->  Values Scan on "*VALUES*"        
+(6 rows)       

這個patch的討論,詳見郵件組,本文末尾URL。

PostgreSQL社群的作風非常嚴謹,一個patch可能在郵件組中討論幾個月甚至幾年,根據大家的意見反覆的修正,patch合併到master已經非常成熟,所以PostgreSQL的穩定性也是遠近聞名的。

參考

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b5635948ab165b6070e7d05d111f966e07570d81


相關文章