震精-PostgreSQLdecimal64decimal128高效率數值型別擴充套件

德哥發表於2017-03-24

背景

PostgreSQL內建的數值型別包括

整型、浮點、整型序列、”無限”精度數值

Name	Storage Size	Description	Range
smallint	2 bytes	small-range integer	-32768 to +32767
integer	4 bytes	typical choice for integer	-2147483648 to +2147483647
bigint	8 bytes	large-range integer	-9223372036854775808 to +9223372036854775807
decimal	variable	user-specified precision, exact	up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
numeric	variable	user-specified precision, exact	up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
real	4 bytes	variable-precision, inexact	6 decimal digits precision
double precision	8 bytes	variable-precision, inexact	15 decimal digits precision
smallserial	2 bytes	small autoincrementing integer	1 to 32767
serial	4 bytes	autoincrementing integer	1 to 2147483647
bigserial	8 bytes	large autoincrementing integer	1 to 9223372036854775807

其中除了 “無限”精度數值型別。他型別都是定長儲存，使用時不需要呼叫palloc，效率較高。

如果你要使用超過雙精能表示的有效範圍的數值，目前只能選擇decimal
umeric型別，而這個型別前面說了，由於是變長設計，需要呼叫palloc，效率一般。

那麼在資料分析領域，或者需要處理非常多的資料記錄時，numeric型別的開銷是較大的。

PostgreSQL社群有一些擴充套件，可以解決這個問題，

1. 比如2nd的fixeddecimal外掛，使用INT8來表示NUMERIC，精度可調。

《PostgreSQL fixeddecimal – 用CPU “硬解碼” 提升1倍數值運算能力助力金融大資料量計算》

2. 比如社群的pgdecimal外掛，支援decimal32和decimal64兩種型別。

https://pgxn.org/dist/pgdecimal/1.0.0/

3. 比如vitesse的pgdecimal外掛，也就是本文將提到的外掛，支援decimal64與decimal128型別，精度基本上足夠使用。

推薦使用vitesse提供的pgdecimal外掛，因為它效率夠高，精度夠大。

pgdecimal外掛介紹

有兩個常見的decimal庫，decNumber以及Intel提供的Intel ADX庫。

pgdecimal外掛選擇了decNumber庫，因為GCC也在用它（法律風險更小？）

https://github.com/gcc-mirror/gcc/tree/master/libdecnumber

decimal庫的效能對比

http://speleotrove.com/decimal/dpintro.html

decNumber與Inter ADX效能接近，但是Inter ADX提供了decimal64/128, int32/64, float/double型別的相互轉換，這個很給力。（也許將來vitesse會支援intel adx庫吧）

pgdecimal 依賴的decNumber，因此我們必須先安裝decNumber

decNumber安裝

1. 下載 decNumber package

http://speleotrove.com/decimal/

wget http://speleotrove.com/decimal/decNumber-icu-368.zip  
unzip decNumber-icu-368.zip

或者從本站連結下載

2. 安裝decNumber到postgresql軟體目錄中(假設postgresql安裝在/home/digoal/pgsql9.6)

首先要在postgresql軟體的include目錄中，建立一個空目錄，

mkdir -p /home/digoal/pgsql9.6/include/decnumber

在decNumber src目錄中建立Makefile，install -D 修改為對應要安裝的目錄。

cd decNumber  
  
vi Makefile  
  
OBJS = decSingle.o decDouble.o decQuad.o decNumber.o decContext.o  
  
CFLAGS = -Wall -g -O2 -fPIC  
  
libdecnumber.a: $(OBJS)  
        ar -rcs libdecnumber.a $(OBJS)  
  
clean:  
        rm -f libdecnumber.a $(OBJS)  
  
install:  
        install -D *.h /home/digoal/pgsql9.6/include/decnumber  
        install -D libdecnumber.a /home/digoal/pgsql9.6/lib

3. 編譯安裝decNumber

cd decNumber  
  
make   
make install

4. decNumber的C庫reference如下, pgdecimal外掛中用到的decnumber庫，需要了解細節的話請參考:

The decNumber C library

pgdecimal安裝

git clone https://github.com/vitesse-ftian/pgdecimal

或者從本站連結下載

cd pgdecimal

有一個小BUG，.control的版本號沒有與sql檔案的版本號對齊

mv decimal--2.0.sql decimal--1.0.sql

另外，需要修改一下Makefile，指定版本，以及decnumber的include和lib目錄

vi Makefile  
  
PG_CPPFLAGS = -I/home/digoal/pgsql9.6/include/decnumber  
SHLIB_LINK = -L/home/digoal/pgsql9.6/lib -ldecnumber  
  
DATA = decimal--1.0.sql

安裝

export PATH=/home/digoal/pgsql9.6/bin:$PATH  
USE_PGXS=1 make clean  
USE_PGXS=1 make   
USE_PGXS=1 make install  
  
  
/bin/mkdir -p `/home/digoal/pgsql9.6/lib`  
/bin/mkdir -p `/home/digoal/pgsql9.6/share/extension`  
/bin/mkdir -p `/home/digoal/pgsql9.6/share/extension`  
/usr/bin/install -c -m 755  decimal.so `/home/digoal/pgsql9.6/lib/decimal.so`  
/usr/bin/install -c -m 644 .//decimal.control `/home/digoal/pgsql9.6/share/extension/`  
/usr/bin/install -c -m 644 .//decimal--1.0.sql  `/home/digoal/pgsql9.6/share/extension/`

使用

psql  
  
postgres=# create extension decimal;  
CREATE EXTENSION

pgdecimal效能對比

使用int8, float8, decimal64, decimal128, numeric(15,3) 幾種型別，分別比較這幾種型別的效能。

create table tt(ii bigint, d double precision, d64 decimal64, d128 decimal128, n numeric(15, 3));  
  
postgres=# 	iming  
Timing is on.  
  
生成測試資料  
  
postgres=# insert into tt select i, i + 0.123, i + 0.123::decimal64, i + 0.123::decimal128, i + 0.123 from generate_series(1, 1000000) i;  
INSERT 0 1000000  
Time: 2125.723 ms  
  
postgres=# select * from tt limit 2;  
 ii |   d   |  d64  | d128  |   n     
----+-------+-------+-------+-------  
  1 | 1.123 | 1.123 | 1.123 | 1.123  
  2 | 2.123 | 2.123 | 2.123 | 2.123  
(2 rows)

普通查詢效能對比

postgres=# select count(*) from tt where (d + d*d + d*d*d + d*d*d*d) > 10000000;  
 count    
--------  
 999945  
(1 row)  
  
Time: 411.418 ms  
postgres=# select count(*) from tt where (n + n*n + n*n*n + n*n*n*n) > 10000000;  
 count    
--------  
 999945  
(1 row)  
  
Time: 1949.367 ms  
postgres=# select count(*) from tt where (d64 + d64*d64 + d64*d64*d64 + d64*d64*d64*d64) > 10000000;  
 count    
--------  
 999945  
(1 row)  
  
Time: 1165.304 ms  
postgres=# select count(*) from tt where (d128 + d128*d128 + d128*d128*d128 + d128*d128*d128*d128) > 10000000;  
 count    
--------  
 999945  
(1 row)  
  
Time: 1517.179 ms

排序效能對比

postgres=# select * from tt order by d limit 2 offset 999000;  
   ii   |     d      |    d64     |    d128    |     n        
--------+------------+------------+------------+------------  
 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123  
 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123  
(2 rows)  
  
Time: 804.645 ms  
postgres=# select * from tt order by n limit 2 offset 999000;  
   ii   |     d      |    d64     |    d128    |     n        
--------+------------+------------+------------+------------  
 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123  
 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123  
(2 rows)  
  
Time: 2828.066 ms  
postgres=# select * from tt order by d64 limit 2 offset 999000;  
   ii   |     d      |    d64     |    d128    |     n        
--------+------------+------------+------------+------------  
 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123  
 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123  
(2 rows)  
  
Time: 1826.044 ms  
postgres=# select * from tt order by d128 limit 2 offset 999000;  
   ii   |     d      |    d64     |    d128    |     n        
--------+------------+------------+------------+------------  
 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123  
 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123  
(2 rows)  
  
Time: 2118.647 ms

雜湊JOIN效能對比

postgres=# explain select count(*) from tt t1 join tt t2 on t1.d64 * t1.d64 + t1.d64 = t2.d64 + t2.d64 * t2.d64;  
                                    QUERY PLAN                                      
----------------------------------------------------------------------------------  
 Aggregate  (cost=6875071228.00..6875071228.01 rows=1 width=8)  
   ->  Hash Join  (cost=36707.00..5625071228.00 rows=500000000000 width=0)  
         Hash Cond: (((t1.d64 * t1.d64) + t1.d64) = (t2.d64 + (t2.d64 * t2.d64)))  
         ->  Seq Scan on tt t1  (cost=0.00..20300.00 rows=1000000 width=8)  
         ->  Hash  (cost=20300.00..20300.00 rows=1000000 width=8)  
               ->  Seq Scan on tt t2  (cost=0.00..20300.00 rows=1000000 width=8)  
(6 rows)  
  
Time: 0.508 ms  
postgres=# select count(*) from tt t1 join tt t2 on t1.d64 * t1.d64 + t1.d64 = t2.d64 + t2.d64 * t2.d64;  
  count    
---------  
 1000000  
(1 row)  
  
Time: 1681.451 ms  
postgres=# select count(*) from tt t1 join tt t2 on t1.n * t1.n + t1.n = t2.n + t2.n * t2.n;  
  count    
---------  
 1000000  
(1 row)  
  
Time: 2395.894 ms

巢狀迴圈效能對比

postgres=# explain select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d * t1.d + t1.d > t2.d + t2.d * t2.d;  
                                  QUERY PLAN                                     
-------------------------------------------------------------------------------  
 Aggregate  (cost=2699703.15..2699703.16 rows=1 width=8)  
   ->  Nested Loop  (cost=0.00..2614087.74 rows=34246165 width=0)  
         Join Filter: (((t1.d * t1.d) + t1.d) > (t2.d + (t2.d * t2.d)))  
         ->  Seq Scan on tt t1  (cost=0.00..22800.00 rows=10136 width=8)  
               Filter: (ii < 10000)  
         ->  Materialize  (cost=0.00..22850.68 rows=10136 width=8)  
               ->  Seq Scan on tt t2  (cost=0.00..22800.00 rows=10136 width=8)  
                     Filter: (ii < 10000)  
(8 rows)  
  
Time: 0.561 ms  
postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d * t1.d + t1.d > t2.d + t2.d * t2.d;  
  count     
----------  
 49985001  
(1 row)  
  
Time: 19706.890 ms  
postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.n * t1.n + t1.n > t2.n + t2.n * t2.n;  
  count     
----------  
 49985001  
(1 row)  
  
Time: 70787.289 ms  
postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d64 * t1.d64 + t1.d64 > t2.d64 + t2.d64 * t2.d64;  
  count     
----------  
 49985001  
(1 row)  
  
Time: 49861.689 ms  
postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d128 * t1.d128 + t1.d128 > t2.d128 + t2.d128 * t2.d128;  
  count     
----------  
 49985001  
(1 row)  
  
Time: 65779.153 ms