行為、審計日誌(實時索引/實時搜尋)-最佳實踐

德哥發表於2017-05-20

標籤

PostgreSQL , ES , 搜尋引擎 , 全文檢索 , 日誌分析 , 倒排索引 , 優化 , 分割槽 , 分片 , 審計日誌 , 行為日誌


背景

在很多系統中會記錄使用者的行為日誌,行為日誌包括瀏覽行為、社交行為、操作行為等。

典型的應用例如:資料庫的SQL審計、企業內部的堡壘機(行為審計)等。

行為、審計日誌的量與業務量或者操作量有關,為了滿足企業實時查詢的需求,通常需要構建搜尋引擎,比如使用ES或者使用PostgreSQL的全文檢索功能來實現。

如果使用PostgreSQL來構建,有幾個優勢,可以滿足多個需求:

1. 明細儲存的需求,除了需要建立索引的欄位,明細欄位也可以儲存在PostgreSQL中。

2. 索引的需求,即建立日誌行為欄位的全文索引。

3. 多維度索引的需求,除了日誌行為欄位的索引,還可以建立其他欄位的索引,例如時間維度,屬性維度的索引。這些索引可以組合使用,滿足多個維度的搜尋需求。

4. 不需要同步到搜尋引擎,滿足了實時搜尋的需求。

例子

測試機

磁碟,使用空間大、廉價的SATA盤,使用一塊SSD作為BCACHE寫快取。

12 * N TB          

目錄規劃,每塊盤一個目錄

/disk[1-12]          

作業系統配置

編譯PostgreSQL

wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2          
          
tar -jxvf postgresql-snapshot.tar.bz2          
cd postgresql-10beta1          
          
export USE_NAMED_POSIX_SEMAPHORES=1          
LIBS=-lpthread CFLAGS="-O3" ./configure --prefix=/home/digoal/pgsql10_8k --with-blocksize=8 --with-wal-segsize=1024          
LIBS=-lpthread CFLAGS="-O3" make world -j 128           
LIBS=-lpthread CFLAGS="-O3" make install-world           

環境變數

vi env_pg10.sh           
          
export PS1="$USER@`/bin/hostname -s`-> "          
export PGPORT=$(($1+1920))          
export PGDATA=/disk$1/digoal/pgdata/pg_root$(($1+1920))          
export LANG=en_US.utf8          
export PGHOME=/home/digoal/pgsql10_8k          
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH          
export LD_RUN_PATH=$LD_LIBRARY_PATH          
export DATE=`date +"%Y%m%d%H%M"`          
export PATH=$PGHOME/bin:$PATH:.          
export MANPATH=$PGHOME/share/man:$MANPATH          
export PGHOST=127.0.0.1          
export PGUSER=postgres          
export PGDATABASE=postgres          
alias rm=`rm -i`          
alias ll=`ls -lh`          
unalias vi          

初始化叢集

建立12個資料庫叢集,對應到每一塊磁碟。可以充分利用磁碟的IO。

vi init.sh          
          
for ((i=1;i<13;i++))          
do          
  . ~/env_pg10.sh $i          
  initdb -D $PGDATA -E SQL_ASCII --locale=C -U postgres          
  echo "local   all             all                                     trust" > $PGDATA/pg_hba.conf          
  echo "host    all             all             127.0.0.1/32            trust" >> $PGDATA/pg_hba.conf          
  echo "host    all             all             ::1/128                 trust" >> $PGDATA/pg_hba.conf          
  echo "host    all             all             0.0.0.0/0               trust" >> $PGDATA/pg_hba.conf          
done          
. ./init.sh        

配置引數

postgresql.auto.conf          
          
# Do not edit this file manually!          
# It will be overwritten by the ALTER SYSTEM command.          
listen_addresses = `0.0.0.0`          
max_connections = 400          
unix_socket_directories = `.`          
shared_buffers = 32GB          
maintenance_work_mem = 2GB          
dynamic_shared_memory_type = posix          
vacuum_cost_delay = 0          
bgwriter_delay = 10ms          
bgwriter_lru_maxpages = 1000          
bgwriter_lru_multiplier = 10.0          
bgwriter_flush_after = 0          
max_parallel_workers_per_gather = 0          
synchronous_commit = off          
full_page_writes = off          
wal_buffers = 2047MB          
wal_writer_delay = 10ms          
wal_writer_flush_after = 0          
checkpoint_timeout = 45min          
max_wal_size = 96GB          
min_wal_size = 32GB          
checkpoint_completion_target = 0.9          
checkpoint_flush_after = 2MB          
max_wal_senders = 0          
random_page_cost = 1.3          
effective_cache_size = 32GB          
log_destination = `csvlog`          
logging_collector = on          
log_truncate_on_rotation = on          
log_checkpoints = on          
log_connections = on          
log_disconnections = on          
log_error_verbosity = verbose          
log_timezone = `PRC`          
autovacuum = on          
log_autovacuum_min_duration = 0          
autovacuum_freeze_max_age = 1500000000          
autovacuum_multixact_freeze_max_age = 1600000000          
vacuum_freeze_table_age = 1400000000          
vacuum_multixact_freeze_table_age = 1400000000          
datestyle = `iso, mdy`          
timezone = `PRC`          
lc_messages = `C`          
lc_monetary = `C`          
lc_numeric = `C`          
lc_time = `C`          
default_text_search_config = `pg_catalog.english`          
autovacuum_work_mem = 2GB          
autovacuum_max_workers = 12          
autovacuum_naptime = 30s          
autovacuum_vacuum_scale_factor = 0.2          
autovacuum_analyze_scale_factor = 0.2          
autovacuum_vacuum_cost_delay = 0                  
autovacuum_vacuum_cost_limit = 0         
zero_damaged_pages = on        
cp postgresql.auto.conf /disk1/digoal/pgdata/pg_root1921/          
...          
...          
cp postgresql.auto.conf /disk12/digoal/pgdata/pg_root1932/          

啟動叢集

將資料庫例項繫結到不同的CPU核

vi start.sh          
          
for ((i=1;i<13;i++))          
do          
  . /home/digoal/env_pg10.sh $i          
          
  if [ $i -eq 12 ]; then          
  numactl --physcpubind=56-63 pg_ctl start -o "-c port=$PGPORT"          
  else          
  numactl --physcpubind="$((($i-1)*(5)+(1)))"-"$(($i*5))" pg_ctl start -o "-c port=$PGPORT"          
  fi          
          
done          
. ./start.sh        

停止叢集

vi stop.sh           
          
for ((i=1;i<13;i++))          
do          
  . /home/digoal/env_pg10.sh $i          
          
  pg_ctl stop -m fast          
          
done          
. ./stop.sh        

建模

表結構

4個欄位,分別儲存PK(對應原始明細資料的PK),時間,使用者ID,使用者行為(tsvector欄位)。

檢索需求

檢索時可能按照時間區間,使用者ID,以及分詞條件進行檢索。

保留時長

日誌保留一段時間(例如1個月)後清除。

分割槽

每個叢集中,建立若干個分割槽表,例如本例使用了12個分割槽表。

如果條件允許,建議每個小時一個分割槽表,這樣的話可以不建時間索引,查詢時間區間的資料使用分割槽即可。

如果單個使用者的資料量很龐大,那麼建議按UID再建立雜湊或LIST分割槽,這樣的話,按照UID查詢,不需要使用索引(可以省去在UID建立索引,甚至省去儲存UID這個欄位)。

索引

行為欄位,全文索引。

使用者ID,B-TREE索引。

時間欄位,brin塊級索引。

模擬資料

時間,時序產生。

使用者ID,在一個範圍內隨機產生。

使用者行為資料,長約512字元的字串,拆分成若干個TOKEN,例如本例為40個長度不等的TOKEN。

初始化資料表

vi init.sql          
          
drop table bptest cascade;          
do language plpgsql $$          
declare          
begin          
  for i in 1..12 loop          
    execute `drop table bptest`||i||` cascade`;          
  end loop;          
end;          
$$;          
          
create table bptest(pk serial8, uid int, ts int8, content tsvector);          
create index idx_bptest_content on bptest using gin (content) with (fastupdate=on,gin_pending_list_limit=2048);          
create index idx_bptest_uid on bptest(uid);          
create index idx_bptest_ts on bptest using brin(ts);          
          
do language plpgsql $$          
declare          
begin          
  for i in 1..12 loop          
    execute `create table bptest`||i||`(like bptest including all) inherits(bptest)`;          
  end loop;          
end;          
$$;          
          
-- 產生若干字元長度的隨機字串          
CREATE OR REPLACE FUNCTION public.gen_rand_str(integer)          
 RETURNS text          
 LANGUAGE sql          
 STRICT          
AS $function$          
  select string_agg(a[random()*6+1],``) from generate_series(1,$1), (select array[`a`,`b`,`c`,`d`,`e`,`f`,` `]) t(a);          
$function$;          
psql -f ./init.sql -p 1921          
...          
...          
psql -f ./init.sql -p 1932          

測試

灌入測試資料,例如每張表插入2億,一個資料庫插入24億(約6TB),總共插入288億(約72TB)。

每10條一批灌入。

vi test.sh          
          
for ((i=1;i<13;i++))          
do          
  echo "insert into bptest${i} (uid,ts,content) select random()*100000 , extract(epoch from now()), to_tsvector(`english`,gen_rand_str(512)) from generate_series(1,10);" > ~/test${i}.sql          
done          
          
for ((i=1;i<13;i++))          
do          
  . /home/digoal/env_pg10.sh ${i}          
          
  for ((x=1;x<13;x++))          
  do          
    nohup pgbench -M prepared -n -r -P 3 -f ./test${x}.sql -c 1 -j 1 -t 200000000 >/tmp/bptest_${i}_${x}.log 2>&1 &          
  done          
done          
          
          
chmod 500 test.sh          

查詢測試資料如下,資料非常隨機,每條記錄的content約40個元素,長度限定在512字元。

select * from bptest1 limit 1;          
          
pk      | 1        
uid     | 849185          
ts      | 1494928859          
content | `aaaefba`:14 `acddcfd`:39 `acdeeaadbffdbbecceb`:50 `aceeedfbaefbdfcbd`:59 `adbbeddbecfdcffaeedcedaeeddaeaaeebfbdcdcecfbbebfcebabceffbfdbfbfa`:60           
`adcdf`:61 `aead`:47 `afddf`:70 `ba`:8 `bae`:37 `bbaacffbcafeffafefdf`:38 `bbe`:55 `bbecfdf`:32 `bcbfd`:27 `bdce`:45 `bdeccbcdeaabefbeeebcdbfddd`:19           
`bed`:17 `beedeadccbbbecbfcbf`:44 `bfccaeddaddbc`:2 `cafdfcf`:5 `cbcacefaff`:3 `cbcfc`:52 `cbfef`:63 `ccdcbedb`:33 `ccdcd`:20 `cd`:6 `cfecfeeccabf`:42           
`cffb`:15 `dabdfddeeabfdcefb`:16 `dacdeecfbcefebfabeedfabbaccec`:57 `daee`:1 `daffcdffadddbaffd`:68 `dbcddacefcd`:9           
`dbdbcbfadfffbdddaaabdcbcecdbecbbdecffbfcfecbbfebfebcadefecfceadaeffd`:11 `dcdf`:23 `dd`:53 `ddec`:31 `debdcdebfffebdbfdeefffbcfbccbececdbeaffffedfbefdcccbbccadedecfbeccccbbb`:48           
`deefaeeaabdbbdafcfcbeecc`:71 `df`:26 `dfcbbcd`:46 `e`:7,51,56 `eafddcaac`:43 `ecbaffa`:21 `ecdeeceddbdcbfcabdc`:10 `ecedcec`:41 `ed`:66           
`edcbaecfcdfbcbcdedeebdbfceeeececfac`:35 `eeca`:25 `eeebafeacfebfdbdbddaacabebabbfbfdefeddefccfbeaefdbf`:29 `eefdbfcadebcbbfffaefcaecafbddbdbfcf`:13           
`ef`:58 `efbdc`:67 `efccdddaebfbdaffcdfcbfdcbdeb`:54 `efccebdddededdeda`:64 `effcbfdfdeebfbbcfaabfd`:12 `f`:24,28,65 `fbbfccfcbcba`:30 `fc`:4           
`fcbbdbbaefcefefdf`:34 `fd`:18 `fdffcbe`:69 `fea`:62 `feeabdcd`:36 `feeadcedecedebaedccffbfddadcfececbefddcbeaedbebfadefedcbd`:22 `feffceceaeec`:49 `ffaffde`:40          

使用者全文檢索請求,輸入4個查詢條件,流式返回PK。

建議使用流式返回介面,因為結果集可能非常大。

select pk from bptest1 where uid=$1 and ts between $2 and $3 and content @@ to_tsquery(`english`, $4);        

壓測

./test.sh          

資源使用

dstat

CPU基本耗盡,磁碟的寫入也非常的充分

cpu大部分為user的開銷,後面使用perf看一下

dstat          
          
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--          
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw           
 92   7   1   0   0   1|1075M 2435M|4048B 2297B|   0     0 | 142k  167k          
 92   7   0   0   0   1|1137M 2075M|2391B 1945B|   0     0 | 135k  161k          
 91   8   0   0   0   0|1182M 2125M|3483B 2845B|   0     0 | 140k  166k          
 91   8   0   0   0   1|1193M 1971M|3788B 1633B|   0     0 | 135k  159k          
 91   8   0   0   0   0|1089M 2305M|2232B 1478B|   0     0 | 139k  159k          
 92   7   0   0   0   1| 986M 2795M|2176B 1568B|   0     0 | 127k  142k          
 92   7   1   0   0   0| 760M 2864M|6028B 1408B|   0     0 | 116k  118k          
 90   8   0   0   0   0|1029M 3057M|1565B 2116B|   0     0 | 132k  150k          
 90   9   1   0   0   1|1000M 3237M|2336B 4850B|   0     0 | 133k  154k          
 90   8   1   1   0   1| 659M 4399M|2872B 7992B|   0     0 | 115k  119k          
 91   7   0   0   0   1| 925M 2996M|1293B 1059B|   0     0 | 122k  127k          
 90   8   1   1   0   1| 996M 3350M| 664B  574B|   0     0 | 133k  148k          
 91   7   1   0   0   1| 948M 2927M|3525B 2500B|   0     0 | 132k  146k          
 90   8   0   1   0   0|1114M 2869M|1751B 2645B|   0     0 | 132k  150k          
 90   8   0   1   0   1|1267M 2408M|3003B 2244B|   0     0 | 137k  167k          
 91   8   0   1   0   1|1086M 2539M| 900B  347B|   0     0 | 133k  154k          
 91   8   0   0   0   1| 998M 2614M|1975B 1757B|   0     0 | 130k  151k          
 91   8   0   0   0   0|1120M 2150M|1466B 4911B|   0     0 | 130k  154k          
 92   7   0   0   0   0|1163M 2387M|1356B  498B|   0     0 | 136k  163k          
 90   8   1   1   0   1| 864M 2656M|2601B 3373B|   0     0 | 130k  143k          
 91   8   0   0   0   1| 987M 2651M|2052B  898B|   0     0 | 135k  154k          
 91   8   0   0   0   0|1073M 2205M|2479B 2319B|   0     0 | 130k  144k          
 90   8   1   1   0   1| 951M 2941M|1390B 1001B|   0     0 | 130k  148k          

磁碟使用率

iostat -x           
          
avg-cpu:  %user   %nice %system %iowait  %steal   %idle          
          85.29    0.49    9.89    1.90    0.00    2.43          
          
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util          
sdb               0.00    92.40   99.40  389.00  2822.40 136435.20   285.13     2.25    4.60   0.65  31.64          
sdc               0.00    33.60  154.20  211.60  4838.40 85700.80   247.51     1.14    3.10   0.63  22.96          
sdd               0.00    63.00  232.40  238.40  7316.80 109648.00   248.44     2.17    4.61   0.99  46.54          
sde               0.00    78.80  269.60  340.80  7980.80 102419.20   180.87     2.53    4.14   0.94  57.62          
sdf               0.00    58.40  283.00  234.20  8204.80 99129.60   207.53     2.30    4.45   0.93  48.20          
sdg               0.00    50.80  207.60  236.60  6652.80 94337.60   227.35     1.42    3.19   0.68  30.34          
sdh               0.00   102.20  109.40  475.20  3489.60 131211.20   230.42     2.60    4.45   0.52  30.40          
sdi               0.00    70.20  107.00  337.00  3228.80 79603.20   186.56     1.35    3.04   0.53  23.38          
sdj               0.00    31.00   70.60  158.80  2534.40 85124.80   382.12     0.82    3.59   0.86  19.72          
sdk               0.20    58.40  190.60  295.80  5587.20 123539.20   265.47     1.74    3.57   0.68  33.28          
sdl               0.00    91.00  162.80  396.40  4441.60 119507.20   221.65     1.98    3.54   0.59  33.26          
sdm               0.00   274.80  103.20  359.20  2296.00 158908.80   348.63     3.81    8.23   1.08  50.06          

perf

大部分的開銷是postgres程式消耗的,建議使用以下開關重新編譯一下.

《PostgreSQL 原始碼效能診斷(perf profiling)指南》

top -ag          
          
   PerfTop:    9171 irqs/sec  kernel:63.7%  exact:  0.0% [1000Hz cycles],  (all, 64 CPUs)        
----------------------------------------------------------------------------------------------                           
        
 samples  pcnt function                    DSO        
 _______ _____ ___________________________ _______________________________________        
        
23044.00  4.5% tsCompareString             /home/digoal/pgsql10_8k/bin/postgres         
19821.00  3.9% ExecInterpExpr              /home/digoal/pgsql10_8k/bin/postgres         
12258.00  2.4% gintuple_get_key            /home/digoal/pgsql10_8k/bin/postgres         
12208.00  2.4% pg_detoast_datum_packed     /home/digoal/pgsql10_8k/bin/postgres         
11111.00  2.2% hash_search_with_hash_value /home/digoal/pgsql10_8k/bin/postgres         
10318.00  2.0% memcpy                      /lib64/libc-2.12.so                            
 9078.00  1.8% AllocSetAlloc               /home/digoal/pgsql10_8k/bin/postgres         
 8944.00  1.7% advance_aggregates          /home/digoal/pgsql10_8k/bin/postgres         
 8547.00  1.7% cmpEntryAccumulator         /home/digoal/pgsql10_8k/bin/postgres         
 7311.00  1.4% array_seek                  /home/digoal/pgsql10_8k/bin/postgres         
 6744.00  1.3% gin_cmp_tslexeme            /home/digoal/pgsql10_8k/bin/postgres         
 6650.00  1.3% __closure_wake_up           [bcache]                                     
 6550.00  1.3% appendBinaryStringInfo      /home/digoal/pgsql10_8k/bin/postgres         
 6475.00  1.3% TParserGet                  /home/digoal/pgsql10_8k/bin/postgres         
 5578.00  1.1% ginFindLeafPage             /home/digoal/pgsql10_8k/bin/postgres         
 5543.00  1.1% PyParser_AddToken           /lib64/libpython2.7.so.1.0                  
 5412.00  1.1% array_get_element           /home/digoal/pgsql10_8k/bin/postgres         
 5355.00  1.0% heap_fill_tuple             /home/digoal/pgsql10_8k/bin/postgres         
 4936.00  1.0% entryLocateLeafEntry        /home/digoal/pgsql10_8k/bin/postgres         
 4732.00  0.9% heap_form_minimal_tuple     /home/digoal/pgsql10_8k/bin/postgres         
 4512.00  0.9% rb_insert                   /home/digoal/pgsql10_8k/bin/postgres        

top

top -c -u digoal          
          
top - 19:20:47 up 179 days,  5:38,  8 users,  load average: 183.79, 189.01, 166.41          
Tasks: 2939 total, 159 running, 2780 sleeping,   0 stopped,   0 zombie          
Cpu(s): 87.7%us,  8.9%sy,  0.9%ni,  0.9%id,  1.2%wa,  0.0%hi,  0.5%si,  0.0%st          
Mem:  529321828k total, 512395020k used, 16926808k free,   299780k buffers          
Swap:        0k total,        0k used,        0k free, 482162560k cached          
          
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                                   
11258 digoal  20   0 20.7g 2.9g 2.9g R 78.7  0.6   5:11.98 postgres: postgres postgres 127.0.0.1(52848) INSERT                                                            
11253 digoal  20   0 20.7g 3.0g 2.9g R 70.7  0.6   5:14.48 postgres: postgres postgres 127.0.0.1(52843) INSERT                                                            
11264 digoal  20   0 20.7g 3.0g 2.9g R 68.1  0.6   5:14.63 postgres: postgres postgres 127.0.0.1(52854) INSERT                                                            
11263 digoal  20   0 20.7g 3.0g 2.9g R 65.2  0.6   5:14.89 postgres: postgres postgres 127.0.0.1(52853) INSERT                                                            
11250 digoal  20   0 20.7g 3.0g 3.0g R 59.4  0.6   5:16.66 postgres: postgres postgres 127.0.0.1(52840) idle                                                              
11268 digoal  20   0 20.7g 2.9g 2.9g R 53.3  0.6   5:11.36 postgres: postgres postgres 127.0.0.1(52858) INSERT                                                            
11266 digoal  20   0 20.7g 3.0g 3.0g R 52.3  0.6   5:09.00 postgres: postgres postgres 127.0.0.1(52856) INSERT                                                            
11068 digoal  20   0 20.7g 2.6g 2.6g R 51.4  0.5   4:44.47 postgres: postgres postgres 127.0.0.1(45823) INSERT                                                            
11097 digoal  20   0 20.7g 2.6g 2.5g R 49.5  0.5   4:47.85 postgres: postgres postgres 127.0.0.1(45839) INSERT                                                            
11161 digoal  20   0 20.7g 2.6g 2.6g R 49.5  0.5   4:47.87 postgres: postgres postgres 127.0.0.1(44881) INSERT                                                            
11256 digoal  20   0 20.7g 3.0g 3.0g R 49.5  0.6   5:14.69 postgres: postgres postgres 127.0.0.1(52846) INSERT                                                            
10819 digoal  20   0 20.7g 2.7g 2.7g R 48.8  0.5   4:58.17 postgres: postgres postgres 127.0.0.1(47342) INSERT                                                            
11107 digoal  20   0 20.7g 2.7g 2.7g R 48.8  0.5   5:02.00 postgres: postgres postgres 127.0.0.1(59612) INSERT                                                            
11255 digoal  20   0 20.7g 3.0g 3.0g R 48.2  0.6   5:15.68 postgres: postgres postgres 127.0.0.1(52845) INSERT                                                            
11267 digoal  20   0 20.7g 3.0g 3.0g R 47.9  0.6   5:18.82 postgres: postgres postgres 127.0.0.1(52857) INSERT                                                            
11066 digoal  20   0 20.7g 2.6g 2.6g R 46.9  0.5   4:44.97 postgres: postgres postgres 127.0.0.1(45821) INSERT                                                            
11222 digoal  20   0 20.7g 2.6g 2.6g R 45.9  0.5   5:00.43 postgres: postgres postgres 127.0.0.1(40379) idle                                                              
11207 digoal  20   0 20.7g 2.6g 2.6g R 45.6  0.5   5:04.59 postgres: postgres postgres 127.0.0.1(46160) INSERT                                                            
11224 digoal  20   0 20.7g 2.6g 2.6g R 45.3  0.5   5:02.60 postgres: postgres postgres 127.0.0.1(40381) INSERT                                                            
11249 digoal  20   0 20.7g 2.6g 2.6g R 45.3  0.5   4:59.58 postgres: postgres postgres 127.0.0.1(46187) INSERT                                                            
11045 digoal  20   0 20.7g 2.6g 2.6g R 44.6  0.5   4:39.75 postgres: postgres postgres 127.0.0.1(64424) idle                                                              
11064 digoal  20   0 20.7g 2.6g 2.6g R 44.6  0.5   4:44.69 postgres: postgres postgres 127.0.0.1(45819) INSERT                                                            
11145 digoal  20   0 20.7g 2.6g 2.6g S 44.3  0.5   4:46.18 postgres: postgres postgres 127.0.0.1(44876) INSERT                                                            
10865 digoal  20   0 20.7g 2.6g 2.6g R 44.0  0.5   4:59.89 postgres: postgres postgres 127.0.0.1(49769) INSERT                                                            
11080 digoal  20   0 20.7g 2.6g 2.5g R 44.0  0.5   4:43.70 postgres: postgres postgres 127.0.0.1(45825) INSERT                                                            
11247 digoal  20   0 20.7g 2.6g 2.6g R 43.4  0.5   5:01.91 postgres: postgres postgres 127.0.0.1(40391) idle                                                              
11163 digoal  20   0 20.7g 2.6g 2.6g R 42.7  0.5   4:48.34 postgres: postgres postgres 127.0.0.1(44882) idle                                                              
11164 digoal  20   0 20.7g 2.6g 2.6g R 42.4  0.5   4:53.21 postgres: postgres postgres 127.0.0.1(44883) INSERT                                                            
10882 digoal  20   0 20.7g 2.6g 2.6g R 41.8  0.5   5:04.78 postgres: postgres postgres 127.0.0.1(49772) INSERT                                                            
10868 digoal  20   0 20.7g 2.6g 2.6g R 41.4  0.5   5:00.30 postgres: postgres postgres 127.0.0.1(49770) INSERT          

寫入tps

換算成單機的寫入,約6.5萬行/s。

progress: 729.0 s, 55.0 tps, lat 11.610 ms stddev 4.836          
progress: 732.0 s, 59.7 tps, lat 20.071 ms stddev 107.984          
progress: 735.0 s, 57.0 tps, lat 20.492 ms stddev 125.445          
progress: 738.0 s, 38.7 tps, lat 25.891 ms stddev 154.607          
progress: 741.0 s, 41.0 tps, lat 24.405 ms stddev 140.247          
progress: 744.0 s, 43.0 tps, lat 13.550 ms stddev 10.448          
progress: 747.0 s, 60.0 tps, lat 20.691 ms stddev 131.640          
progress: 750.0 s, 60.0 tps, lat 17.394 ms stddev 83.385          
progress: 753.0 s, 44.3 tps, lat 25.510 ms stddev 146.719          
progress: 756.0 s, 25.0 tps, lat 39.819 ms stddev 213.642          
progress: 759.0 s, 50.0 tps, lat 11.439 ms stddev 5.319          
progress: 762.0 s, 60.0 tps, lat 20.979 ms stddev 106.782          
progress: 765.0 s, 60.0 tps, lat 18.778 ms stddev 167.714          
progress: 768.0 s, 58.0 tps, lat 18.017 ms stddev 99.949          
progress: 771.0 s, 51.0 tps, lat 19.636 ms stddev 124.429          

寫入效能基本上取決於tsvector欄位的元素個數,雜湊程度,本例每條記錄約40個元素。如果元素個數下降一半,效能將提升一倍左右。

postgres=# select array_length(tsvector_to_array(content),1) from bptest1 limit 10;        
 array_length         
--------------        
           40        
           37        
           40        
           45        
           35        
           42        
           38        
           46        
           30        
           40        
(10 rows)        

評估每秒構建了多少個索引條目

1. 全文檢索索引條目

每條記錄約40個元素,當插入的tps=6.5萬時,構建的全文檢索條目數約 260萬/s。

65000*40 = 2600000        

2. uid索引條目,較小,忽略不計。

3. ts索引條目,使用BRIN塊級索引,忽略不計。

效能影響最大,資源消耗最多的就是全文檢索索引條目的構建。

查詢效能

舉例

postgres=# dt+ bptest1      
                      List of relations      
 Schema |  Name   | Type  |  Owner   |  Size   | Description       
--------+---------+-------+----------+---------+-------------      
 public | bptest1 | table | postgres | 1689 MB |       
(1 row)      
      
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from bptest1 where ts between 1494999617 and 1495999617 and content @@ to_tsquery (`english`,`abc`);      
                                                              QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------      
 Bitmap Heap Scan on public.bptest1  (cost=175.95..23691.41 rows=20015 width=811) (actual time=7.017..23.376 rows=19755 loops=1)      
   Output: uid, ts, content      
   Recheck Cond: (bptest1.content @@ ```abc```::tsquery)      
   Filter: ((bptest1.ts >= 1494999617) AND (bptest1.ts <= 1495999617))      
   Heap Blocks: exact=18933      
   Buffers: shared hit=18948      
   ->  Bitmap Index Scan on bptest1_content_idx  (cost=0.00..170.94 rows=20019 width=0) (actual time=3.811..3.811 rows=19755 loops=1)      
         Index Cond: (bptest1.content @@ ```abc```::tsquery)      
         Buffers: shared hit=15      
 Planning time: 0.097 ms      
 Execution time: 24.517 ms      
(11 rows)       
      
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from bptest1 where ts between 1494999617 and 1495999617 and content @@ to_tsquery (`english`,`abc & bc`);      
                                                            QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------      
 Bitmap Heap Scan on public.bptest1  (cost=36.27..2598.42 rows=1996 width=811) (actual time=4.577..6.711 rows=2125 loops=1)      
   Output: uid, ts, content      
   Recheck Cond: (bptest1.content @@ ```abc`` & ``bc```::tsquery)      
   Filter: ((bptest1.ts >= 1494999617) AND (bptest1.ts <= 1495999617))      
   Heap Blocks: exact=2116      
   Buffers: shared hit=2239      
   ->  Bitmap Index Scan on bptest1_content_idx  (cost=0.00..35.77 rows=1997 width=0) (actual time=4.291..4.291 rows=2125 loops=1)      
         Index Cond: (bptest1.content @@ ```abc`` & ``bc```::tsquery)      
         Buffers: shared hit=123      
 Planning time: 0.125 ms      
 Execution time: 6.849 ms      
(11 rows)      

純SSD fsync=on 寫入效能

1. 寫入TPS

7萬/s ,構建的全文檢索條目數約 280萬/s。

效能比較平穩。

純SATA+SSD bcache fsync=off 寫入效能

1. 寫入TPS

7.5萬/s ,構建的全文檢索條目數約 300萬/s。

效能比較平穩。

小結

1. 查詢聚合

由於日誌資料打散分佈在多個叢集,多個表內,建議使用plproxy進行查詢的聚合。

參考

《A Smart PostgreSQL extension plproxy 2.2 practices》

《阿里雲ApsaraDB RDS for PostgreSQL 最佳實踐 – 4 水平分庫 之 節點擴充套件》

《阿里雲ApsaraDB RDS for PostgreSQL 最佳實踐 – 3 水平分庫 vs 單機 效能》

《阿里雲ApsaraDB RDS for PostgreSQL 最佳實踐 – 2 教你RDS PG的水平分庫》

2. 寫入分片

寫入分片,可以在業務層完成,隨機打散寫入。

實際應用時,可以根據需要,切分成更多的分割槽。

3. 主要的開銷是postgres的開銷,如果需要詳細的分析,建議重新編譯postgres

4. gin索引的優化

https://www.postgresql.org/docs/9.6/static/sql-createindex.html

GIN indexes accept different parameters:          
          
1. fastupdate          
          
This setting controls usage of the fast update technique described in Section 63.4.1.           
It is a Boolean parameter: ON enables fast update, OFF disables it.           
(Alternative spellings of ON and OFF are allowed as described in Section 19.1.) The default is ON.          
          
Note: Turning fastupdate off via ALTER INDEX prevents future insertions from going into the list of pending index entries,           
but does not in itself flush previous entries.           
          
You might want to VACUUM the table or call gin_clean_pending_list function afterward to ensure the pending list is emptied.          
          
2. gin_pending_list_limit          
          
Custom gin_pending_list_limit parameter.           
This value is specified in kilobytes.          

gin_pending_list_limit的目的是延遲合併,因為一條記錄中可能涉及較多的GIN KEY,如果實時更新,GIN索引的寫入量會非常大,效能受到影響。

本例gin_pending_list_limit設定為2MB,tps比較平緩,如果設定過大,當CPU資源不足時,抖動會比較嚴重。

使用者可以根據實際測試,設定合理的gin_pending_list_limit值。

5. 如果把PostgreSQL完全當成索引庫使用,並且允許資料丟失,那麼可以使用fsync=off的開關,(檢查點fsync對IO的影響比較大,本例使用的是SATA盤,將會導致較大的效能抖動)。

postgresql.auto.conf      
      
fsync = off      
zero_damaged_pages = on        

如果有ha的話,丟失的風險又會更小。(但是伺服器CRASH後,需要重建備庫,這麼大的量,還是挺恐怖的。)

建議用更多的資料庫例項,每個例項的大小可控(例如 < 2TB),重建的時間也相對可控。

6. 為了達到更好的響應速度(RT),建議明細和索引分開存放,明細要求寫入RT低,索引可以存在一定的延遲。 並且索引與明細資料的可靠性要求也不一樣。


相關文章