Real-Time Monitoring of Business Data Quality in PostgreSQL
Tags
PostgreSQL, pg_stat, real-time quality monitoring
Background
As a business system grows, data exchange between business lines becomes increasingly frequent, and that brings a new problem: data quality.
For example, has an upstream system dropped some fields? Is upstream data arriving on time? Is the upstream data itself corrupted?
Monitoring business data quality lets you catch these problems early.
PostgreSQL's built-in statistics already cover most real-time data quality monitoring scenarios.
For more business-specific, customized quality monitoring, PostgreSQL also offers read-and-burn processing, streaming computation, and asynchronous messaging, all of which support real-time data quality monitoring.
Built-in features: real-time business data quality monitoring
PostgreSQL's built-in statistics include the following.
1. Near-real-time row count
postgres=# \d pg_class
Table "pg_catalog.pg_class"
       Column        |     Type     | Collation | Nullable | Default
---------------------+--------------+-----------+----------+---------
 relname             | name         |           | not null | -- object name
 relnamespace        | oid          |           | not null | -- schema the object belongs to, references pg_namespace.oid
 relpages            | integer      |           | not null | -- estimated number of pages (in units of block_size)
 reltuples           | real         |           | not null | -- estimated number of rows
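This estimate can be read in O(1) instead of running a full `count(*)`. A minimal sketch (the table name `test` is just an example):

```sql
-- Near-real-time row count estimate from the catalog; no table scan needed.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'test' AND relkind = 'r';
```

A sudden drop or plateau in this estimate is itself a cheap quality signal.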
2. Near-real-time per-column statistics (null fraction, average width, number of distinct values, most common values and their frequencies, equi-depth histogram, physical correlation, most common elements, their frequencies, and element-count histogram)
The columns are explained below:
postgres=# \d pg_stats
View "pg_catalog.pg_stats"
         Column         |   Type   | Default
------------------------+----------+---------
 schemaname             | name     | -- schema the object belongs to
 tablename              | name     | -- object name
 attname                | name     | -- column name
 inherited              | boolean  | -- whether these statistics include child tables (false: this table only; true: this table plus all inheritance children)
 null_frac              | real     | -- fraction of NULLs in the column
 avg_width              | integer  | -- average width of the column, in bytes
 n_distinct             | real     | -- number of distinct values (-1: all values unique; between -1 and 0: negated fraction of distinct values; >= 1: actual count)
 most_common_vals       | anyarray | -- most common values in the column
 most_common_freqs      | real[]   | -- frequencies of the most common values
 histogram_bounds       | anyarray | -- equi-depth histogram (each bucket holds roughly the same number of rows)
 correlation            | real     | -- correlation between physical row order and column value order (-1 to 1); the smaller the absolute value, the more scattered the storage; < 0 means reverse order, > 0 forward order
 most_common_elems      | anyarray | -- for multi-valued types (arrays): most common elements
 most_common_elem_freqs | real[]   | -- frequencies of the most common elements
 elem_count_histogram   | real[]   | -- histogram of the count of distinct non-null elements per row
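These statistics map directly onto quality checks. As a sketch (the 0.5 null threshold, the distinct-value range, and the table name `test` are illustrative assumptions, not values from the original text):

```sql
-- Alert when a column that should always be populated suddenly goes mostly NULL,
-- or when a column that should be diverse collapses to a handful of values.
SELECT tablename, attname, null_frac, n_distinct
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'test'
  AND (null_frac > 0.5                     -- more than half the rows are NULL
       OR n_distinct BETWEEN 1 AND 10);    -- suspiciously few distinct values
```

Each column of pg_stats can drive a similar rule: avg_width catching truncated fields, histogram_bounds catching out-of-range values, and so on.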
3. Near-real-time per-table statistics (number of sequential scans and the rows they read, number of index scans and the rows they fetched, rows inserted, updated, and deleted, number of dead tuples, and so on).
postgres=# \d pg_stat_all_tables
View "pg_catalog.pg_stat_all_tables"
       Column        |           Type           | Default
---------------------+--------------------------+---------
 relid               | oid                      |
 schemaname          | name                     |
 relname             | name                     |
 seq_scan            | bigint                   | -- number of sequential scans
 seq_tup_read        | bigint                   | -- rows read by sequential scans
 idx_scan            | bigint                   | -- number of index scans
 idx_tup_fetch       | bigint                   | -- rows fetched by index scans
 n_tup_ins           | bigint                   | -- rows inserted
 n_tup_upd           | bigint                   | -- rows updated
 n_tup_del           | bigint                   | -- rows deleted
 n_tup_hot_upd       | bigint                   | -- rows HOT-updated
 n_live_tup          | bigint                   | -- estimated live (visible) rows
 n_dead_tup          | bigint                   | -- estimated dead rows
 n_mod_since_analyze | bigint                   | -- rows modified since the last ANALYZE
 last_vacuum         | timestamp with time zone |
 last_autovacuum     | timestamp with time zone |
 last_analyze        | timestamp with time zone |
 last_autoanalyze    | timestamp with time zone |
 vacuum_count        | bigint                   |
 autovacuum_count    | bigint                   |
 analyze_count       | bigint                   |
 autoanalyze_count   | bigint                   |
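Since these counters only ever grow, a quality monitor typically samples the view on a schedule and alerts on the deltas, e.g. when the insert counter stops moving (upstream data not arriving). A sketch using a snapshot table (the name `tbl_stat_snapshot` and the 2-minute window are illustrative assumptions):

```sql
-- Snapshot table for the write counters (illustrative name).
CREATE TABLE IF NOT EXISTS tbl_stat_snapshot AS
SELECT now() AS ts, relname, n_tup_ins
FROM pg_stat_all_tables
WHERE false;   -- create empty, keep the structure

-- Run this on a schedule (e.g. once a minute):
INSERT INTO tbl_stat_snapshot
SELECT now(), relname, n_tup_ins
FROM pg_stat_all_tables
WHERE schemaname = 'public';

-- Tables whose insert counter has not moved across recent snapshots:
SELECT relname
FROM tbl_stat_snapshot
WHERE ts >= now() - interval '2 minutes'
GROUP BY relname
HAVING max(n_tup_ins) = min(n_tup_ins);
```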
4. Scheduling of statistics collection
PostgreSQL collects statistics automatically as table contents change. The relevant parameters are:
#track_counts = on
#autovacuum = on                        # Enable autovacuum subprocess?  'on'
autovacuum_naptime = 15s                # time between autovacuum runs
#autovacuum_analyze_threshold = 50      # min number of row updates before analyze
#autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
With these defaults, statistics are re-collected automatically once roughly 10% of the table has changed (50 rows plus 10% of the estimated row count).
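The documented trigger condition for autoanalyze is `n_mod_since_analyze > autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples`. A small helper to predict when the next automatic collection fires (parameter defaults match the postgresql.conf values above):

```python
def analyze_due(n_mod_since_analyze: int, reltuples: float,
                analyze_threshold: int = 50,
                analyze_scale_factor: float = 0.1) -> bool:
    """True when autovacuum would schedule an ANALYZE for this table.

    Mirrors PostgreSQL's documented trigger condition:
      n_mod_since_analyze > threshold + scale_factor * reltuples
    """
    return n_mod_since_analyze > analyze_threshold + analyze_scale_factor * reltuples


# For a table with ~1,000,000 estimated rows, ANALYZE triggers only after
# 50 + 0.1 * 1,000,000 = 100,050 rows have been modified.
```

This also shows the freshness limit of the built-in approach: on large tables the statistics may lag by up to scale_factor of the table size, which is why the article lowers autovacuum_naptime to 15s.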
From the built-in statistics alone you can obtain:
1. Near-real-time row counts
2. Per-column statistics (null fraction, average width, number of distinct values, most common values and their frequencies, equi-depth histogram, physical correlation, most common elements, their frequencies, and element-count histogram)
Feeding these back into a monitor lets business data quality problems surface in near real time.
Example
1. Create a test table
create table test(id int primary key, c1 int, c2 int, info text, crt_time timestamp);
create index idx_test_1 on test (crt_time);
2. Create a pgbench script
vi test.sql
\set id random(1, 10000000)
insert into test values (:id, random()*100, random()*10000, random()::text, now()) on conflict (id) do update set crt_time=now();
3. Run the benchmark
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 32 -j 32 -T 1200
4. Schedule a cleanup job that keeps only the last 30 seconds of data.
delete from test where ctid = any (array(
  select ctid from test where crt_time < now() - interval '30 second'
));
Run it every 0.1 seconds from psql:
psql
delete from test where ctid = any (array(
  select ctid from test where crt_time < now() - interval '30 second'
));
\watch 0.1
Sample log output:
DELETE 18470
Fri 08 Dec 2017 04:31:54 PM CST (every 0.1s)
DELETE 19572
Fri 08 Dec 2017 04:31:55 PM CST (every 0.1s)
DELETE 20159
Fri 08 Dec 2017 04:31:55 PM CST (every 0.1s)
DELETE 20143
Fri 08 Dec 2017 04:31:55 PM CST (every 0.1s)
DELETE 21401
Fri 08 Dec 2017 04:31:55 PM CST (every 0.1s)
DELETE 21956
Fri 08 Dec 2017 04:31:56 PM CST (every 0.1s)
DELETE 19978
Fri 08 Dec 2017 04:31:56 PM CST (every 0.1s)
DELETE 21916
5. Watch the statistics in real time
Per-column statistics:
postgres=# select attname,null_frac,avg_width,n_distinct,most_common_vals,most_common_freqs,histogram_bounds,correlation from pg_stats where tablename='test';
attname | id
null_frac | 0
avg_width | 4
n_distinct | -1
most_common_vals |
most_common_freqs |
histogram_bounds | {25,99836,193910,289331,387900,492669,593584,695430,795413,890787,1001849,1100457,1203161,1301537,1400265,1497824,1595610,1702278,1809415,1912946,2006274,2108505,2213771,2314440,2409333,2513067,2616217,2709052,2813209,2916342,3016292,3110554,3210817,3305896,3406145,3512379,3616638,3705990,3804538,3902207,4007939,4119100,4214497,4314986,4405492,4513675,4613327,4704905,4806556,4914360,5020248,5105998,5194904,5292779,5394640,5497986,5600441,5705246,5806209,5905498,6006522,6115688,6212831,6308451,6408320,6516028,6622895,6720613,6817877,6921460,7021999,7118151,7220074,7315355,7413563,7499978,7603076,7695692,7805120,7906168,8000492,8099783,8200918,8292854,8389462,8491879,8589691,8696502,8798076,8892978,8992364,9089390,9192142,9294759,9399562,9497099,9601571,9696437,9800758,9905327,9999758}
correlation | -0.00220302
.....
attname | c2
null_frac | 0
avg_width | 4
n_distinct | 9989
most_common_vals | {3056,6203,1352,1649,1777,3805,7029,420,430,705,1015,1143,2810,3036,3075,3431,3792,4459,4812,5013,5662,5725,5766,6445,6882,7034,7064,7185,7189,7347,8266,8686,8897,9042,9149,9326,9392,9648,9652,9802,63,164,235,453,595,626,672,813,847,1626,1636,1663,1749,1858,2026,2057,2080,2106,2283,2521,2596,2666,2797,2969,3131,3144,3416,3500,3870,3903,3956,3959,4252,4265,4505,4532,4912,5048,5363,5451,5644,5714,5734,5739,5928,5940,5987,6261,6352,6498,6646,6708,6886,6914,7144,7397,7589,7610,7640,7687}
most_common_freqs | {0.000366667,0.000366667,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667}
histogram_bounds | {0,103,201,301,399,495,604,697,802,904,1009,1121,1224,1320,1419,1514,1623,1724,1820,1930,2045,2147,2240,2335,2433,2532,2638,2738,2846,2942,3038,3143,3246,3342,3443,3547,3644,3744,3852,3966,4064,4162,4262,4354,4460,4562,4655,4755,4851,4948,5046,5143,5237,5340,5428,5532,5625,5730,5830,5932,6048,6144,6248,6349,6456,6562,6657,6768,6859,6964,7060,7161,7264,7357,7454,7547,7638,7749,7852,7956,8046,8138,8240,8337,8445,8539,8626,8728,8825,8924,9016,9116,9214,9311,9420,9512,9603,9709,9811,9911,10000}
correlation | -0.00246515
...
attname | crt_time
null_frac | 0
avg_width | 8
n_distinct | -0.931747
most_common_vals | {"2017-12-08 16:32:53.836223","2017-12-08 16:33:02.700473","2017-12-08 16:33:03.226319","2017-12-08 16:33:03.613826","2017-12-08 16:33:08.171908","2017-12-08 16:33:14.727654","2017-12-08 16:33:20.857187","2017-12-08 16:33:22.519299","2017-12-08 16:33:23.388035","2017-12-08 16:33:23.519205"}
most_common_freqs | {6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05}
histogram_bounds | {"2017-12-08 16:32:50.397367","2017-12-08 16:32:50.987576","2017-12-08 16:32:51.628523","2017-12-08 16:32:52.117421","2017-12-08 16:32:52.610271","2017-12-08 16:32:53.152021","2017-12-08 16:32:53.712685","2017-12-08 16:32:54.3036","2017-12-08 16:32:54.735576","2017-12-08 16:32:55.269238","2017-12-08 16:32:55.691081","2017-12-08 16:32:56.066085","2017-12-08 16:32:56.541396","2017-12-08 16:32:56.865717","2017-12-08 16:32:57.350169","2017-12-08 16:32:57.698694","2017-12-08 16:32:58.062828","2017-12-08 16:32:58.464265","2017-12-08 16:32:58.92354","2017-12-08 16:32:59.27284","2017-12-08 16:32:59.667347","2017-12-08 16:32:59.984229","2017-12-08 16:33:00.310772","2017-12-08 16:33:00.644104","2017-12-08 16:33:00.976184","2017-12-08 16:33:01.366153","2017-12-08 16:33:01.691384","2017-12-08 16:33:02.021643","2017-12-08 16:33:02.382856","2017-12-08 16:33:02.729636","2017-12-08 16:33:03.035666","2017-12-08 16:33:03.508461","2017-12-08 16:33:03.829351","2017-12-08 16:33:04.151727","2017-12-08 16:33:04.4596","2017-12-08 16:33:04.76933","2017-12-08 16:33:05.125295","2017-12-08 16:33:05.537555","2017-12-08 16:33:05.83828","2017-12-08 16:33:06.15387","2017-12-08 16:33:06.545922","2017-12-08 16:33:06.843679","2017-12-08 16:33:07.111281","2017-12-08 16:33:07.414602","2017-12-08 16:33:07.707961","2017-12-08 16:33:08.119891","2017-12-08 16:33:08.388883","2017-12-08 16:33:08.674867","2017-12-08 16:33:08.979336","2017-12-08 16:33:09.339377","2017-12-08 16:33:09.647791","2017-12-08 16:33:09.94157","2017-12-08 16:33:10.232294","2017-12-08 16:33:10.652072","2017-12-08 16:33:10.921087","2017-12-08 16:33:11.17986","2017-12-08 16:33:11.477399","2017-12-08 16:33:11.776529","2017-12-08 16:33:12.110676","2017-12-08 16:33:12.382742","2017-12-08 16:33:12.70362","2017-12-08 16:33:13.020485","2017-12-08 16:33:13.477398","2017-12-08 16:33:13.788134","2017-12-08 16:33:14.072125","2017-12-08 16:33:14.346058","2017-12-08 16:33:14.625692","2017-12-08 16:33:14.889661","2017-12-08 
16:33:15.139977","2017-12-08 16:33:15.390732","2017-12-08 16:33:15.697878","2017-12-08 16:33:16.127449","2017-12-08 16:33:16.438117","2017-12-08 16:33:16.725608","2017-12-08 16:33:17.01954","2017-12-08 16:33:17.344609","2017-12-08 16:33:17.602447","2017-12-08 16:33:17.919983","2017-12-08 16:33:18.201386","2017-12-08 16:33:18.444387","2017-12-08 16:33:18.714402","2017-12-08 16:33:19.099394","2017-12-08 16:33:19.402888","2017-12-08 16:33:19.673556","2017-12-08 16:33:19.991907","2017-12-08 16:33:20.23329","2017-12-08 16:33:20.517752","2017-12-08 16:33:20.783084","2017-12-08 16:33:21.032402","2017-12-08 16:33:21.304109","2017-12-08 16:33:21.725122","2017-12-08 16:33:21.998994","2017-12-08 16:33:22.232959","2017-12-08 16:33:22.462384","2017-12-08 16:33:22.729792","2017-12-08 16:33:23.001244","2017-12-08 16:33:23.251215","2017-12-08 16:33:23.534155","2017-12-08 16:33:23.772144","2017-12-08 16:33:24.076088","2017-12-08 16:33:24.471151"}
correlation | 0.760231
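Because crt_time is re-analyzed frequently under this workload, the upper bound of its histogram approximates the newest data in the table, so comparing it to the clock gives a cheap freshness check. A sketch (the `::text::timestamp[]` double cast is a common workaround for reading an anyarray column; table and column names follow the example above):

```sql
-- If the newest histogram bound lags far behind now(), upstream data may have stalled.
SELECT now() - bounds[array_upper(bounds, 1)] AS staleness
FROM (
  SELECT histogram_bounds::text::timestamp[] AS bounds
  FROM pg_stats
  WHERE tablename = 'test' AND attname = 'crt_time'
) t;
```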
Row count estimate:
postgres=# select reltuples from pg_class where relname='test';
-[ RECORD 1 ]----------
reltuples | 3.74614e+06
DML activity statistics:
postgres=# select * from pg_stat_all_tables where relname='test';
-[ RECORD 1 ]-------+------------------------------
relid | 591006
schemaname | public
relname | test
seq_scan | 2
seq_tup_read | 0
idx_scan | 28300980
idx_tup_fetch | 24713736
n_tup_ins | 19730476
n_tup_upd | 8567352
n_tup_del | 16143587
n_tup_hot_upd | 0
n_live_tup | 3444573
n_dead_tup | 24748887
n_mod_since_analyze | 547474
last_vacuum |
last_autovacuum | 2017-12-08 16:31:10.820459+08
last_analyze |
last_autoanalyze | 2017-12-08 16:35:16.75293+08
vacuum_count | 0
autovacuum_count | 1
analyze_count | 0
autoanalyze_count | 124
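From this output you can derive health signals such as the dead-tuple ratio or the age of the statistics. A sketch (the 10% ratio and 5-minute age thresholds are illustrative assumptions):

```sql
-- Tables with heavy bloat, or whose statistics have not been refreshed recently.
SELECT relname,
       n_dead_tup::float / nullif(n_live_tup + n_dead_tup, 0) AS dead_ratio,
       now() - greatest(last_analyze, last_autoanalyze)       AS stats_age
FROM pg_stat_all_tables
WHERE schemaname = 'public'
  AND (n_dead_tup::float / nullif(n_live_tup + n_dead_tup, 0) > 0.1
       OR now() - greatest(last_analyze, last_autoanalyze) > interval '5 minutes');
```

A high dead_ratio on a quality-monitoring table usually means the cleanup job is outpacing autovacuum, as in the run above.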
Data cleanup scheduling
Since this is quality monitoring, not all raw data needs to be retained. The approach below purges expired data efficiently without blocking reads or writes:
《How to efficiently purge expired data by row number (ctid) - non-partitioned tables, data aging in practice》
On a single instance, this deletes roughly 2.63 million rows per second.
How to reset the statistics counters
postgres=# select pg_stat_reset_single_table_counters('test'::regclass);
How to force a manual statistics collection
postgres=# analyze verbose test;
INFO: analyzing "public.test"
INFO: "test": scanned 30000 of 238163 pages, containing 560241 live rows and 4294214 dead rows; 30000 rows in sample, 4319958 estimated total rows
ANALYZE
Customized real-time business data quality monitoring
For business-specific checks, use the read-and-burn pattern to inspect data quality in real time.
Examples:
《HTAP database PostgreSQL scenario and performance test 32 - (OLTP) high-throughput data in/out (heap, row scan, no index) - read-and-burn (JSON + streaming computation with functions)》
《HTAP database PostgreSQL scenario and performance test 31 - (OLTP) high-throughput data in/out (heap, row scan, no index) - read-and-burn (concurrent read/write throughput test)》
《HTAP database PostgreSQL scenario and performance test 27 - (OLTP) IoT - FEED logs, stream processing and read-and-burn (CTE)》
《PostgreSQL asynchronous messaging in practice - real-time Feed monitoring and response (e.g. proactive e-commerce service) - from minutes to milliseconds》
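The read-and-burn pattern can be sketched with a single data-modifying CTE: consume a batch of rows, aggregate quality metrics over it, and discard it in the same statement. A sketch against the example table above (the `quality_log` sink table and the 10000-row batch size are illustrative assumptions):

```sql
-- Consume up to 10000 rows from test, record a quality summary, and discard them.
WITH burned AS (
  DELETE FROM test
  WHERE ctid = ANY (ARRAY(SELECT ctid FROM test LIMIT 10000))
  RETURNING c1, crt_time
)
INSERT INTO quality_log (ts, rows_seen, null_c1, newest)
SELECT now(), count(*), count(*) FILTER (WHERE c1 IS NULL), max(crt_time)
FROM burned;
```

Because the delete and the aggregation run in one statement, each row is inspected exactly once and never needs to be stored long term.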
References
《How to efficiently purge expired data by row number (ctid) - non-partitioned tables, data aging in practice》
《PostgreSQL statistics: pg_statistic format and dump_stat import/export - Oracle compatible》
《PostgreSQL pg_stat_ and pg_statio_ statistics (scan, read, fetch, hit) source code walkthrough》