Spatio-Temporal Databases in Practice (with a Pivot Analysis of New York Taxi Data) - PostGIS + TimescaleDB => PostgreSQL
Tags
PostgreSQL, spatio-temporal database, time-series database, spatial database, TimescaleDB, PostGIS
Background
In the real world, many workloads produce data with time-series characteristics: records are written in time order, and there is heavy demand for time-range queries and aggregations.
Examples include business feed data, time-series data from the IoT (weather sensors, vehicle trajectories, and so on), and real-time data in the financial industry.
PostgreSQL's UDFs and BRIN (block-range) indexes are well suited to time-series processing, as the following two articles show:
《PostgreSQL on-demand chunking (a plpgsql, schemaless implementation of the TimescaleDB extension's automatic chunking)》
《PostgreSQL time-series best practices - database design for a securities trading system - Alibaba Cloud RDS PostgreSQL best practices》
In fact, the PostgreSQL ecosystem has produced a dedicated time-series extension: TimescaleDB. Its improvements include planner enhancements (merge-append support, which makes aggregation over small time slices very efficient), rotation APIs, and automatic chunking.
TimescaleDB has also drawn strong interest from investors, having raised 50 million US dollars, which indirectly suggests that time-series databases will be very popular with users.
Advantages of TimescaleDB
First, TimescaleDB chunks data automatically and transparently, so write performance does not degrade as data volume grows very large. (This mainly matters on low-IOPS disks; on disks with good IOPS, plain PostgreSQL also holds up well after writing large volumes of data.)
Second, TimescaleDB improves the SQL planner by adding a merge-append execution node: when grouping by small time slices, it does not need to hash or sort the entire data range, but can aggregate slice by slice, which is very efficient.
Finally, TimescaleDB adds APIs that make writing, maintaining, and querying time-series data efficient and easy to manage.
The APIs are documented here:
http://docs.timescale.com/v0.8/api
Deploying TimescaleDB
CentOS 7.x x64 is used as the example platform.
1. First, install PostgreSQL.
See 《PostgreSQL on Linux best-practice deployment guide》:
export USE_NAMED_POSIX_SEMAPHORES=1
LIBS=-lpthread CFLAGS="-O3" ./configure --prefix=/home/digoal/pgsql10 --with-segsize=8 --with-wal-segsize=256
LIBS=-lpthread CFLAGS="-O3" make world -j 64
LIBS=-lpthread CFLAGS="-O3" make install-world
2. Next, install cmake3 (from the EPEL repository).
yum install -y cmake3
ln -s /usr/bin/cmake3 /usr/bin/cmake
3. Build TimescaleDB.
git clone https://github.com/timescale/timescaledb/
cd timescaledb
git checkout release-0.8.0
or
wget https://github.com/timescale/timescaledb/archive/0.8.0.tar.gz
export PATH=/home/digoal/pgsql10/bin:$PATH
export LD_LIBRARY_PATH=/home/digoal/pgsql10/lib:$LD_LIBRARY_PATH
# Bootstrap the build system
./bootstrap
cd ./build && make
make install
[ 2%] Built target sqlupdatefile
[ 4%] Built target sqlfile
[100%] Built target timescaledb
Install the project...
-- Install configuration: "Release"
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb.control
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.8.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.7.1--0.8.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.1.0--0.2.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.2.0--0.3.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.3.0--0.4.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.0--0.4.1.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.1--0.4.2.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.2--0.5.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.5.0--0.6.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.0--0.6.1.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.1--0.7.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.1--0.7.1.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.7.0--0.7.1.sql
-- Installing: /home/dege.zzz/pgsql10/lib/timescaledb.so
4. Configure postgresql.conf so the timescaledb library is loaded automatically at database startup.
vi $PGDATA/postgresql.conf
shared_preload_libraries = 'timescaledb'
pg_ctl restart -m fast
5. Create the extension in each database that will use TimescaleDB.
psql
psql (10.1)
Type "help" for help.
postgres=# create extension timescaledb ;
6. TimescaleDB-related parameters:
timescaledb.constraint_aware_append
timescaledb.disable_optimizations
timescaledb.optimize_non_hypertables
timescaledb.restoring
postgres=# show timescaledb.constraint_aware_append ;
timescaledb.constraint_aware_append
-------------------------------------
on
(1 row)
postgres=# show timescaledb.disable_optimizations ;
timescaledb.disable_optimizations
-----------------------------------
off
(1 row)
postgres=# show timescaledb.optimize_non_hypertables ;
timescaledb.optimize_non_hypertables
--------------------------------------
off
(1 row)
postgres=# show timescaledb.restoring ;
timescaledb.restoring
-----------------------
off
(1 row)
TimescaleDB example 1 - pivot analysis of New York taxi data
The first example uses real-life New York City taxicab data:
http://docs.timescale.com/v0.8/tutorials/tutorial-hello-nyc
The data is real, sourced from
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
1. Download the sample data.
wget https://timescaledata.blob.core.windows.net/datasets/nyc_data.tar.gz
2. Unpack it.
tar -zxvf nyc_data.tar.gz
3. Create the tables. This includes use of create_hypertable, the API that converts a regular table into hypertable (time-series) storage.
psql -f nyc_data.sql
Some excerpts from nyc_data.sql:
cat nyc_data.sql
-- Ride data: duration, fare, distance, pickup/dropoff coordinates, times, passenger count, and so on.
CREATE TABLE "rides"(
vendor_id TEXT,
pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
passenger_count NUMERIC,
trip_distance NUMERIC,
pickup_longitude NUMERIC,
pickup_latitude NUMERIC,
rate_code INTEGER,
dropoff_longitude NUMERIC,
dropoff_latitude NUMERIC,
payment_type INTEGER,
fare_amount NUMERIC,
extra NUMERIC,
mta_tax NUMERIC,
tip_amount NUMERIC,
tolls_amount NUMERIC,
improvement_surcharge NUMERIC,
total_amount NUMERIC
);
This statement converts rides into hypertable storage:
SELECT create_hypertable('rides', 'pickup_datetime', 'payment_type', 2, create_default_indexes=>FALSE);
Create the indexes:
CREATE INDEX ON rides (vendor_id, pickup_datetime desc);
CREATE INDEX ON rides (pickup_datetime desc, vendor_id);
CREATE INDEX ON rides (rate_code, pickup_datetime DESC);
CREATE INDEX ON rides (passenger_count, pickup_datetime desc);
4. Load the test data.
psql -c "\COPY rides FROM nyc_data_rides.csv CSV"
COPY 10906858
5. Run some test SQL against the hypertable rides; performance is better than on a plain PostgreSQL table.
What was the average fare per day for rides carrying 2 or more passengers?
-- Average fare amount of rides with 2+ passengers by day
SELECT date_trunc('day', pickup_datetime) as day, avg(fare_amount)
FROM rides
WHERE passenger_count > 1 AND pickup_datetime < '2016-01-08'
GROUP BY day ORDER BY day;
day | avg
--------------------+---------------------
2016-01-01 00:00:00 | 13.3990821679715529
2016-01-02 00:00:00 | 13.0224687415181399
2016-01-03 00:00:00 | 13.5382068607068607
2016-01-04 00:00:00 | 12.9618895561740149
2016-01-05 00:00:00 | 12.6614611935518309
2016-01-06 00:00:00 | 12.5775245695086098
2016-01-07 00:00:00 | 12.5868802584437019
(7 rows)
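The query above is just a filter, a day truncation, and a grouped average. As an illustration only, here is a minimal Python sketch of the same logic on toy data (not the NYC dataset):

```python
from collections import defaultdict
from datetime import datetime

# Toy rides: (pickup_datetime, passenger_count, fare_amount)
rides = [
    (datetime(2016, 1, 1, 0, 10), 2, 10.0),
    (datetime(2016, 1, 1, 23, 50), 3, 16.0),
    (datetime(2016, 1, 2, 12, 0), 2, 12.0),
    (datetime(2016, 1, 2, 13, 0), 1, 99.0),  # only 1 passenger: filtered out
]

def avg_fare_by_day(rides):
    """Mirrors: SELECT date_trunc('day', pickup_datetime), avg(fare_amount)
    FROM rides WHERE passenger_count > 1 GROUP BY 1 ORDER BY 1;"""
    sums = defaultdict(lambda: [0.0, 0])  # day -> [fare total, ride count]
    for ts, passengers, fare in rides:
        if passengers > 1:
            day = ts.date()  # the date_trunc('day', ...) step
            sums[day][0] += fare
            sums[day][1] += 1
    return {day: total / n for day, (total, n) in sorted(sums.items())}

print(avg_fare_by_day(rides))
# {datetime.date(2016, 1, 1): 13.0, datetime.date(2016, 1, 2): 12.0}
```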
6. Some queries are even more than 20x faster.
How many rides were there per day?
-- Total number of rides by day for first 5 days
SELECT date_trunc('day', pickup_datetime) as day, COUNT(*) FROM rides
GROUP BY day ORDER BY day
LIMIT 5;
day | count
--------------------+--------
2016-01-01 00:00:00 | 345037
2016-01-02 00:00:00 | 312831
2016-01-03 00:00:00 | 302878
2016-01-04 00:00:00 | 316171
2016-01-05 00:00:00 | 343251
(5 rows)
TimescaleDB adds a merge-append execution optimization, so aggregating over fine-grained time slices is very efficient, and the larger the data volume, the more pronounced the gain.
For example, TimescaleDB introduces a time-based “merge append” optimization to minimize the number of
groups which must be processed to execute the following (given its knowledge that time is already ordered).
For our 100M row table, this results in query latency that is 396x faster than PostgreSQL (82ms vs. 32566ms).
SELECT date_trunc('minute', time) AS minute, max(usage_user)
FROM cpu
WHERE time < '2017-01-01'
GROUP BY minute
ORDER BY minute DESC
LIMIT 5;
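The intuition behind merge append can be illustrated outside the database: when the input already arrives ordered by the grouping key (as time-ordered chunks do), each group can be finalized the moment the key changes, with no hash table over the full range, and a LIMIT can stop early. A toy Python sketch of that streaming idea (not TimescaleDB's actual executor):

```python
def streaming_max_by_bucket(rows):
    """rows: (bucket_key, value) pairs already sorted by bucket_key descending.
    Yields (bucket_key, max_value), finalizing each group as soon as the key
    changes -- the property that merge append exploits on ordered chunks."""
    current_key, current_max = None, None
    for key, value in rows:
        if key != current_key:
            if current_key is not None:
                yield current_key, current_max  # group complete, emit it now
            current_key, current_max = key, value
        else:
            current_max = max(current_max, value)
    if current_key is not None:
        yield current_key, current_max

rows = [("12:02", 80), ("12:02", 95), ("12:01", 60), ("12:00", 70), ("12:00", 40)]
print(list(streaming_max_by_bucket(rows)))
# [('12:02', 95), ('12:01', 60), ('12:00', 70)]
```

With a LIMIT, the consumer can stop after the first few yielded groups without ever scanning the remaining input.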
7. Use some TimescaleDB-specific functions, such as time_bucket; these also benefit from TimescaleDB's built-in optimizations.
How many orders were placed in each 5-minute interval?
-- Number of rides by 5 minute intervals
-- (using the TimescaleDB "time_bucket" function)
SELECT time_bucket('5 minute', pickup_datetime) as five_min, count(*)
FROM rides
WHERE pickup_datetime < '2016-01-01 02:00'
GROUP BY five_min ORDER BY five_min;
five_min | count
---------------------+-------
2016-01-01 00:00:00 | 703
2016-01-01 00:05:00 | 1482
2016-01-01 00:10:00 | 1959
2016-01-01 00:15:00 | 2200
2016-01-01 00:20:00 | 2285
2016-01-01 00:25:00 | 2291
2016-01-01 00:30:00 | 2349
2016-01-01 00:35:00 | 2328
2016-01-01 00:40:00 | 2440
2016-01-01 00:45:00 | 2372
2016-01-01 00:50:00 | 2388
2016-01-01 00:55:00 | 2473
2016-01-01 01:00:00 | 2395
2016-01-01 01:05:00 | 2510
2016-01-01 01:10:00 | 2412
2016-01-01 01:15:00 | 2482
2016-01-01 01:20:00 | 2428
2016-01-01 01:25:00 | 2433
2016-01-01 01:30:00 | 2337
2016-01-01 01:35:00 | 2366
2016-01-01 01:40:00 | 2325
2016-01-01 01:45:00 | 2257
2016-01-01 01:50:00 | 2316
2016-01-01 01:55:00 | 2250
(24 rows)
8. Run some analytical SQL.
Ride counts per rate type (the rates table maps each rate_code to a description):
-- Join rides with rates to get more information on rate_code
SELECT rates.description, COUNT(vendor_id) as num_trips FROM rides
JOIN rates on rides.rate_code = rates.rate_code
WHERE pickup_datetime < '2016-01-08'
GROUP BY rates.description ORDER BY rates.description;
description | num_trips
-----------------------+-----------
JFK | 54832
Nassau or Westchester | 967
Newark | 4126
group ride | 17
negotiated fare | 7193
standard rate | 2266401
(6 rows)
Statistics for JFK and Newark rides in January 2016 (longest and shortest distance, average passengers, duration, and so on):
-- Analysis of all JFK and EWR rides in Jan 2016
SELECT rates.description, COUNT(vendor_id) as num_trips,
AVG(dropoff_datetime - pickup_datetime) as avg_trip_duration, AVG(total_amount) as avg_total,
AVG(tip_amount) as avg_tip, MIN(trip_distance) as min_distance, AVG(trip_distance) as avg_distance, MAX(trip_distance) as max_distance,
AVG(passenger_count) as avg_passengers
FROM rides
JOIN rates on rides.rate_code = rates.rate_code
WHERE rides.rate_code in (2,3) AND pickup_datetime < '2016-02-01'
GROUP BY rates.description ORDER BY rates.description;
description | num_trips | avg_trip_duration | avg_total | avg_tip | min_distance | avg_distance | max_distance | avg_passengers
-------------+-----------+-------------------+---------------------+--------------------+--------------+---------------------+--------------+--------------------
JFK | 225019 | 00:45:46.822517 | 64.3278115181384683 | 7.3334228220728027 | 0.00 | 17.2602816651038357 | 221.00 | 1.7333869584346211
Newark | 16822 | 00:35:16.157472 | 86.4633688027582927 | 9.5461657353465700 | 0.00 | 16.2706122934252764 | 177.23 | 1.7435501129473309
(2 rows)
9. Automatic chunking and execution plans.
postgres=# \d+ rides
Table "public.rides"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
vendor_id | text | | | | extended | |
pickup_datetime | timestamp without time zone | | not null | | plain | |
dropoff_datetime | timestamp without time zone | | not null | | plain | |
passenger_count | numeric | | | | main | |
trip_distance | numeric | | | | main | |
pickup_longitude | numeric | | | | main | |
pickup_latitude | numeric | | | | main | |
rate_code | integer | | | | plain | |
dropoff_longitude | numeric | | | | main | |
dropoff_latitude | numeric | | | | main | |
payment_type | integer | | | | plain | |
fare_amount | numeric | | | | main | |
extra | numeric | | | | main | |
mta_tax | numeric | | | | main | |
tip_amount | numeric | | | | main | |
tolls_amount | numeric | | | | main | |
improvement_surcharge | numeric | | | | main | |
total_amount | numeric | | | | main | |
Indexes:
"rides_passenger_count_pickup_datetime_idx" btree (passenger_count, pickup_datetime DESC)
"rides_pickup_datetime_vendor_id_idx" btree (pickup_datetime DESC, vendor_id)
"rides_rate_code_pickup_datetime_idx" btree (rate_code, pickup_datetime DESC)
"rides_vendor_id_pickup_datetime_idx" btree (vendor_id, pickup_datetime DESC)
Child tables: _timescaledb_internal._hyper_1_1_chunk,
_timescaledb_internal._hyper_1_2_chunk,
_timescaledb_internal._hyper_1_3_chunk,
_timescaledb_internal._hyper_1_4_chunk
The constraints on one of the chunks look like this:
Check constraints:
"constraint_1" CHECK (pickup_datetime >= `2015-12-31 00:00:00`::timestamp without time zone AND pickup_datetime < `2016-01-30 00:00:00`::timestamp without time zone)
"constraint_2" CHECK (_timescaledb_internal.get_partition_hash(payment_type) >= 1073741823)
Inherits: rides
-- Peek behind the scenes
postgres=# select count(*) from rides;
count
----------
10906858
(1 row)
Time: 376.247 ms
postgres=# explain select count(*) from rides;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=254662.23..254662.24 rows=1 width=8)
-> Gather (cost=254661.71..254662.22 rows=5 width=8)
Workers Planned: 5
-> Partial Aggregate (cost=253661.71..253661.72 rows=1 width=8)
-> Append (cost=0.00..247468.57 rows=2477258 width=0)
-> Parallel Seq Scan on rides (cost=0.00..0.00 rows=1 width=0)
-> Parallel Seq Scan on _hyper_1_1_chunk (cost=0.00..77989.57 rows=863657 width=0)
-> Parallel Seq Scan on _hyper_1_2_chunk (cost=0.00..150399.01 rows=1331101 width=0)
-> Parallel Seq Scan on _hyper_1_3_chunk (cost=0.00..6549.75 rows=112675 width=0)
-> Parallel Seq Scan on _hyper_1_4_chunk (cost=0.00..12530.24 rows=169824 width=0)
(10 rows)
10. You can also query a chunk directly.
postgres=# select count(*) from _timescaledb_internal._hyper_1_1_chunk;
count
---------
3454961
(1 row)
Chunking is completely transparent to users.
Chunk metadata lives in dedicated schemas:
postgres=# \dn
List of schemas
Name | Owner
-----------------------+----------
_timescaledb_cache | postgres
_timescaledb_catalog | postgres
_timescaledb_internal | postgres
public | postgres
(4 rows)
TimescaleDB + PostGIS, a powerful pairing - the spatio-temporal database
Combining the TimescaleDB time-series extension with the PostGIS spatial extension, PostgreSQL handles spatio-temporal data very well.
1. Create the PostGIS extension.
create extension postgis;
2. Add geometry columns.
http://postgis.net/docs/manual-2.4/AddGeometryColumn.html
postgres=# SELECT AddGeometryColumn ('public','rides','pickup_geom',2163,'POINT',2);
addgeometrycolumn
--------------------------------------------------------
public.rides.pickup_geom SRID:2163 TYPE:POINT DIMS:2
(1 row)
postgres=# SELECT AddGeometryColumn ('public','rides','dropoff_geom',2163,'POINT',2);
addgeometrycolumn
---------------------------------------------------------
public.rides.dropoff_geom SRID:2163 TYPE:POINT DIMS:2
(1 row)
postgres=#
postgres=# \d+ rides
Table "public.rides"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
vendor_id | text | | | | extended | |
pickup_datetime | timestamp without time zone | | not null | | plain | |
dropoff_datetime | timestamp without time zone | | not null | | plain | |
passenger_count | numeric | | | | main | |
trip_distance | numeric | | | | main | |
pickup_longitude | numeric | | | | main | |
pickup_latitude | numeric | | | | main | |
rate_code | integer | | | | plain | |
dropoff_longitude | numeric | | | | main | |
dropoff_latitude | numeric | | | | main | |
payment_type | integer | | | | plain | |
fare_amount | numeric | | | | main | |
extra | numeric | | | | main | |
mta_tax | numeric | | | | main | |
tip_amount | numeric | | | | main | |
tolls_amount | numeric | | | | main | |
improvement_surcharge | numeric | | | | main | |
total_amount | numeric | | | | main | |
pickup_geom | geometry(Point,2163) | | | | main | |
dropoff_geom | geometry(Point,2163) | | | | main | |
Indexes:
"rides_passenger_count_pickup_datetime_idx" btree (passenger_count, pickup_datetime DESC)
"rides_pickup_datetime_vendor_id_idx" btree (pickup_datetime DESC, vendor_id)
"rides_rate_code_pickup_datetime_idx" btree (rate_code, pickup_datetime DESC)
"rides_vendor_id_pickup_datetime_idx" btree (vendor_id, pickup_datetime DESC)
Child tables: _timescaledb_internal._hyper_1_1_chunk,
_timescaledb_internal._hyper_1_2_chunk,
_timescaledb_internal._hyper_1_3_chunk,
_timescaledb_internal._hyper_1_4_chunk
3. Populate the geometry columns (each point is built from the two numeric columns holding longitude and latitude; strictly speaking the update is optional, because PostgreSQL supports expression indexes, so you could instead build an expression-based spatial index directly over those two columns).
-- Generate the geometry points and write to table
-- (Note: These calculations might take a few mins)
UPDATE rides SET pickup_geom = ST_Transform(ST_SetSRID(ST_MakePoint(pickup_longitude,pickup_latitude),4326),2163);
UPDATE rides SET dropoff_geom = ST_Transform(ST_SetSRID(ST_MakePoint(dropoff_longitude,dropoff_latitude),4326),2163);
vacuum full rides;
4. A spatio-temporal analysis example.
How many rides were hailed within 400 meters of (lat, long) (40.7589, -73.9851), in 30-minute buckets, based on the pickup location?
-- Number of rides on New Years Eve originating within
-- 400m of Times Square, by 30 min buckets
-- Note: Times Square is at (lat, long) (40.7589,-73.9851)
SELECT time_bucket('30 minutes', pickup_datetime) AS thirty_min, COUNT(*) AS near_times_sq
FROM rides
WHERE ST_Distance(pickup_geom, ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)) < 400
AND pickup_datetime < '2016-01-01 14:00'
GROUP BY thirty_min ORDER BY thirty_min;
thirty_min | near_times_sq
---------------------+--------------
2016-01-01 00:00:00 | 74
2016-01-01 00:30:00 | 102
2016-01-01 01:00:00 | 120
2016-01-01 01:30:00 | 98
2016-01-01 02:00:00 | 112
2016-01-01 02:30:00 | 109
2016-01-01 03:00:00 | 163
2016-01-01 03:30:00 | 181
2016-01-01 04:00:00 | 214
2016-01-01 04:30:00 | 185
2016-01-01 05:00:00 | 158
2016-01-01 05:30:00 | 113
2016-01-01 06:00:00 | 102
2016-01-01 06:30:00 | 91
2016-01-01 07:00:00 | 88
2016-01-01 07:30:00 | 58
2016-01-01 08:00:00 | 72
2016-01-01 08:30:00 | 94
2016-01-01 09:00:00 | 115
2016-01-01 09:30:00 | 118
2016-01-01 10:00:00 | 135
2016-01-01 10:30:00 | 160
2016-01-01 11:00:00 | 212
2016-01-01 11:30:00 | 229
2016-01-01 12:00:00 | 244
2016-01-01 12:30:00 | 230
2016-01-01 13:00:00 | 235
2016-01-01 13:30:00 | 238
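The query above combines a spatial filter with time bucketing. As a rough illustration only, here is a planar Python simulation on toy data, using Euclidean distance in meters in place of ST_Distance over SRID 2163 geometries:

```python
import math
from collections import Counter
from datetime import datetime

CENTER = (0.0, 0.0)  # stand-in for the Times Square point, coordinates in meters

def bucket_30min(ts):
    """Floor a timestamp to its 30-minute bucket, like time_bucket('30 minutes', ...)."""
    return ts.replace(minute=ts.minute - ts.minute % 30, second=0, microsecond=0)

def rides_near(rides, center=CENTER, radius_m=400.0):
    """Count pickups within radius_m of center per 30-minute bucket, mirroring:
    WHERE ST_Distance(pickup_geom, ...) < 400 GROUP BY time_bucket('30 minutes', ...)."""
    counts = Counter()
    for ts, x, y in rides:
        if math.hypot(x - center[0], y - center[1]) < radius_m:
            counts[bucket_30min(ts)] += 1
    return dict(sorted(counts.items()))

rides = [
    (datetime(2016, 1, 1, 0, 5), 100.0, 100.0),   # ~141 m away -> counted
    (datetime(2016, 1, 1, 0, 40), 300.0, 0.0),    # 300 m away  -> counted
    (datetime(2016, 1, 1, 0, 50), 5000.0, 0.0),   # too far     -> filtered out
]
print(rides_near(rides))
```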
Example 2 - sensor data and weather data
http://docs.timescale.com/v0.8/tutorials/other-sample-datasets
It follows the same pattern as the example above, so it is not covered further here.
Common TimescaleDB APIs
http://docs.timescale.com/v0.8/api
1. Create a hypertable.
create_hypertable()
Required Arguments
Name | Description |
---|---|
main_table | Identifier of table to convert to hypertable |
time_column_name | Name of the column containing time values |
Optional Arguments
Name | Description |
---|---|
partitioning_column | Name of an additional column to partition by. If provided, number_partitions must be set. |
number_partitions | Number of hash partitions to use for partitioning_column when this optional argument is supplied. Must be > 0. |
chunk_time_interval | Interval in event time that each chunk covers. Must be > 0. Default is 1 month. |
create_default_indexes | Boolean whether to create default indexes on time/partitioning columns. Default is TRUE. |
if_not_exists | Boolean whether to print warning if table already converted to hypertable or raise exception. Default is FALSE. |
partitioning_func | The function to use for calculating a value's partition. |
2. Add an additional partitioning dimension.
Both hash and interval partitioning are supported.
add_dimension()
Required Arguments
Name | Description |
---|---|
main_table | Identifier of hypertable to add the dimension to. |
column_name | Name of the column to partition by. |
Optional Arguments
Name | Description |
---|---|
number_partitions | Number of hash partitions to use on column_name. Must be > 0. |
interval_length | Interval that each chunk covers. Must be > 0. |
partitioning_func | The function to use for calculating a value's partition (see create_hypertable instructions). |
3. Drop chunks.
Drops chunks older than a given cut-off point in time.
drop_chunks()
Required Arguments
Name | Description |
---|---|
older_than | Timestamp of cut-off point for data to be dropped, i.e., anything older than this should be removed. |
Optional Arguments
Name | Description |
---|---|
table_name | Hypertable name from which to drop chunks. If not supplied, all hypertables are affected. |
schema_name | Schema name of the hypertable from which to drop chunks. Defaults to public. |
cascade | Boolean on whether to CASCADE the drop on chunks, therefore removing dependent objects on chunks to be removed. Defaults to FALSE. |
4. Set the chunk time interval.
set_chunk_time_interval()
Required Arguments
Name | Description |
---|---|
main_table | Identifier of hypertable to update interval for. |
chunk_time_interval | Interval in event time that each new chunk covers. Must be > 0. |
5. Analytic function - first value.
first()
Required Arguments
Name | Description |
---|---|
value | The value to return (anyelement) |
time | The timestamp to use for comparison (TIMESTAMP/TIMESTAMPTZ or integer type) |
For example, the earliest reported temperature for each sensor:
SELECT device_id, first(temp, time)
FROM metrics
GROUP BY device_id;
The same can also be implemented with recursive SQL:
《A few applications of recursive SQL in PostgreSQL - thinking like a geek vs. a normal person》
6. Analytic function - last value.
last()
Required Arguments
Name | Description |
---|---|
value | The value to return (anyelement) |
time | The timestamp to use for comparison (TIMESTAMP/TIMESTAMPTZ or integer type) |
For example, the latest temperature of each sensor within each 5-minute interval:
SELECT device_id, time_bucket('5 minutes', time) as interval,
last(temp, time)
FROM metrics
WHERE time > now () - interval '1 day'
GROUP BY device_id, interval
ORDER BY interval DESC;
The same can also be implemented with recursive SQL:
《A few applications of recursive SQL in PostgreSQL - thinking like a geek vs. a normal person》
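first() and last() return the value ordered by a separate time column. A minimal Python equivalent of the two aggregates on toy data (an illustration of the semantics, not the TimescaleDB implementation):

```python
from datetime import datetime

# Toy sensor readings: (device_id, time, temp)
readings = [
    ("dev1", datetime(2017, 1, 1, 0, 0), 20.0),
    ("dev1", datetime(2017, 1, 1, 0, 5), 21.5),
    ("dev2", datetime(2017, 1, 1, 0, 1), 18.0),
    ("dev2", datetime(2017, 1, 1, 0, 4), 17.0),
]

def first_last_by_device(rows):
    """Mirrors: SELECT device_id, first(temp, time), last(temp, time)
    FROM metrics GROUP BY device_id;"""
    groups = {}
    for device, ts, temp in rows:
        groups.setdefault(device, []).append((ts, temp))
    return {
        # min/max over (time, temp) tuples pick the value at the
        # earliest / latest timestamp within each group
        device: (min(vals)[1], max(vals)[1])
        for device, vals in groups.items()
    }

print(first_last_by_device(readings))
# {'dev1': (20.0, 21.5), 'dev2': (18.0, 17.0)}
```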
7. Analytic function - histogram.
histogram()
Required Arguments
Name | Description |
---|---|
value | A set of values to partition into a histogram |
min | The histogram’s lower bound used in bucketing |
max | The histogram’s upper bound used in bucketing |
nbuckets | The integer value for the number of histogram buckets (partitions) |
For example: partition battery levels between 20 and 60 into 5 equal buckets. The result is an array of 5+2 values (the count of records in each bucket), with the first and last elements counting records below the lower bound and beyond the upper bound, respectively.
SELECT device_id, histogram(battery_level, 20, 60, 5)
FROM readings
GROUP BY device_id
LIMIT 10;
device_id | histogram
------------+------------------------------
demo000000 | {0,0,0,7,215,206,572}
demo000001 | {0,12,173,112,99,145,459}
demo000002 | {0,0,187,167,68,229,349}
demo000003 | {197,209,127,221,106,112,28}
demo000004 | {0,0,0,0,0,39,961}
demo000005 | {12,225,171,122,233,80,157}
demo000006 | {0,78,176,170,8,40,528}
demo000007 | {0,0,0,126,239,245,390}
demo000008 | {0,0,311,345,116,228,0}
demo000009 | {295,92,105,50,8,8,442}
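The nbuckets+2 layout can be sketched in Python. This assumes, per the description above, equal-width buckets between min and max, with the two edge slots counting out-of-range values (boundary handling here is an assumption, not taken from the TimescaleDB source):

```python
def histogram(values, lo, hi, nbuckets):
    """Toy equivalent of TimescaleDB's histogram(value, min, max, nbuckets):
    returns nbuckets + 2 counts; index 0 counts values below lo, and the
    last index counts values at or beyond hi (assumed boundary rule)."""
    counts = [0] * (nbuckets + 2)
    width = (hi - lo) / nbuckets
    for v in values:
        if v < lo:
            counts[0] += 1           # underflow slot
        elif v >= hi:
            counts[-1] += 1          # overflow slot
        else:
            counts[1 + int((v - lo) // width)] += 1
    return counts

print(histogram([10, 20, 25, 45, 59, 60, 80], 20, 60, 5))
# [1, 2, 0, 0, 1, 1, 2]
```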
8. Analytic function - time bucketing.
Similar to date_trunc, but more powerful: time can be truncated by an arbitrary interval, which is very convenient.
time_bucket()
Required Arguments
Name | Description |
---|---|
bucket_width | A PostgreSQL time interval for how long each bucket is (interval) |
time | The timestamp to bucket (timestamp/timestamptz/date) |
Optional Arguments
Name | Description |
---|---|
offset | The time interval to offset all buckets by (interval) |
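Arbitrary-interval truncation can be approximated by flooring the timestamp's epoch offset. A sketch of that idea (ignoring the optional offset argument and time-zone subtleties):

```python
from datetime import datetime, timedelta, timezone

def time_bucket(width: timedelta, ts: datetime) -> datetime:
    """Floor ts to the start of its width-sized bucket, counted from the
    Unix epoch -- an approximation of time_bucket(interval, timestamptz)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    seconds = (ts - epoch).total_seconds()
    floored = seconds - seconds % width.total_seconds()
    return epoch + timedelta(seconds=floored)

ts = datetime(2016, 1, 1, 0, 7, 30, tzinfo=timezone.utc)
print(time_bucket(timedelta(minutes=5), ts))  # 2016-01-01 00:05:00+00:00
```

Unlike date_trunc, which only accepts named fields like 'minute' or 'day', this flooring works for any width, e.g. 90 seconds or 30 minutes.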
9. Data-overview function - hypertable size.
hypertable_relation_size_pretty()
SELECT * FROM hypertable_relation_size_pretty('conditions');
table_size | index_size | toast_size | total_size
------------+------------+------------+------------
1171 MB | 1608 MB | 176 kB | 2779 MB
10. Data-overview function - chunk sizes.
chunk_relation_size_pretty()
SELECT * FROM chunk_relation_size_pretty('conditions');
chunk_table | table_size | index_size | total_size
---------------------------------------------+------------+------------+------------
"_timescaledb_internal"."_hyper_1_1_chunk" | 28 MB | 36 MB | 64 MB
"_timescaledb_internal"."_hyper_1_2_chunk" | 57 MB | 78 MB | 134 MB
...
11. Data-overview function - index sizes.
indexes_relation_size_pretty()
SELECT * FROM indexes_relation_size_pretty('conditions');
index_name_ | total_size
--------------------------------------+------------
public.conditions_device_id_time_idx | 1143 MB
public.conditions_time_idx | 465 MB
12. Dump the time-series metadata.
https://raw.githubusercontent.com/timescale/timescaledb/master/scripts/dump_meta_data.sql
psql [your connect flags] -d your_timescale_db < dump_meta_data.sql > dumpfile.txt
Summary
TimescaleDB is a very useful time-series extension. It hides the chunking logic (transparent to the user) while providing a rich set of API functions and performance optimizations, making it a great fit for time-series workloads.
Combined with the PostGIS extension, PostgreSQL becomes even stronger at spatio-temporal processing.
References
《PostgreSQL on-demand chunking (a plpgsql, schemaless implementation of the TimescaleDB extension's automatic chunking)》
《What characterizes a time-series database? An introduction to TimescaleDB》
http://docs.timescale.com/v0.8/tutorials/tutorial-hello-nyc
http://docs.timescale.com/v0.8/introduction
《A few applications of recursive SQL in PostgreSQL - thinking like a geek vs. a normal person》