AQUA(高階查詢加速器)是什麼?
AQUA是一款功能強大的硬體查詢加速器,將配合RA3節點(ra3.4xl或ra3.16xl)與S3託管儲存共同起效。
下面來看亞馬遜雲科技官方博文中的相關描述:
這套儲存系統採用多種提示機制(包括資料塊熱度、資料阻塞與工作負載模式)管理快取,藉以實現更高效能。
除了快取機制之外,AQUA還充分發揮Amazon Nitro System與定製化FPGA加速方案的優勢,在更接近資料的位置處理規約及聚合查詢對應的計算負載。這種方式能夠減少網路流量,削弱RA3節點中CPU的工作負擔,由此將查詢效能提升達10倍。更重要的是,AQUA不產生任何額外費用,也無需更改程式碼內容。AQUA還採用Amazon Simple Storage Service (S3)提供的快速、高頻寬連線資源。
利用快照建立AQUA叢集
這裡,我們將嘗試透過快照還原功能建立AQUA叢集。您可以選擇ra3.4xlarge或ra3.16xlarge節點型別。如果您已經擁有采用這些節點的叢集,則其中的AQUA會被預設配置為“Automatic”。要開始使用AQUA,請選擇[Actions]-[Configure AQUA]並將以下對話方塊中的 Automatic 配置調整為Turn On。
這裡,我們使用預設配置Automatic建立一套叢集。在配置AQUA的過程中,您可以靈活調整以下選項:
- Automatic (預設)
- Redshift確定是否使用AQUA。
- 截至目前,AQUA(高階查詢加速器)使用狀態仍為:尚未啟用AQUA,但情況隨時可能有所變化(“Currently, AQUA isn't activated with this option, but this behavior is subject to change”)。在變動之前,此狀態仍然等效於Turn Off;代表AQUA不會被啟用。
- Turn On
- 您將選擇始終使用AQUA。AQUA僅可在某些亞馬遜雲科技區域以及ra3.4xlarge與ra3.16xlarge節點型別當中啟用。
- Turn Off
- 您選擇不使用AQUA。
等待約5分鐘後,AQUA即轉為Available可用狀態。可以看到,本文中的示例叢集採用AQUA“Automatic”配置進行啟動。
建立測試資料
AQUA在LIKE及SIMILAR TO等操作中的加速效果尤其出色,這裡我們準備了約3億條資料。
dev=> create table lineitem (
dev(> l_orderkey bigint not null,
dev(> l_partkey bigint,
dev(> l_suppkey bigint,
dev(> l_linenumber integer not null,
dev(> l_quantity decimal(18,4),
dev(> l_extendedprice decimal(18,4),
dev(> l_discount decimal(18,4),
dev(> l_tax decimal(18,4),
dev(> l_returnflag varchar(1),
dev(> l_linestatus varchar(1),
dev(> l_shipdate date,
dev(> l_commitdate date,
dev(> l_receiptdate date,
dev(> l_shipinstruct varchar(25),
dev(> l_shipmode varchar(10),
dev(> l_comment varchar(44))
dev-> distkey (l_orderkey)
dev-> sortkey (l_receiptdate);
CREATE TABLE
dev=> copy lineitem from 's3://cm-bucket/redshift-immersionday-labs/data/lineitem-part/'
dev-> iam_role 'arn:aws:iam::123456789012:role/AmazonRedshiftRole'
dev-> region 'ap-northeast-1' gzip delimiter '|' compupdate preset;
INFO: Load into table 'lineitem' completed, 303008217 record(s) loaded successfully.
COPY
dev=> select * from lineitem limit 1;
-[ RECORD 1 ]---+----------------------------------------
l_orderkey | 7428384
l_partkey | 9121341
l_suppkey | 621360
l_linenumber | 4
l_quantity | 23.0000
l_extendedprice | 31323.4700
l_discount | 0.0900
l_tax | 0.0500
l_returnflag | R
l_linestatus | F
l_shipdate | 1992-01-02
l_commitdate | 1992-03-22
l_receiptdate | 1992-01-03
l_shipinstruct | DELIVER IN PERSON
l_shipmode | FOB
l_comment | haggle carefully about the furiously ir
AQUA效能測試
這裡,我們透過顯式更改Turn On/Off進行效能差異對比。要更新AQUA配置,您可以點選[Actions]-[Configure AQUA]。
備註:
在對話方塊中更改Turn ON/Off並點選[Save changes]之後,叢集將立即重新啟動以應用變更。
測試查詢
在測試中,我們執行以下 SIMILAR TO 與 LIKE 示例查詢,並分別記錄其響應時間。
- SIMILAR TO示例查詢
-- explain
select sum(l_orderkey), count(*)
from lineitem
where
l_comment similar to 'slyly %' or
l_comment similar to 'plant %' or
l_comment similar to 'fina %' or
l_comment similar to 'quick %' or
l_comment similar to 'slyly %' or
l_comment similar to 'quickly %' or
l_comment similar to ' %about%' or
l_comment similar to ' final%' or
l_comment similar to ' %final%' or
l_comment similar to ' breach%' or
l_comment similar to ' egular%' or
l_comment similar to ' %closely%' or
l_comment similar to ' closely%' or
l_comment similar to ' %idea%' or
l_comment similar to ' idea%' ;
LIKE示例查詢
-- explain
select sum(l_orderkey), count(*)
from lineitem
where
l_comment like 'slyly %' or
l_comment like 'plant %' or
l_comment like 'fina %' or
l_comment like 'quick %' or
l_comment like 'slyly %' or
l_comment like 'quickly %' or
l_comment like ' %about%' or
l_comment like ' final%' or
l_comment like ' %final%' or
l_comment like ' breach%' or
l_comment like ' egular%' or
l_comment like ' %closely%' or
l_comment like ' closely%' or
l_comment like ' %idea%' or
l_comment like ' idea%' ;
我們在這裡搜尋了包含“l_comment”的字串,並彙總相關記錄。我知道直接用or連線 SIMILAR TO 與 LIKE 的作法不太科學,但我們們應該相信AQUA的能力,對吧?
我們禁用了結果快取以準確衡量處理時長,以下為最終結果。
set enable_result_cache_for_session to off;
- SIMILAR TO效能比較
- AQUA 未啟用 (Turn Off)
dev=> select sum(l_orderkey), count(*)
dev-> from lineitem
dev-> where
dev-> l_comment similar to 'slyly %' or
dev-> l_comment similar to 'plant %' or
dev-> l_comment similar to 'fina %' or
dev-> l_comment similar to 'quick %' or
dev-> l_comment similar to 'slyly %' or
dev-> l_comment similar to 'quickly %' or
dev-> l_comment similar to ' %about%' or
dev-> l_comment similar to ' final%' or
dev-> l_comment similar to ' %final%' or
dev-> l_comment similar to ' breach%' or
dev-> l_comment similar to ' egular%' or
dev-> l_comment similar to ' %closely%' or
dev-> l_comment similar to ' closely%' or
dev-> l_comment similar to ' %idea%' or
dev-> l_comment similar to ' idea%' ;
sum | count
------------------+---------
1440371216714447 | 9496106
(1 row)
Time: 215896.819 ms
select sum(l_orderkey), count(*)
from lineitem
where
l_comment similar to 'slyly %' or
l_comment similar to 'plant %' or
l_comment similar to 'fina %' or
l_comment similar to 'quick %' or
l_comment similar to 'slyly %' or
l_comment similar to 'quickly %' or
l_comment similar to ' %about%' or
l_comment similar to ' final%' or
l_comment similar to ' %final%' or
l_comment similar to ' breach%' or
l_comment similar to ' egular%' or
l_comment similar to ' %closely%' or
l_comment similar to ' closely%' or
l_comment similar to ' %idea%' or
l_comment similar to ' idea%' ;
sum | count
------------------+---------
1440371216714447 | 9496106
(1 row)
Time: 211313.374 ms
以下為執行計劃。
dev=> explain
select sum(l_orderkey), count(*)
from lineitem
where
l_comment similar to 'slyly %' or
l_comment similar to 'plant %' or
l_comment similar to 'fina %' or
l_comment similar to 'quick %' or
l_comment similar to 'slyly %' or
l_comment similar to 'quickly %' or
l_comment similar to ' %about%' or
l_comment similar to ' final%' or
l_comment similar to ' %final%' or
l_comment similar to ' breach%' or
l_comment similar to ' egular%' or
l_comment similar to ' %closely%' or
l_comment similar to ' closely%' or
l_comment similar to ' %idea%' or
l_comment similar to ' idea%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
XN Aggregate (cost=13830214.62..13830214.62 rows=1 width=8)
-> XN Seq Scan on lineitem (cost=0.00..13635370.08 rows=38968908 width=8)
Filter: (((l_comment)::text ~ '^( .*idea.*)$'::text) OR ((l_comment)::text ~ '^( idea.*)$'::text) OR ((l_comment)::text ~ '^(fina .*)$'::text) OR ((l_comment)::text ~ '^( .*about.*)$'::text) OR ((l_comment)::text ~ '^( .*final.*)$'::text) OR ((l_comment)::text ~ '^( final.*)$'::text) OR ((l_comment)::text ~ '^(plant .*)$'::text) OR ((l_comment)::text ~ '^(quick .*)$'::text) OR ((l_comment)::text ~ '^(slyly .*)$'::text) OR ((l_comment)::text ~ '^( breach.*)$'::text) OR ((l_comment)::text ~ '^( egular.*)$'::text) OR ((l_comment)::text ~ '^( .*closely.*)$'::text) OR ((l_comment)::text ~ '^( closely.*)$'::text) OR ((l_comment)::text ~ '^(quickly .*)$'::text))
(3 rows)
Time: 8.506 ms
AQUA 已啟用 (Turn On)
dev=> select sum(l_orderkey), count(*)
dev-> from lineitem
dev-> where
dev-> l_comment similar to 'slyly %' or
dev-> l_comment similar to 'plant %' or
dev-> l_comment similar to 'fina %' or
dev-> l_comment similar to 'quick %' or
dev-> l_comment similar to 'slyly %' or
dev-> l_comment similar to 'quickly %' or
dev-> l_comment similar to ' %about%' or
dev-> l_comment similar to ' final%' or
dev-> l_comment similar to ' %final%' or
dev-> l_comment similar to ' breach%' or
dev-> l_comment similar to ' egular%' or
dev-> l_comment similar to ' %closely%' or
dev-> l_comment similar to ' closely%' or
dev-> l_comment similar to ' %idea%' or
dev-> l_comment similar to ' idea%' ;
sum | count
------------------+---------
1440371216714447 | 9496106
(1 row)
Time: 29191.625 ms
dev=> select sum(l_orderkey), count(*)
dev-> from lineitem
dev-> where
dev-> l_comment similar to 'slyly %' or
dev-> l_comment similar to 'plant %' or
dev-> l_comment similar to 'fina %' or
dev-> l_comment similar to 'quick %' or
dev-> l_comment similar to 'slyly %' or
dev-> l_comment similar to 'quickly %' or
dev-> l_comment similar to ' %about%' or
dev-> l_comment similar to ' final%' or
dev-> l_comment similar to ' %final%' or
dev-> l_comment similar to ' breach%' or
dev-> l_comment similar to ' egular%' or
dev-> l_comment similar to ' %closely%' or
dev-> l_comment similar to ' closely%' or
dev-> l_comment similar to ' %idea%' or
dev-> l_comment similar to ' idea%' ;
sum | count
------------------+---------
1440371216714447 | 9496106
(1 row)
Time: 7512.982 ms
以下為執行計劃。
dev=> explain
select sum(l_orderkey), count(*)
from lineitem
where
l_comment similar to 'slyly %' or
l_comment similar to 'plant %' or
l_comment similar to 'fina %' or
l_comment similar to 'quick %' or
l_comment similar to 'slyly %' or
l_comment similar to 'quickly %' or
l_comment similar to ' %about%' or
l_comment similar to ' final%' or
l_comment similar to ' %final%' or
l_comment similar to ' breach%' or
l_comment similar to ' egular%' or
l_comment similar to ' %closely%' or
l_comment similar to ' closely%' or
l_comment similar to ' %idea%' or
l_comment similar to ' idea%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
XN Aggregate (cost=13830214.62..13830214.62 rows=1 width=8)
-> XN Seq Scan on lineitem (cost=0.00..13635370.08 rows=38968908 width=8)
Filter: (((l_comment)::text ~ '^( .*idea.*)$'::text) OR ((l_comment)::text ~ '^( idea.*)$'::text) OR ((l_comment)::text ~ '^(fina .*)$'::text) OR ((l_comment)::text ~ '^( .*about.*)$'::text) OR ((l_comment)::text ~ '^( .*final.*)$'::text) OR ((l_comment)::text ~ '^( final.*)$'::text) OR ((l_comment)::text ~ '^(plant .*)$'::text) OR ((l_comment)::text ~ '^(quick .*)$'::text) OR ((l_comment)::text ~ '^(slyly .*)$'::text) OR ((l_comment)::text ~ '^( breach.*)$'::text) OR ((l_comment)::text ~ '^( egular.*)$'::text) OR ((l_comment)::text ~ '^( .*closely.*)$'::text) OR ((l_comment)::text ~ '^( closely.*)$'::text) OR ((l_comment)::text ~ '^(quickly .*)$'::text))
(3 rows)
Time: 8.683 ms
測試結果
在AQUA已啟用的情況下,SIMILAR TO 查詢效能得到顯著提升:第一輪測試中提升7.4倍,第二及後續輪次中提升28.1倍。具體查詢計劃與AQUA無關,其中的狀態均轉換為正規表示式。(下表中的時長單位為秒)
- LIKE效能比較
- AQUA未啟用 (Turn Off)
dev=> select sum(l_orderkey), count(*)
dev-> from lineitem
dev-> where
dev-> l_comment like 'slyly %' or
dev-> l_comment like 'plant %' or
dev-> l_comment like 'fina %' or
dev-> l_comment like 'quick %' or
dev-> l_comment like 'slyly %' or
dev-> l_comment like 'quickly %' or
dev-> l_comment like ' %about%' or
dev-> l_comment like ' final%' or
dev-> l_comment like ' %final%' or
dev-> l_comment like ' breach%' or
dev-> l_comment like ' egular%' or
dev-> l_comment like ' %closely%' or
dev-> l_comment like ' closely%' or
dev-> l_comment like ' %idea%' or
dev-> l_comment like ' idea%' ;
sum | count
------------------+---------
1440371216714447 | 9496106
(1 row)
Time: 10276.394 ms
dev=>
dev=>
dev=> select sum(l_orderkey), count(*)
from lineitem
where
l_comment like 'slyly %' or
l_comment like 'plant %' or
l_comment like 'fina %' or
l_comment like 'quick %' or
l_comment like 'slyly %' or
l_comment like 'quickly %' or
l_comment like ' %about%' or
l_comment like ' final%' or
l_comment like ' %final%' or
l_comment like ' breach%' or
l_comment like ' egular%' or
l_comment like ' %closely%' or
l_comment like ' closely%' or
l_comment like ' %idea%' or
l_comment like ' idea%' ;
sum | count
------------------+---------
1440371216714447 | 9496106
(1 row)
Time: 6921.963 ms
以下為執行計劃。
dev=> explain
select sum(l_orderkey), count(*)
from lineitem
where
l_comment like 'slyly %' or
l_comment like 'plant %' or
l_comment like 'fina %' or
l_comment like 'quick %' or
l_comment like 'slyly %' or
l_comment like 'quickly %' or
l_comment like ' %about%' or
l_comment like ' final%' or
l_comment like ' %final%' or
l_comment like ' breach%' or
l_comment like ' egular%' or
l_comment like ' %closely%' or
l_comment like ' closely%' or
l_comment like ' %idea%' or
l_comment like ' idea%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
XN Aggregate (cost=13688958.11..13688958.11 rows=1 width=8)
-> XN Seq Scan on lineitem (cost=0.00..13635370.08 rows=10717605 width=8)
Filter: (((l_comment)::text ~~ ' %idea%'::text) OR ((l_comment)::text ~~ ' %about%'::text) OR ((l_comment)::text ~~ ' %final%'::text) OR ((l_comment)::text ~~ ' %closely%'::text) OR ((l_comment)::text ~~ ' breach%'::text) OR ((l_comment)::text ~~ ' closely%'::text) OR ((l_comment)::text ~~ ' egular%'::text) OR ((l_comment)::text ~~ ' final%'::text) OR ((l_comment)::text ~~ ' idea%'::text) OR ((l_comment)::text ~~ 'fina %'::text) OR ((l_comment)::text ~~ 'plant %'::text) OR ((l_comment)::text ~~ 'quick %'::text) OR ((l_comment)::text ~~ 'quickly %'::text) OR ((l_comment)::text ~~ 'slyly %'::text))
(3 rows)
Time: 7.985 ms
AQUA已啟用 (Turn On)
dev=> select sum(l_orderkey), count(*)
dev-> from lineitem
dev-> where
dev-> l_comment like 'slyly %' or
dev-> l_comment like 'plant %' or
dev-> l_comment like 'fina %' or
dev-> l_comment like 'quick %' or
dev-> l_comment like 'slyly %' or
dev-> l_comment like 'quickly %' or
dev-> l_comment like ' %about%' or
dev-> l_comment like ' final%' or
dev-> l_comment like ' %final%' or
dev-> l_comment like ' breach%' or
dev-> l_comment like ' egular%' or
dev-> l_comment like ' %closely%' or
dev-> l_comment like ' closely%' or
dev-> l_comment like ' %idea%' or
dev-> l_comment like ' idea%' ;
sum | count
------------------+---------
1440371216714447 | 9496106
(1 row)
Time: 11116.387 ms
dev=>
dev=> select sum(l_orderkey), count(*)
from lineitem
where
l_comment like 'slyly %' or
l_comment like 'plant %' or
l_comment like 'fina %' or
l_comment like 'quick %' or
l_comment like 'slyly %' or
l_comment like 'quickly %' or
l_comment like ' %about%' or
l_comment like ' final%' or
l_comment like ' %final%' or
l_comment like ' breach%' or
l_comment like ' egular%' or
l_comment like ' %closely%' or
l_comment like ' closely%' or
l_comment like ' %idea%' or
l_comment like ' idea%' ;
sum | count
------------------+---------
1440371216714447 | 9496106
(1 row)
Time: 7526.141 ms
以下為執行計劃。
dev=> explain
select sum(l_orderkey), count(*)
from lineitem
where
l_comment like 'slyly %' or
l_comment like 'plant %' or
l_comment like 'fina %' or
l_comment like 'quick %' or
l_comment like 'slyly %' or
l_comment like 'quickly %' or
l_comment like ' %about%' or
l_comment like ' final%' or
l_comment like ' %final%' or
l_comment like ' breach%' or
l_comment like ' egular%' or
l_comment like ' %closely%' or
l_comment like ' closely%' or
l_comment like ' %idea%' or
l_comment like ' idea%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
XN Aggregate (cost=13688958.11..13688958.11 rows=1 width=8)
-> XN Seq Scan on lineitem (cost=0.00..13635370.08 rows=10717605 width=8)
Filter: (((l_comment)::text ~~ ' %idea%'::text) OR ((l_comment)::text ~~ ' %about%'::text) OR ((l_comment)::text ~~ ' %final%'::text) OR ((l_comment)::text ~~ ' %closely%'::text) OR ((l_comment)::text ~~ ' breach%'::text) OR ((l_comment)::text ~~ ' closely%'::text) OR ((l_comment)::text ~~ ' egular%'::text) OR ((l_comment)::text ~~ ' final%'::text) OR ((l_comment)::text ~~ ' idea%'::text) OR ((l_comment)::text ~~ 'fina %'::text) OR ((l_comment)::text ~~ 'plant %'::text) OR ((l_comment)::text ~~ 'quick %'::text) OR ((l_comment)::text ~~ 'quickly %'::text) OR ((l_comment)::text ~~ 'slyly %'::text))
(3 rows)
Time: 8.096 ms
測試結果
在啟用AQUA之後,LIKE查詢的效能略有下降:第一輪及之後輪次中的效能約為未啟用時的0.9倍。(下表中的時長單位為秒)
討論
透過使用AQUA,SIMILAR TO過濾查詢的效能達到7至28倍的提升,但 LIKE 查詢的執行速度反而有所下降。可以看到,AQUA會造成一定程度的資源開銷。
在本次測試中,儘管我們禁用了結果快取,第一輪查詢與後續輪次當中仍然存在處理時間上的差異。造成這種情況的原因可能包括:
在未啟用AQUA的情況下,首輪查詢當中包含記錄編譯、以及將資料從託管S3載入至本地儲存內的時間。這兩項操作的結果都被納入快取,可供第二輪查詢直接使用。另一方面,啟用AQUA時的首輪查詢當中包含記錄編譯以及將所需資料由託管S3載入至AQUA的時間。載入至AQUA的資料可能也會被納入快取,但目前還沒有關於AQUA快取的正式條件描述或說明,因此我們無法對具體快取量做出準確估算。
AQUA費率標準
完全免費,零成本!!!
總結
透過這些測試,我們證明啟用AQUA之後,SIMILAR TO 過濾查詢的效能可提升7至28倍。
在這裡,我們顯式開啟/關閉AQUA,並配合不同資料進行了全面測試。根據查詢與工作負載的不同,AQUA有時候反而會拉低執行效能。我希望預設設定“Automatic”能快點起效,由Redshift自主判斷是否需要使用AQUA。
測試結果告訴我們,並不是所有工作負載都適合AQUA;因此本文建議大家根據實際用例決定是否啟用AQUA。至少在使用受支援節點型別的叢集當中,AQUA預設設定為“Automatic”。如果後續亞馬遜雲科技開放AQUA自動啟用等功能,那麼即使不刻意調整,我們也能充分享受AQUA帶來的效能增強。請耐心等待,共同期待RedShift與AQUA的協同發展。
如果能想辦法將由託管S3載入至本地儲存的資料快取與編譯快取無效化,我們的測試應該會更為精細,並幫助我們考慮更多現有問題、挑戰與解決辦法。這一點不僅對Redshift非常重要,同時也有望破除雲資料倉儲複雜度越來越高、我們越來越難以理解其運作行為的困境。和大家一樣,我也殷切期待著這樣一套無需理解底層工作原理、就能始終保持良好狀態的業務體系。