oracle之優化一用group by或exists優化distinct
今天mentor給了一個sql語句優化的任務。(環境是sql developer
)有一個語句執行很慢,查詢出來的結果有17544條記錄,但需970秒,速度很慢。語句是這樣的:
SELECT DISTINCT 'AMEND_NEW',
reporttitle,
reportsubtitle,
cab_cab_transactions.branchcode,
cab_cab_transactions.prtfo_cd,
cab_cab_transactions.sstm_scrty_id,
cab_cab_transactions.sstm_trx_id,
cab_cab_transactions.trde_dttm,
cab_cab_transactions.efcte_dttm,
cab_cab_transactions.due_stlmnt_dt,
cab_cab_transactions.cncl_efcte_dttm,
cab_cab_transactions.trde_sstm_id,
cab_cab_transactions.trx_type_cd,
cab_cab_transactions.trx_type_dscrn,
cab_cab_transactions.trx_subtype_cd,
cab_cab_transactions.trde_stat_flg,
cab_cab_transactions.csh_cr_dr_indcr,
cab_cab_transactions.long_shrt_indcr,
cab_cab_transactions.lcl_crncy,
cab_cab_transactions.stlmt_crncy,
cab_cab_transactions.nomin_qty,
cab_cab_transactions.price,
cab_cab_transactions.lcl_cst,
cab_cab_transactions.prtfo_cst,
cab_cab_transactions.lcl_book_cst,
cab_cab_transactions.prtfo_book_cst,
cab_cab_transactions.lcl_sell_prcds,
cab_cab_transactions.prtfo_sell_prcds,
cab_cab_transactions.lcl_gnls,
cab_cab_transactions.prtfo_gnls,
cab_cab_transactions.lcl_acrd_intrt,
cab_cab_transactions.prtfo_acrd_intrt,
cab_cab_transactions.stlmt_crncy_stlmt_amt,
cab_cab_transactions.lcl_net_amt,
cab_cab_transactions.prtfo_net_amt,
cab_cab_transactions.fx_bght_amt,
cab_cab_transactions.fx_sold_amt,
cab_cab_transactions.prtfo_crncy_stlmt_amt,
cab_cab_transactions.prtfo_net_incme,
cab_cab_transactions.dvnd_crncy_net_incme,
cab_cab_transactions.dvnd_type_cd,
cab_cab_transactions.lcl_intrt_pd_rec,
cab_cab_transactions.prtfo_intrt_pd_rec,
cab_cab_transactions.lcl_dvdnd_pd_rec,
cab_cab_transactions.prtfo_dvdnd_pd_rec,
cab_cab_transactions.lcl_sundry_inc_pd_rec,
cab_cab_transactions.prtfo_sundry_inc_pd_rec,
cab_cab_transactions.bnk_csh_cptl_secid,
cab_cab_transactions.bnk_csh_inc_secid,
cab_cab_transactions.reportdate,
cab_cab_transactions.filename,
sysdate,
'e483448'
FROM cab_cfg_trx_type_mapping RIGHT JOIN(cab_cab_tran_adjustments
INNER JOIN cab_cab_transactions ON(cab_cab_transactions.branchcode = cab_cab_tran_adjustments.branchcode )
AND(cab_cab_tran_adjustments.sstm_trx_id = cab_cab_transactions.sstm_trx_id)) ON(cab_cfg_trx_type_mapping.cab_trx_type_cd = cab_cab_transactions.trx_type_cd)
AND(nvl(cab_cfg_trx_type_mapping.cab_trx_subtype_cd,' ') = nvl(cab_cab_transactions.trx_subtype_cd,' ')
AND (cab_cfg_trx_type_mapping.branchcode=cab_cab_transactions.branchcode))
WHERE cab_cab_transactions.prtfo_cd IN
(SELECT DISTINCT prtfo_cd
FROM cab_cab_valuations_working
WHERE created_by = 'e483448'
AND branchcode='ISA')
AND cab_cab_tran_adjustments.efcte_dttm > '2011-07-31'
AND cab_cab_tran_adjustments.efcte_dttm <= '2011-08-31'
AND eff_trde_stat_flg <> 'X'
AND cab_cab_transactions.branchcode = 'ISA'
AND cab_cab_tran_adjustments.branchcode = 'ISA'
AND(cab_cfg_trx_type_mapping.cab_reportgroup = 'CABValuation' OR cab_cfg_trx_type_mapping.cab_reportgroup IS NULL)
問題在distinct
上面,它會導致對全表掃描,而且會導致排序,然後刪除重複的記錄,所以速度很慢,因此需要優化distinct
。查了不少資料,並逐一嘗試,最後發現了一個非常可觀的優化結果,用group by
。語句如下:
SELECT 'AMEND_NEW',
reporttitle,
reportsubtitle,
cab_cab_transactions.branchcode,
cab_cab_transactions.prtfo_cd,
cab_cab_transactions.sstm_scrty_id,
cab_cab_transactions.sstm_trx_id,
cab_cab_transactions.trde_dttm,
cab_cab_transactions.efcte_dttm,
cab_cab_transactions.due_stlmnt_dt,
cab_cab_transactions.cncl_efcte_dttm,
cab_cab_transactions.trde_sstm_id,
cab_cab_transactions.trx_type_cd,
cab_cab_transactions.trx_type_dscrn,
cab_cab_transactions.trx_subtype_cd,
cab_cab_transactions.trde_stat_flg,
cab_cab_transactions.csh_cr_dr_indcr,
cab_cab_transactions.long_shrt_indcr,
cab_cab_transactions.lcl_crncy,
cab_cab_transactions.stlmt_crncy,
cab_cab_transactions.nomin_qty,
cab_cab_transactions.price,
cab_cab_transactions.lcl_cst,
cab_cab_transactions.prtfo_cst,
cab_cab_transactions.lcl_book_cst,
cab_cab_transactions.prtfo_book_cst,
cab_cab_transactions.lcl_sell_prcds,
cab_cab_transactions.prtfo_sell_prcds,
cab_cab_transactions.lcl_gnls,
cab_cab_transactions.prtfo_gnls,
cab_cab_transactions.lcl_acrd_intrt,
cab_cab_transactions.prtfo_acrd_intrt,
cab_cab_transactions.stlmt_crncy_stlmt_amt,
cab_cab_transactions.lcl_net_amt,
cab_cab_transactions.prtfo_net_amt,
cab_cab_transactions.fx_bght_amt,
cab_cab_transactions.fx_sold_amt,
cab_cab_transactions.prtfo_crncy_stlmt_amt,
cab_cab_transactions.prtfo_net_incme,
cab_cab_transactions.dvnd_crncy_net_incme,
cab_cab_transactions.dvnd_type_cd,
cab_cab_transactions.lcl_intrt_pd_rec,
cab_cab_transactions.prtfo_intrt_pd_rec,
cab_cab_transactions.lcl_dvdnd_pd_rec,
cab_cab_transactions.prtfo_dvdnd_pd_rec,
cab_cab_transactions.lcl_sundry_inc_pd_rec,
cab_cab_transactions.prtfo_sundry_inc_pd_rec,
cab_cab_transactions.bnk_csh_cptl_secid,
cab_cab_transactions.bnk_csh_inc_secid,
cab_cab_transactions.reportdate,
cab_cab_transactions.filename,
sysdate,
'e483448'
FROM cab_cfg_trx_type_mapping RIGHT JOIN(cab_cab_tran_adjustments
INNER JOIN cab_cab_transactions ON(cab_cab_transactions.branchcode = cab_cab_tran_adjustments.branchcode )
AND(cab_cab_tran_adjustments.sstm_trx_id = cab_cab_transactions.sstm_trx_id)) ON(cab_cfg_trx_type_mapping.cab_trx_type_cd = cab_cab_transactions.trx_type_cd)
AND(nvl(cab_cfg_trx_type_mapping.cab_trx_subtype_cd,' ') = nvl(cab_cab_transactions.trx_subtype_cd,' ')
AND (cab_cfg_trx_type_mapping.branchcode=cab_cab_transactions.branchcode))
WHERE cab_cab_transactions.prtfo_cd IN
(SELECT DISTINCT prtfo_cd
FROM cab_cab_valuations_working
WHERE created_by = 'e483448'
AND branchcode='ISA')
AND cab_cab_tran_adjustments.efcte_dttm > '2011-07-31'
AND cab_cab_tran_adjustments.efcte_dttm <= '2011-08-31'
AND eff_trde_stat_flg <> 'X'
AND cab_cab_transactions.branchcode = 'ISA'
AND cab_cab_tran_adjustments.branchcode = 'ISA'
AND(cab_cfg_trx_type_mapping.cab_reportgroup = 'CABValuation' OR cab_cfg_trx_type_mapping.cab_reportgroup IS NULL)
GROUP BY reporttitle,
reportsubtitle,
cab_cab_transactions.branchcode,
cab_cab_transactions.prtfo_cd,
cab_cab_transactions.sstm_scrty_id,
cab_cab_transactions.sstm_trx_id,
cab_cab_transactions.trde_dttm,
cab_cab_transactions.efcte_dttm,
cab_cab_transactions.due_stlmnt_dt,
cab_cab_transactions.cncl_efcte_dttm,
cab_cab_transactions.trde_sstm_id,
cab_cab_transactions.trx_type_cd,
cab_cab_transactions.trx_type_dscrn,
cab_cab_transactions.trx_subtype_cd,
cab_cab_transactions.trde_stat_flg,
cab_cab_transactions.csh_cr_dr_indcr,
cab_cab_transactions.long_shrt_indcr,
cab_cab_transactions.lcl_crncy,
cab_cab_transactions.stlmt_crncy,
cab_cab_transactions.nomin_qty,
cab_cab_transactions.price,
cab_cab_transactions.lcl_cst,
cab_cab_transactions.prtfo_cst,
cab_cab_transactions.lcl_book_cst,
cab_cab_transactions.prtfo_book_cst,
cab_cab_transactions.lcl_sell_prcds,
cab_cab_transactions.prtfo_sell_prcds,
cab_cab_transactions.lcl_gnls,
cab_cab_transactions.prtfo_gnls,
cab_cab_transactions.lcl_acrd_intrt,
cab_cab_transactions.prtfo_acrd_intrt,
cab_cab_transactions.stlmt_crncy_stlmt_amt,
cab_cab_transactions.lcl_net_amt,
cab_cab_transactions.prtfo_net_amt,
cab_cab_transactions.fx_bght_amt,
cab_cab_transactions.fx_sold_amt,
cab_cab_transactions.prtfo_crncy_stlmt_amt,
cab_cab_transactions.prtfo_net_incme,
cab_cab_transactions.dvnd_crncy_net_incme,
cab_cab_transactions.dvnd_type_cd,
cab_cab_transactions.lcl_intrt_pd_rec,
cab_cab_transactions.prtfo_intrt_pd_rec,
cab_cab_transactions.lcl_dvdnd_pd_rec,
cab_cab_transactions.prtfo_dvdnd_pd_rec,
cab_cab_transactions.lcl_sundry_inc_pd_rec,
cab_cab_transactions.prtfo_sundry_inc_pd_rec,
cab_cab_transactions.bnk_csh_cptl_secid,
cab_cab_transactions.bnk_csh_inc_secid,
cab_cab_transactions.reportdate,
cab_cab_transactions.filename
最後執行時間只有15.1秒,快了60多倍,不得不說這優化效果還是很可觀的。不過查了很多資料,仍然沒有發現合理地解釋:為什麼distinct
和group by
的效率會有這麼大差別。查的很多資料,講的基本都是兩者相差不大,實現也差不多。有待解決。
關於distinct
和group by
的去重邏輯淺析
在資料庫操作中,我們常常遇到需要將資料去重計數的工作。例如:
表A,列col
A
C
A
B
C
D
A
B
結果就是一共出現4個不同的字母A、B、C、D
即結果為4
大體上我們可以選擇count(distinct col)
的方法和group+count
的方法。
分別為:
select count(distinct col) from A;
select count(1) from (select 1 from A group by col) alias;
兩中方法實現有什麼不同呢?
其實上述兩中方法分別是在運算和儲存上的權衡。
distinct
需要將col
列中的全部內容都儲存在一個記憶體中,可以理解為一個hash
結構,key
為col
的值,最後計算hash
結構中有多少個key
即可得到結果。
很明顯,需要將所有不同的值都存起來。記憶體消耗可能較大。
而group by
的方式是先將col
排序。而資料庫中的group
一般使用sort
的方法,即資料庫會先對col
進行排序。而排序的基本理論是,時間複雜為nlogn
,空間為1
.,然後只要單純的計數就可以了。優點是空間複雜度小,缺點是要進行一次排序,執行時間會較長。
兩中方法各有優劣,在使用的時候,我們需要根據實際情況進行取捨。
具體情況可參考如下法則
資料分佈 | 去重方式 | 原因 |
---|---|---|
離散 | group |
distinct 空間佔用較大,在時間複雜度允許的情況下,group 可以發揮空間複雜度優勢 |
集中 | distinct |
distinct 空間佔用較小,可以發揮時間複雜度優勢 |
兩個極端:
1.資料列的所有資料都一樣,即去重計數的結果為1時,用distinct
最佳
2.如果資料列唯一,沒有相同數值,用group
最好
當然,在group by
時,某些資料庫產品會根據資料列的情況智慧地選擇是使用排序去重還是hash
去重,例如postgresql
。當然,我們可以根據實際情況對執行計劃進行人工的干預,而這不是這裡要討論的話題了。
使用EXISTS
替換DISTINCT
當查詢中包含的表之間有一對多的關係時,避免在SELECT
子句中使用DISTICT
,可以使用EXISTS
替換。
--查詢emp表中目前所有員工都在哪些部門工作(包括部門編號和部門名稱)
--使用DISTINCT(低效)
SELECT DISTINCT d.deptno, d.dname FROM dept d, emp e WHERE d.deptno = e.deptno;
--使用EXISTS(高效)
select d.deptno, d.dname from dept d
where exists (select 1 from emp e where e.deptno = d.deptno);
使用exists
+使用exists
代替in
+使用exists
代替distinct
使用exists
代替in
exists
只檢查行的存在性,in
檢查實際的值,所以exists
的效能比in
好
驗證
select * from emp
where deptno in(select deptno from dept where loc='NEW YORK');
select * from emp e
where exists(select 1 from dept d where d.deptno=e.deptno and loc='NEW YORK');
使用exists
代替distinct
exists
只檢查行的存在性,distinct
用於禁止重複行的顯示,而且distinct
在禁止重複行的顯示前需要排序檢索的行,所以exists
的效能比distinct
好
驗證
select distinct e.deptno,d.dname from emp e,dept d
where e.deptno=d.deptno;
select d.deptno,d.dname from dept d
where exists(select 1 from emp e where e.deptno=d.deptno);
相關文章
- Group by 優化優化
- exists與in子查詢優化優化
- Oracle優化案例-用left join代替反連線 not in not exists(十)Oracle優化
- 【PostgreSQL 】PostgreSQL 15對distinct的優化SQL優化
- oracle優化Oracle優化
- 【MySQL】NOT EXISTS優化的一個案例MySql優化
- Oracle效能優化-SQL優化(案例一)Oracle優化SQL
- Oracle效能優化-SQL優化(案例二)Oracle優化SQL
- Oracle效能優化-SQL優化(案例三)Oracle優化SQL
- Oracle效能優化-SQL優化(案例四)Oracle優化SQL
- 對含distinct操作的SQL的優化SQL優化
- Oracle SQL優化之sql tuning advisorOracleSQL優化
- oracle優化技巧Oracle優化
- oracle EM 優化Oracle優化
- oracle 效能優化Oracle優化
- Oracle優化案例-使用with as優化Subquery Unnesting(七)Oracle優化
- sql優化之邏輯優化SQL優化
- Oracle 效能優化-expdp備份速度優化02Oracle優化
- Oracle 效能優化-expdp備份速度優化03Oracle優化
- [20190624]12c group by優化 .txt優化
- Oracle優化的方法Oracle優化
- Oracle in 查詢優化Oracle優化
- Android效能優化之佈局優化Android優化
- oracle中distinct和group by的區別Oracle
- group by排序,derived_merge優化的坑排序優化
- Android 效能優化之記憶體優化Android優化記憶體
- Android效能優化篇之服務優化Android優化
- MySQL優化之系統變數優化MySql優化變數
- MySQL調優之索引優化MySql索引優化
- Oracle優化案例-(三十四)Oracle優化
- Android記憶體優化之圖片優化Android記憶體優化
- Android應用優化之冷啟動優化Android優化
- 資料庫優化之臨時表優化資料庫優化
- Laravel的unique和exists驗證規則的優化Laravel優化
- Oracle優化案例-單表分頁語句的優化(八)Oracle優化
- 【Hive】資料傾斜優化 shuffle, join, group byHive優化
- Webpack之模組化優化Web優化
- MySQL調優之查詢優化MySql優化