分析函式的應用(更新版)

itpub120發表於2007-04-01

前言:
1. 對原文重新排版,增強可讀性
2. 文中列舉的例子都在scott/tiger使用者下測試透過,增強可操作性,易驗證。
3. 繼續增加:分析函式在美同統計系統中的應用,分析函式效能比較
4. 歡迎大家批評指正.

[@more@]

概述:
分析函式(oracle8.1.6以後版本支援),為解決"Calculate a running total", "Find percentages within a group", "Top-N queries", "Compute a moving average"等問題而設計的。
標準的PL/SQL可以解決大部分問題,然而效能不盡人意。分析函式就是標準SQL的語言外掛,不僅編碼簡單,而且效能比純SQL和PL/SQL好。分析函式已經獲得ANSI SQL機構鑑定,增添到標準的SQL規範裡。

分析函式如何工作的呢?:
分析函式用來計算一組資料的集合的值。不同於每組返回多行的集合函式。一組多行資料的集合稱為"視窗",可以用分析子句實現界定。
針對每行資料而言,定義了一個多行的"移動"的視窗.在計算"當前行"資料時候,視窗決定了計算多行資料的一個範圍,視窗的尺寸由物理的行數或邏輯分割點(如:時間)決定。
除了order by 語句外,分析函式是在查詢語句中後最後執行的語句,
所有的表連線(joins),group by,having 語句都是在分析函式之前執行。故分析函式只出現在select和order by語句中

語法:

Analytic-Function(,,...)
OVER (


)

分析函式:(黑體函式本文有例子解釋)

AVG, CORR, COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK, FIRST, FIRST_VALUE, LAG, LAST,LAST_VALUE, LEAD, MAX, MIN, NTILE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP, VAR_SAMP, VARIANCE.CUBE,ROLLUP

引數:
分析函式需要0到3的引數

partition by語句:
根據partition by條件,邏輯上把一組結果集分為N組。關鍵字"partition" 和 "group"常常同時出現。分析函式獨立應用於每組。

Order by 語句:
在每組(分割槽)排序,作用於分析函式的結果集.
Order by 後可以接以下語句:
Asc|Desc 升序/降序
Nulls First|Nulls Last 表示空值在排序中,置於開始還是最後


Windowsing-clause語句:
Between … and 指定視窗範圍的起點與終點
Unbonded preceding 表示視窗的起點就是分割槽的第1行
Unbonded following 表示視窗的終點就是分割槽的最後1行
Current row 當前行
Rows 表示行數範圍
Range 表示欄位的範圍


舉例說明:

例子1: Calculate a running Total(累加)

SELECT ename "Ename", deptno "Deptno", sal "Sal",
SUM(sal) OVER (ORDER BY deptno, ename) "Running Total",
--根據deptno,name排序後的結果集累加
SUM(SAL) OVER (PARTITION BY deptno ORDER BY ename) "Dept Total",
--先deptno分割槽,再ename排序後結果集,然後累加
ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ENAME) "Seq"
FROM emp --行數統計
ORDER BY deptno, ename

Ename Deptno Sal Running Tota Dept Total Seq
CLARK 10 2450.00 2450 2450 1
KING 10 5000.00 7450 7450 2
MILLER 10 1300.00 8750 8750 3
ADAMS 20 1100.00 9850 1100 1
FORD 20 3000.00 12850 4100 2
JONES 20 2975.00 15825 7075 3
SCOTT 20 3000.00 18825 10075 4
SMITH 20 800.00 19625 10875 5
ALLEN 30 1600.00 21225 1600 1
BLAKE 30 2850.00 24075 4450 2
JAMES 30 950.00 25025 5400 3
MARTIN 30 1250.00 26275 6650 4
TURNER 30 1500.00 27775 8150 5
WARD 30 1250.00 29025 9400 6

例子說明
1.SUM(sal) OVER (ORDER BY deptno, ename) 表示針對整個查詢逐個累加員工薪水
2.SUM(SAL) OVER (PARTITION BY deptno ORDER BY ename)表示針對公司每個部門逐個累加各部門內部員工的薪水
3.ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ENAME)表示順序顯示每個部門員工的行號


例子2:Top-N Queries(前N位查詢--沒有並列的名次)

SELECT * FROM (
SELECT deptno, ename, sal, ROW_NUMBER()
OVER (
PARTITION BY deptno ORDER BY sal DESC
) Top3 FROM emp
)
WHERE Top3 <= 3

DEPTNO ENAME SAL TOP3
1 10 KING 5000.00 1
2 10 CLARK 2450.00 2
3 10 MILLER 1300.00 3
4 20 SCOTT 3000.00 1
5 20 FORD 3000.00 2 -> 只是順序的往下排,沒有並列的名次
6 20 JONES 2975.00 3
7 30 BLAKE 2850.00 1
8 30 ALLEN 1600.00 2
9 30 TURNER 1500.00 3

例子說明:
查詢各個部門薪水排在前3名的員工的資料(沒有並列的名次)

例子3:Top-N Queries(前N位查詢--產生並列的名次)

3-1: 兩個相同的資料,那麼後面的資料就會不跳過這個排名,
SELECT * FROM (
SELECT deptno, ename, sal,
DENSE_RANK()
OVER (
PARTITION BY deptno ORDER BY sal desc
) TopN FROM emp
)
WHERE TopN <= 3
ORDER BY deptno, sal DESC

DEPTNO ENAME SAL TOPN
1 10 KING 5000.00 1
2 10 CLARK 2450.00 2
3 10 MILLER 1300.00 3
4 20 SCOTT 3000.00 1--〉
5 20 FORD 3000.00 1--〉如果兩個員工的薪水一樣,
6 20 JONES 2975.00 2 產生並列的名次
7 20 ADAMS 1100.00 3
8 30 BLAKE 2850.00 1
9 30 ALLEN 1600.00 2
10 30 TURNER 1500.00 3

例子說明:
查詢各個部門薪水排在前名的員工的資料(並列的名次)

3-2: 兩個相同的資料,那麼後面的資料就會直接跳過這個排名,
SELECT * FROM (
SELECT deptno, ename, sal, rank()
OVER (
PARTITION BY deptno ORDER BY sal DESC
) Top3 FROM emp
)
WHERE Top3 <= 3
DEPTNO ENAME SAL TOP3
10 KING 5000.00 1
10 CLARK 2450.00 2
10 MILLER 1300.00 3
20 SCOTT 3000.00 1
20 FORD 3000.00 1
20 JONES 2975.00 3->跳過2的排名
30 BLAKE 2850.00 1
30 ALLEN 1600.00 2
30 TURNER 1500.00 3


例子4:視窗---移動累加

SELECT deptno "Deptno", ename "Ename", sal "Sal",
SUM(SAL)
OVER (PARTITION BY deptno
ORDER BY ename
ROWS 2 PRECEDING) "Sliding Total"
FROM emp
ORDER BY deptno, ename

Deptno Ename Sal Sliding Total
1 10 CLARK 2450.00 2450
2 10 KING 5000.00 7450
3 10 MILLER 1300.00 8750
4 20 ADAMS 1100.00 1100
5 20 FORD 3000.00 4100
6 20 JONES 2975.00 7075
7 20 SCOTT 3000.00 8975 --〉(8975 = 3000+2975+3000)
8 20 SMITH 800.00 6775
9 30 ALLEN 1600.00 1600
10 30 BLAKE 2850.00 4450
11 30 JAMES 950.00 5400
12 30 MARTIN 1250.00 5050
13 30 TURNER 1500.00 3700
14 30 WARD 1250.00 4000

例子說明
sum(sal) over (partition by deptno order by ename rows 2 preceding)
查詢的時候,累加各個部門的當前員工前2位員工的工資。

例子5:視窗--範圍(根據時間)內累加
SELECT ename, hiredate, hiredate-100 hiredate_pre,
COUNT(*)
OVER (
ORDER BY hiredate ASC
RANGE 100 PRECEDING
) cnt
FROM emp
ORDER BY hiredate ASC

ENAME HIREDATE HIREDATE_PRE CNT
1 SMITH 1980-12-17 1980-9-8
2 ALLEN 1981-2-20 1980-11-12 2
3 WARD 1981-2-22 1980-11-14 3
4 JONES 1981-4-2 1980-12-23 3
5 BLAKE 1981-5-1 1981-1-21 4
6 CLARK 1981-6-9 1981-3-1 3 -> 1981-3-17 TURNER 1981-9-8 1981-5-31 2 有3個員工的僱傭時間在這個區間
8 MARTIN 1981-9-28 1981-6-20 2
9 KING 1981-11-17 1981-8-9 3
10 JAMES 1981-12-3 1981-8-25 5
11 FORD 1981-12-3 1981-8-25 5
12 MILLER 1982-1-23 1981-10-15 4
13 SCOTT 1987-4-19 1987-1-9 1
14 ADAMS 1987-5-23 1987-2-12 2

例子6:視窗--範圍(根據時間)內平均數

SELECT ename, hiredate, sal,
AVG(sal)
OVER (
ORDER BY hiredate ASC
RANGE 100 PRECEDING
) avg_sal
FROM emp
ORDER BY hiredate ASC


ENAME HIREDATE SAL AVG_SAL
1 SMITH 1980-12-17 800.00 800
2 ALLEN 1981-2-20 1600.00 1200
3 WARD 1981-2-22 1250.00 1216.
4 JONES 1981-4-2 2975.00 1941.
5 BLAKE 1981-5-1 2850.00 2168.
6 CLARK 1981-6-9 2450.00 2758. --&gt 100天以內即1981-3-17 TURNER 1981-9-8 1500.00 1975 員工工資的平均數 2758=(2975+2850+2450)/3
8 MARTIN 1981-9-28 1250.00 1375
9 KING 1981-11-17 5000.00 2583
10 JAMES 1981-12-3 950.00 2340
11 FORD 1981-12-3 3000.00 2340
12 MILLER 1982-1-23 1300.00 2562
13 SCOTT 1987-4-19 3000.00 3000
14 ADAMS 1987-5-23 1100.00 2050

例子7:視窗--範圍(根據行數)的平均數

SELECT ename, hiredate, sal,
round(AVG(sal)
OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING)) AvgAsc,
COUNT(*)
OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING) CntAsc,
round(AVG(sal)
OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) )AvgDes,
COUNT(*)
OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) CntDes
FROM emp
ORDER BY hiredate

1 SMITH 1980-12-17 800.00 800 1 1988 6
2 ALLEN 1981-2-20 1600.00 1200 2 2104 6
3 WARD 1981-2-22 1250.00 1217 3 2046 6
4 JONES 1981-4-2 2975.00 1656 4 2671 6
5 BLAKE 1981-5-1 2850.00 1895 5 2675 6
6 CLARK 1981-6-9 2450.00 1988 6 2358 6
注:1988=(800+1600+1250+2975+2850+2450)/6
7 TURNER 1981-9-8 1500.00 2104 6 2167 6
8 MARTIN 1981-9-28 1250.00 2046 6 2417 6
9 KING 1981-11-17 5000.00 2671 6 2392 6
10 JAMES 1981-12-3 950.00 2333 6 1588 4
11 FORD 1981-12-3 3000.00 2358 6 1870 5
12 MILLER 1982-1-23 1300.00 2167 6 1800 3
13 SCOTT 1987-4-19 3000.00 2417 6 2050 2
14 ADAMS 1987-5-23 1100.00 2392 6 1100 1

例子說明:
1.round(AVG(sal) OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING)
當前行再向前(上)推5行的平均數(共6行),5行以內以所在的實際行數平均.
2.round(AVG(sal)OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) )
當前行再向後(下)推5行的平均數(共6行),5行以內以所在的實際行數平均.

例子8:Lag and lead分析函式
SELECT deptno, ename, hiredate,
LAG(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) last_hire,
hiredate - LAG(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) days_last,
LEAD(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) next_hire,
LEAD(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) - hiredate days_next
FROM emp
ORDER BY deptno, hiredate


DEPTNO ENAME HIREDATE LAST_HIRE DAYS_LAST NEXT_HIRE DAYS_NEXT
10 CLARK 1981-6-9 1981-11-17 161
10 KING 1981-11-17 1981-6-9 161 1982-1-23 67
10 MILLER 1982-1-23 1981-11-17 67
20 SMITH 1980-12-17 1981-4-2 106
20 JONES 1981-4-2 1980-12-17 106 1981-12-3 245
20 FORD 1981-12-3 1981-4-2 245 1987-4-19 1963
20 SCOTT 1987-4-19 1981-12-3 1963 1987-5-23 34
20 ADAMS 1987-5-23 1987-4-19 34
30 ALLEN 1981-2-20 1981-2-22 2
30 WARD 1981-2-22 1981-2-20 2 1981-5-1 68
30 BLAKE 1981-5-1 1981-2-22 68 1981-9-8 130
30 TURNER 1981-9-8 1981-5-1 130 1981-9-28 20
30 MARTIN 1981-9-28 1981-9-8 20 1981-12-3 66
30 JAMES 1981-12-3 1981-9-28 66

例子說明:
取出上一個員工和下一個員工的僱傭時間放在當前員工的同一行(有點列行互換的意思)


例子9:first_value

SELECT deptno, ename, sal,
FIRST_VALUE(ename)
OVER (PARTITION BY deptno
ORDER BY sal ASC) AS MIN_SAL_HAS
FROM emp
ORDER BY deptno, ename

DEPTNO ENAME SAL MIN_SAL_HAS
10 CLARK 2450.00 MILLER
10 KING 5000.00 MILLER
10 MILLER 1300.00 MILLER->MILLER是部門10薪水最低的員工
20 ADAMS 1100.00 SMITH
20 FORD 3000.00 SMITH
20 JONES 2975.00 SMITH
20 SCOTT 3000.00 SMITH
20 SMITH 800.00 SMITH
30 ALLEN 1600.00 JAMES
30 BLAKE 2850.00 JAMES
30 JAMES 950.00 JAMES
30 MARTIN 1250.00 JAMES
30 TURNER 1500.00 JAMES
30 WARD 1250.00 JAMES

例子10:rollup,cube,grouping 彙總函式
10-1:
SELECT deptno,SUM(sal) FROM emp
GROUP BY ROLLUP(deptno)
DEPTNO SUM(SAL)
10 8750
20 10875
30 9400
29025 ----各個部門合計的結果

10-2:統計各部門員工的薪水情況,並且彙總各部門員工的
薪水
SELECT deptno,ename,SUM(sal) FROM emp
GROUP BY ROLLUP(deptno,ename)
10 KING 5000
10 CLARK 2450
10 MILLER 1300
10 8750 ---累加當前部門(10)員工的薪水
20 FORD 3000
20 ADAMS 1100
20 JONES 2975
20 SCOTT 3000
20 SMITH 800
20 10875
30 WARD 1250
30 ALLEN 1600
30 BLAKE 2850
30 JAMES 950
30 MARTIN 1250
30 TURNER 1500
30 9400
10-3:統計各部門每個職位的薪水情況
SELECT deptno,job,SUM(sal) FROM emp
GROUP BY CUBE(deptno,job)
ORDER BY deptno,job NULLS LAST


DEPTNO JOB SUM_TOTAL
10 CLERK 1300
10 MANAGER 2450
10 PRESIDENT 5000
10 8750-〉彙總部門所有職位的薪水
20 ANALYST 6000
20 CLERK 1900
20 MANAGER 2975
20 10875
30 CLERK 950
30 MANAGER 2850
30 SALESMAN 5600
30 9400
ANALYST 6000-〉彙總該職位在公司所有部門薪水
CLERK 4150
MANAGER 8275
PRESIDENT 5000
SALESMAN 5600
29025->彙總所有部門所用職位的薪水

10-4:與10-3等價.用到了gouping函式
grouping函式介紹
如果當前的彙總記錄是利用該欄位得出的,grouping函式就會返回1,否則返回0
SELECT decode(GROUPING(deptno),1,'All Dept',to_char(deptno)) deptno,
decode(GROUPING(job),1,'ALL Job',job) job,
SUM(sal) sum_t
FROM emp
GROUP BY CUBE(deptno,job)
ORDER BY deptno,job NULLS LAST

DEPTNO JOB SUM_T
10 ALL Job 8750
10 CLERK 1300
10 MANAGER 2450
10 PRESIDENT 5000
20 ALL Job 10875
20 ANALYST 6000
20 CLERK 1900
20 MANAGER 2975
30 ALL Job 9400
30 CLERK 950
30 MANAGER 2850
30 SALESMAN 5600
All Dept ALL Job 29025
All Dept ANALYST 6000
All Dept CLERK 4150
All Dept MANAGER 8275
All Dept PRESIDENT 5000
All Dept SALESMAN 5600

例子11. ratio_to_report比率函式

SELECT deptno,ename,SUM(sal),
trunc((ratio_to_report(SUM(sal))over(PARTITION BY deptno )),3) ratio
FROM emp
GROUP BY deptno,ename

DEPTNO ENAME SUM(SAL) RATIO
10 CLARK 2450 0.28  2450/(2450+5000+1300)=0.28
10 KING 5000 0.571
10 MILLER 1300 0.148
20 ADAMS 1100 0.101
20 FORD 3000 0.275
20 JONES 2975 0.273
20 SCOTT 3000 0.275
20 SMITH 800 0.073
30 ALLEN 1600 0.17
30 BLAKE 2850 0.303
30 JAMES 950 0.101
30 MARTIN 1250 0.132
30 TURNER 1500 0.159
30 WARD 1250 0.132

例子12:FIRST函式
功能說明:DENSE_RANK返回的集合中取出排在最前面的一個值的行
SELECT deptno,ename,empno,sal, MIN(sal) KEEP
(dense_rank FIRST ORDER BY empno) over (PARTITION BY deptno) "min_sal"
FROM emp

DEPTNO ENAME EMPNO SAL min_sal
10 CLARK 7782 2450.00 2450
10 KING 7839 5000.00 2450
10 MILLER 7934 1300.00 2450
20 SMITH 7369 800.00 800
20 ADAMS 7876 1100.00 800
20 FORD 7902 3000.00 800
20 SCOTT 7788 3000.00 800
20 JONES 7566 2975.00 800
30 ALLEN 7499 1600.00 1600
30 BLAKE 7698 2850.00 1600
30 MARTIN 7654 1250.00 1600
30 JAMES 7900 950.00 1600
30 TURNER 7844 1500.00 1600
30 WARD 7521 1250.00 1600

例子說明:
DENSE_RANK 按部門分割槽,再按工號排序,FIRST取出工號最小的對應的所有行
然後前面的MIN函式從這個集合中取出薪水最低的值

分析函式在美同統計系統的應用(陸續增加中)

應用1:每個小時mt失敗率
SELECT /*+RULE*/
TO_CHAR(MT_TIME, 'yyyymmddhh24') DD,
CLIENT_ID,
CLIENT_NAME,
SERVICE_ID,
MT_REPORT,
COUNT(*) MT_REPORT_CNT,
ratio_to_report(COUNT(*)) over
(PARTITION BY TO_CHAR(MT_TIME, 'yyyymmddhh24')) ratio
FROM SMSUSER.MT_LOG
WHERE MT_TIME >= TRUNC(SYSDATE)
AND MT_TIME < TRUNC(SYSDATE) + 1
GROUP BY TO_CHAR(MT_TIME, 'yyyymmddhh24'),
CLIENT_ID,
CLIENT_NAME,
SERVICE_ID,
MT_REPORT
應用2:


附:
關鍵字:
analytic functions 分析函式
unbounded 極大的
preceding 在前面
ratio 比率
dense 密集的,濃厚的
rows between 1 preceding and 1 following 當前行的前1行和後一行
rows between unbounded preceding and unbounded following
每行對應的資料視窗是從第一行到最後一行
range between 50 preceding and 100 following 等價於 [n-50,n+100]
count(*) over () as alias 加總
KEEP (dense_rank FIRST ORDER BY empno)..keep( dense_rank first..)
常搭配使用


參考文獻:
oracle官方文件Oracle9i SQL Reference a90125.pdf

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/8109090/viewspace-907865/,如需轉載,請註明出處,否則將追究法律責任。

相關文章