[PROC FREQ] 單組率置信區間的計算

Snoopy1866發表於2021-12-11

原文網址 : https://www.cnblogs.com/snoopy1866/p/15674999.html

本文連結：https://www.cnblogs.com/snoopy1866/p/15674999.html
利用PROC FREQ過程中的binomial語句可以很方便地計算單組率置信區間，SAS提供了9種（不包括校正法）計算單組率置信區間的方法，現列舉如下：
首先準備示例資料：

data test;
    input out $ weight;
cards;
陽性 95
陰性 5
;
run;

1. Wald 法

基於Wald法構建的單組率的置信區間應用非常廣泛，且Wald在結構上有著以點估計為中心對稱分佈的天然優勢，基於Wald法構建的單樣本率置信區間可表示為：

\[p\pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \]

優點：以點估計為中心，對稱分佈
缺點：（1）Overshoot: 置信區間可能超過[0,1]範圍（2）Degeneracy: 區間寬度可能為0（p=0或1時）（3）覆蓋率差
程式碼：

proc freq data = test;
    tables out /nopercent nocol norow nocum binomial(level = "陽性" cl = wald);
    weight weight;
run;

2. Wald 法（連續性校正）

\[p\pm \left( z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} + \frac{1}{2n} \right) \]

優點：（1）可避免區間寬度可能為0的情況（2）覆蓋率較Wald法有所改善
缺點：（1）結果偏保守（2）更容易發生置信區間超過[0,1]範圍的情況
程式碼：

proc freq data = test;
    tables out /nopercent nocol norow nocum binomial(level = "陽性" cl = wald(correct));
    weight weight;
run;

3. Agresti-Coull

Agresti-Coull法的主要思路是選擇一個大於0的常數作為pseudo-frequency，在計算樣本量的時候對點估計進行校正，目的是使點估計儘量向中央（0.5）靠攏，這個大於零0的常數被稱為估計因子\(\phi\)。
Agresti和Coull提出了\(\phi\)的兩種形式，\(\phi =\frac{1}{2}z_{\alpha/2}^2\)和\(\phi = 2\)，前者稱為ADDZ2校正法，後者稱為ADD4校正法，SAS中僅提供ADDZ2校正法，當\(\alpha = 0.05\)時，\(z_{\alpha/2}\)接近2，此時ADDZ2校正法與ADD4校正法近似。
當\(\phi = 2\)時（ADD4校正法），其實際含義是樣本成功率和失敗例數分別加2，即總樣本量加4。

\[\tilde{p} \pm \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n + z_{\alpha/2}^2}} \]

其中\(\tilde{p} = \frac{n_1 + \frac{1}{2}z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\)
優點：（1）downward spikes現象略有改善（downward spikes：當率在極端情況下，置信區間覆蓋率急劇下滑）
缺點：（1）犧牲了置信區間寬度
程式碼：

proc freq data = test;
    tables out /nopercent nocol norow nocum binomial(level = "陽性" cl = agresticoull);
    weight weight;
run;

4. Wilson Score法

Wilson Score法作為Wald法的替代，應用十分廣泛，是目前學界公認的在非極端率情況下的最佳置信區間構建方法。
基於Wilson Score法構建的置信區間可表示為：

\[\left\vert p-\hat{p} \right\vert = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \]

\(p\)可表示為：

\[\frac{1}{1 + \frac{1}{n}z_{\alpha/2}^2} \left( \hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p}) + \frac{1}{4n}z_{\alpha/2}^2}{n}} \right) \]

優點：（1）被認為是moderate proportion（率不接近0或1）的最佳方法
缺點：（1）存在downward spikes現象
程式碼：

proc freq data = test;
    tables out /nopercent nocol norow nocum binomial(level = "陽性" cl = wilson);
    weight weight;
run;

5. Wilson Score法（連續性校正）

基於Wilson Score法連續性校正構建的置信區間可表示為：

\[\left\vert p-\hat{p} \right\vert - \frac{1}{2n} = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \]

程式碼：

proc freq data = test;
    tables out /nopercent nocol norow nocum binomial(level = "陽性" cl = wilson(correct));
    weight weight;
run;

6. Jeffreys法

Jeffreys法構建的置信區間表示如下：

\[L={\rm Beta}\left( \frac{\alpha}{2}, n_1 + \frac{1}{2}, n - n_1 + \frac{1}{2} \right) \]

\[U={\rm Beta}\left( 1 - \frac{\alpha}{2}, n_1 + \frac{1}{2}, n - n_1 + \frac{1}{2} \right) \]

程式碼：

proc freq data = test;
    tables out /nopercent nocol norow nocum binomial(level = "陽性" cl = jeffreys);
    weight weight;
run;

7. 似然比法

似然比法通過逆推似然比檢驗構造置信區間，零假設下似然比檢驗統計量可表示為：

\[L(p_0) = -2\left( n_1\ln{\frac{\hat{p}}{p_0}} + (n - n_1)\ln{\frac{1 - \hat{p}}{1 - p_0}} \right) \]

使檢驗統計量\(L(p_0)\)落在接受域內的所有\(p_0\)組成的區間即為似然比法的置信區間：\(\{p_0: L(p_0) < \chi_{1,\alpha}^2\}\)，PROC FREQ通過迭代計算尋找置信限。
程式碼：

proc freq data = test;
    tables out /nopercent nocol norow nocum binomial(level = "陽性" cl = likelihoodratio);
    weight weight;
run;

8. Logit法

基於Logit變換 \(Y = \ln(\frac{\hat{p}}{1 - \hat{p}})\)，\(Y\) 的近似置信區間用以下公式計算：

\[Y_L = \ln{\frac{\hat{p}}{1 - \hat{p}}} - z_{\alpha /2} \sqrt{\frac{n}{n_1(n - n_1)}} \]

\[Y_U = \ln{\frac{\hat{p}}{1 - \hat{p}}} + z_{\alpha /2} \sqrt{\frac{n}{n_1(n - n_1)}} \]

\(p\) 的置信區間可表示為：

\[P_L = \exp{\left( \frac{Y_L}{1 + \exp{(Y_L)}} \right)} \]

\[P_U = \exp{\left( \frac{Y_U}{1 + \exp{(Y_U)}} \right)} \]

程式碼：

proc freq data = test;
    tables out /nopercent nocol norow nocum binomial(level = "陽性" cl = likelihoodratio);
    weight weight;
run;

9. Clopper-Pearson法

基於二項分佈構建的置信區間方法，使得精確置信限滿足以下方程：

\[\sum_{x = n_1}^n \dbinom{n}{x} P_L^x(1 - P_L)^{n - x} = \frac{\alpha}{2} \]

\[\sum_{x = 0}^{n_1} \dbinom{n}{x} P_U^x(1 - P_U)^{n - x} = \frac{\alpha}{2} \]

PROC FREQ 使用 \(F\) 分佈計算Clopper-Pearson置信限，公式如下：

\[P_L = \left[ 1 + \frac{n - n_1 + 1}{n_1 F\left( \frac{\alpha}{2}, 2n_1, 2(n - n_1 + 1) \right)} \right]^{-1} \]

\[P_U = \left[ 1 + \frac{n - n_1}{(n_1 + 1) F\left( 1 - \frac{\alpha}{2}, 2(n_1 + 1), 2(n - n_1) \right)} \right]^{-1} \]

程式碼：

proc freq data = test;
    tables out /nopercent nocol norow nocum binomial(level = "陽性" cl = likelihoodratio);
    weight weight;
run;

10. Mid-P法

Mid-P 精確置信限滿足以下方程：

\[\sum_{x = n_1 + 1}^n \dbinom{n}{x} P_L^x(1 - P_L)^{n-x} + \frac{1}{2}\dbinom{n}{n_1} P_L^{n_1}(1-P_L)^{n-n_1} = \frac{\alpha}{2} \]

\[\sum_{x = 0}^{n_1 - 1} \dbinom{n}{x} P_U^x(1 - P_U)^{n-x} + \frac{1}{2}\dbinom{n}{n_1} P_U^{n_1}(1-P_U)^{n-n_1} = \frac{\alpha}{2} \]

程式碼：

proc freq data = test;
    tables out /nopercent nocol norow nocum binomial(level = "陽性" cl = midp);
    weight weight;
run;

11. Blaker法

通過對雙側 Blaker 精確檢驗逆推來構建置信區間，使檢驗統計量\(B(p_0, n_1)\)落在接受域內的所有\(p_0\)組成的區間稱為Blaker置信區間：\(\{ p_0: B(p_0, n_1) > \alpha \}\)
其中：

\[B(p_0, n_1) = {\rm Prob}\left( \gamma(p_0, X) \le \gamma(p_0, n_1)|p_0 \right) \]

\[\gamma(p_0, n_1) = \min{\left( {\rm Prob}(X \ge n_1|p_0), {\rm Prob}(X \le n_1|p_0) \right)} \]

程式碼：

proc freq data = test;
    tables out /nopercent nocol norow nocum binomial(level = "陽性" cl = blaker);
    weight weight;
run;

最後，將以上9種方法同時展示（wald 和 wilson 僅展示校正法）：

proc freq data = test;
    tables out /nopercent nocol norow 
                nocum binomial(level = "陽性"
                               cl = (wald(correct)  agresticoull    wilson(correct)
                                     jeffreys       likelihoodratio logit
                                     clopperpearson midp            blaker));
    weight weight;
run;

參考文獻：徐瑩. 一種新的單樣本率的置信區間估計方法[D].南方醫科大學,2019.

matlab 繪製置信範圍_fill(繪製其區間形成的區域)
2020-12-03
Matlab
[20211202]FREQ=WEEKLY FREQ=DAILY是等價的.txt
2021-12-02
AI
單元測試的覆蓋率計算
2024-07-10
使用pytz模組進行時區轉換及時間計算
2018-07-02
【cbo計算公式】單表選擇率(二)
2018-08-19
公式
[20180625]簡單計算日誌生成率.txt
2018-06-25
SciTech-Mathmatics-Probability+Statistics-Population:Region-Sampling of Region : Confidence Interval(置信區間)
2024-09-16
IDE
（計算機組成原理）RISC與CISC的區別
2023-03-06
計算機
區間統計聚合函式組合器
2024-05-27
函式
計算CAN波特率
2024-10-12
統計將重疊區間合併成組的方案數.18098728
2024-03-27
計算返回單、雙位元組字元混合字串長度
2021-02-05
字元字串
計算機組成原理 - 計算篇
2020-11-26
計算機
新股中籤率怎麼算？新股中籤率計算公式
2022-03-04
公式
簡單混合運算的計算器
2020-11-29
【計算機組成原理】第6章計算機的運算方法
2021-11-09
計算機
時間差計算
2024-10-28
PHP時間計算
2021-09-09
PHP
Python計算商品復購率
2020-12-27
Python
絕密公式算單雙準確率99最強計劃方法
2021-07-16
公式
時間複雜度的計算
2018-11-06
時間複雜度
在計算機組成原理中x,計算機組成原理
2023-03-08
計算機
JS-計算日期差值；計算日期之間的月數
2018-08-01
JS
邊緣計算模組和工控機有什麼區別？
2024-11-15
0014---簡單的計算
2020-04-05
【重學計算機】計算機組成原理
2020-12-31
計算機
JavaScript計算兩個時間點之間的時間差
2018-07-04
JavaScript
單像空間後方交會計算旋轉矩陣
2020-11-18
矩陣
XYD 統計區間
2024-07-30
python 計算txt文字詞頻率
2018-07-29
Python
SQL面試必考——計算留存率
2020-10-02
SQL面試
計算機組成原理
2023-03-08
計算機
邊緣計算、霧計算、雲端計算區別幾何？
2020-11-08
如何計算振盪電路的固有頻率
2024-11-02
C++中struct的空間計算
2019-04-03
C++Struct
JS呼叫時間的方法和計算
2019-09-22
JS
本地計算、雲端計算、霧計算、邊緣計算有什麼區別？
2022-11-13
雲端計算學習路線圖素材、課件， join之間的區別
2020-01-19

[PROC FREQ] 單組率置信區間的計算

1. Wald 法

2. Wald 法（連續性校正）

3. Agresti-Coull

4. Wilson Score法

5. Wilson Score法（連續性校正）

6. Jeffreys法

7. 似然比法

8. Logit法

9. Clopper-Pearson法

10. Mid-P法

11. Blaker法

相關文章