【R語言學習筆記】關於提取各類模型值的意外發現
之前在做各類迴歸方程和檢驗的時候,針對模型裡面的值的提取總是有一種碰運氣的成本,比如在做t檢驗的時候想提取裡面的自由度,隨便舉個例子,基於mtcars這個資料集
a<-t.test(mtcars$vs,mtcars$cyl)
結果為
Welch Two Sample t-test
data: mtcars$vs and mtcars$cyl
t = -17.528, df = 35.907, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.415358 -5.084642
sample estimates:
mean of x mean of y
0.4375 6.1875
裡面其實是有df=35.907這個欄位的,但是不能每次看到後在手工提取,之前的做法是針對這類名稱,直接用a$df去看,但是其實這個欄位儲存在parameter裡,比如
a$parameter
df
35.90693
那麼問題來了,我怎麼知道哪個引數儲存在哪裡呢?
下面意外的用到了str函式。
比如針對剛才的t檢驗結果a,用str
str(a)
List of 9
$ statistic : Named num -17.5
..- attr(*, "names")= chr "t"
$ parameter : Named num 35.9
..- attr(*, "names")= chr "df"
$ p.value : num 3.5e-19
$ conf.int : atomic [1:2] -6.42 -5.08
..- attr(*, "conf.level")= num 0.95
$ estimate : Named num [1:2] 0.438 6.188
..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
$ null.value : Named num 0
..- attr(*, "names")= chr "difference in means"
$ alternative: chr "two.sided"
$ method : chr "Welch Two Sample t-test"
$ data.name : chr "mtcars$vs and mtcars$cyl"
- attr(*, "class")= chr "htest"
看到有各類的引數,儲存在$後的欄位裡,比如我要提取p值,直接輸入
a$p.value
[1] 3.500725e-19
就能看到p值為3.500725e-19。
同理,我做一個方差分析,比如就這個mtcars資料集了
fit.a<-aov(mpg~am,data=mtcars)
summary(fit.a)
Df Sum Sq Mean Sq F value Pr(>F)
am 1 405.2 405.2 16.86 0.000285 ***
Residuals 30 720.9 24.0
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
看這個方差分析都有什麼引數:
str(fit.a)
List of 12
$ coefficients : Named num [1:2] 17.15 7.24
..- attr(*, "names")= chr [1:2] "(Intercept)" "am"
$ residuals : Named num [1:32] -3.39 -3.39 -1.59 4.25 1.55 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ effects : Named num [1:32] -113.65 -20.13 -0.64 4.33 1.63 ...
..- attr(*, "names")= chr [1:32] "(Intercept)" "am" "" "" ...
$ rank : int 2
$ fitted.values: Named num [1:32] 24.4 24.4 24.4 17.1 17.1 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ assign : int [1:2] 0 1
$ qr :List of 5
..$ qr : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
.. .. ..$ : chr [1:2] "(Intercept)" "am"
.. ..- attr(*, "assign")= int [1:2] 0 1
..$ qraux: num [1:2] 1.18 1.18
..$ pivot: int [1:2] 1 2
..$ tol : num 1e-07
..$ rank : int 2
..- attr(*, "class")= chr "qr"
$ df.residual : int 30
$ xlevels : Named list()
$ call : language aov(formula = mpg ~ am, data = mtcars)
$ terms :Classes 'terms', 'formula' language mpg ~ am
.. ..- attr(*, "variables")= language list(mpg, am)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "mpg" "am"
.. .. .. ..$ : chr "am"
.. ..- attr(*, "term.labels")= chr "am"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(mpg, am)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "mpg" "am"
$ model :'data.frame': 32 obs. of 2 variables:
..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..- attr(*, "terms")=Classes 'terms', 'formula' language mpg ~ am
.. .. ..- attr(*, "variables")= language list(mpg, am)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "mpg" "am"
.. .. .. .. ..$ : chr "am"
.. .. ..- attr(*, "term.labels")= chr "am"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(mpg, am)
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "am"
- attr(*, "class")= chr [1:2] "aov" "lm"
可以看到有12個引數,比如我想看下相關係數:
fit.a$coefficients
(Intercept) am
17.147368 7.244939
而且前面的截距就是數字,還可以計算
fit.a$coefficients[1]*5
85.73684
同理,弄一個logistic迴歸的廣義線性模型
fit.b<-glm(am~mpg+gear,data=mtcars,family=quasibinomial())
summary(fit.b)
結果為:
Call:
glm(formula = am ~ mpg + gear, family = quasibinomial(), data = mtcars)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.68311 -0.00003 -0.00002 0.04042 1.17990
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -88.2992 7928.1434 -0.011 0.9912
mpg 0.3366 0.1403 2.399 0.0231 *
gear 20.3062 1982.0355 0.010 0.9919
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasibinomial family taken to be 0.3263161)
Null deviance: 43.230 on 31 degrees of freedom
Residual deviance: 11.659 on 29 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 19
看到mpg有點顯著性,那麼我想要提取這個相關係數,看到廣義模型的引數更為複雜
str(fit.b)
List of 30
$ coefficients : Named num [1:3] -88.299 0.337 20.306
..- attr(*, "names")= chr [1:3] "(Intercept)" "mpg" "gear"
$ residuals : Named num [1:32] 2.01 2.01 1.55 -1 -1 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ fitted.values : Named num [1:32] 4.99e-01 4.99e-01 6.46e-01 1.73e-09 6.96e-10 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ effects : Named num [1:32] -0.44841 1.37014 -0.00585 0.13687 0.08687 ...
..- attr(*, "names")= chr [1:32] "(Intercept)" "mpg" "gear" "" ...
$ R : num [1:3, 1:3] -1.41 0 0 -30.88 4.07 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:3] "(Intercept)" "mpg" "gear"
.. ..$ : chr [1:3] "(Intercept)" "mpg" "gear"
$ rank : int 3
$ qr :List of 5
..$ qr : num [1:32, 1:3] -1.41 3.56e-01 3.40e-01 4.87e-05 3.09e-05 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
.. .. ..$ : chr [1:3] "(Intercept)" "mpg" "gear"
..$ rank : int 3
..$ qraux: num [1:3] 1.36 1.09 1
..$ pivot: int [1:3] 1 2 3
..$ tol : num 1e-11
..- attr(*, "class")= chr "qr"
$ family :List of 11
..$ family : chr "quasibinomial"
..$ link : chr "logit"
..$ linkfun :function (mu)
..$ linkinv :function (eta)
..$ variance :function (mu)
..$ dev.resids:function (y, mu, wt)
..$ aic :function (y, n, mu, wt, dev)
..$ mu.eta :function (eta)
..$ initialize: expression({ if (NCOL(y) == 1) { if (is.factor(y)) y <- y != levels(y)[1L] n <- rep.int(1, nobs) if (any(y < 0 | y > 1)) stop("y values must be 0 <= y <= 1") mustart <- (weights * y + 0.5)/(weights + 1) } else if (NCOL(y) == 2) { n <- y[, 1] + y[, 2] y <- ifelse(n == 0, 0, y[, 1]/n) weights <- weights * n mustart <- (n * y + 0.5)/(n + 1) } else stop("for the 'quasibinomial' family, y must be a vector of 0 and 1's\nor a 2 column matrix where col 1 is no. successes and col 2 is no. failures") })
..$ validmu :function (mu)
..$ valideta :function (eta)
..- attr(*, "class")= chr "family"
$ linear.predictors: Named num [1:32] -0.00586 -0.00586 0.60003 -20.1774 -21.08622 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ deviance : num 11.7
$ aic : num NA
$ null.deviance : num 43.2
$ iter : int 19
$ weights : Named num [1:32] 2.50e-01 2.50e-01 2.29e-01 4.69e-09 1.89e-09 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ prior.weights : Named num [1:32] 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ df.residual : int 29
$ df.null : int 31
$ y : Named num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ converged : logi TRUE
$ boundary : logi FALSE
$ model :'data.frame': 32 obs. of 3 variables:
..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..- attr(*, "terms")=Classes 'terms', 'formula' language am ~ mpg + gear
.. .. ..- attr(*, "variables")= language list(am, mpg, gear)
.. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:3] "am" "mpg" "gear"
.. .. .. .. ..$ : chr [1:2] "mpg" "gear"
.. .. ..- attr(*, "term.labels")= chr [1:2] "mpg" "gear"
.. .. ..- attr(*, "order")= int [1:2] 1 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(am, mpg, gear)
.. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:3] "am" "mpg" "gear"
$ call : language glm(formula = am ~ mpg + gear, family = quasibinomial(), data = mtcars)
$ formula :Class 'formula' language am ~ mpg + gear
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
$ terms :Classes 'terms', 'formula' language am ~ mpg + gear
.. ..- attr(*, "variables")= language list(am, mpg, gear)
.. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:3] "am" "mpg" "gear"
.. .. .. ..$ : chr [1:2] "mpg" "gear"
.. ..- attr(*, "term.labels")= chr [1:2] "mpg" "gear"
.. ..- attr(*, "order")= int [1:2] 1 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(am, mpg, gear)
.. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:3] "am" "mpg" "gear"
$ data :'data.frame': 32 obs. of 11 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp: num [1:32] 160 160 108 258 360 ...
..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
$ offset : NULL
$ control :List of 3
..$ epsilon: num 1e-08
..$ maxit : num 25
..$ trace : logi FALSE
$ method : chr "glm.fit"
$ contrasts : NULL
$ xlevels : Named list()
- attr(*, "class")= chr [1:2] "glm" "lm"
我只想提取顯著的相關係數,則
fit.b$coefficients
(Intercept) mpg gear
-88.2992383 0.3366025 20.3061829
總結下,當我們做個檢驗、分析、迴歸、包括主成分分析、聚類等時候,str函式和summary函式可以配合的很好,自動化的進行下一步工作。
相關文章
- R語言學習筆記之一R語言筆記
- 【R語言學習筆記】若干排序問題R語言筆記排序
- R語言學習筆記:基礎知識R語言筆記
- R語言data manipulation學習筆記之subset dataR語言筆記
- 【R語言學習筆記】探索ggplot的排列組合(一)R語言筆記
- 【R語言學習筆記】文件讀取和型別修改R語言筆記型別
- C 語言學習筆記筆記
- C語言學習筆記C語言筆記
- Solidity語言學習筆記————14、左值運算子Solid筆記
- Solidity語言學習筆記————9、左值運算子Solid筆記
- R語言的初級學習R語言
- 組合語言學習筆記組合語言筆記
- 【R語言學習筆記】探索ggplot的排列組合:線圖(一)R語言筆記
- 模型微調-書生浦語大模型實戰營學習筆記&大語言模型5大模型筆記
- 學習筆記17:DenseNet實現多分類(卷積基特徵提取)筆記SENet卷積特徵
- 應用統計學與R語言實現學習筆記(七)——擬合優度檢驗R語言筆記
- Solidity語言學習筆記————23、函式呼叫和賦值Solid筆記函式賦值
- 語言模型(五)—— Seq2Seq、Attention、Transformer學習筆記模型ORM筆記
- 熱更新語言--lua學習筆記筆記
- c語言學習筆記===函式C語言筆記函式
- 《JavaScript語言精粹》學習筆記二JavaScript筆記
- 《JavaScript語言精粹》學習筆記一JavaScript筆記
- Go 基礎語言學習筆記Go筆記
- 哈佛經濟學家關於工作效率的意外發現
- 關於http(自己的學習筆記)HTTP筆記
- 語音學習筆記1------matlab實現自相關函式法基音週期提取筆記Matlab函式
- R/3 ABAP開發學習筆記(轉)筆記
- 一名Java開發的Go語言學習筆記(一)JavaGo筆記
- 【學習筆記】關於Freemark程式碼輸出值的問題筆記
- swift 關於 toolbar 學習筆記Swift筆記
- C語言學習筆記——位運算C語言筆記
- c語言程式基礎學習筆記C語言筆記
- go 學習筆記之初識 go 語言Go筆記
- C語言學習筆記--C運算子C語言筆記
- Go語言學習筆記(七)之方法Go筆記
- C語言學習筆記之變數C語言筆記變數
- Solidity語言學習筆記————4、常量Solid筆記
- Solidity語言學習筆記————36、 庫Solid筆記