【R語言學習筆記】關於提取各類模型值的意外發現

Ryan_Yang_發表於2017-04-15

之前在做各類迴歸方程和檢驗的時候,針對模型裡面的值的提取總是有一種碰運氣的成本,比如在做t檢驗的時候想提取裡面的自由度,隨便舉個例子,基於mtcars這個資料集

a<-t.test(mtcars$vs,mtcars$cyl)

結果為

Welch Two Sample t-test

data:  mtcars$vs and mtcars$cyl
t = -17.528, df = 35.907, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.415358 -5.084642
sample estimates:
mean of x mean of y 
   0.4375    6.1875 

裡面其實是有df=35.907這個欄位的,但是不能每次看到後在手工提取,之前的做法是針對這類名稱,直接用a$df去看,但是其實這個欄位儲存在parameter裡,比如

a$parameter
 df 
35.90693 

那麼問題來了,我怎麼知道哪個引數儲存在哪裡呢?

下面意外的用到了str函式。

比如針對剛才的t檢驗結果a,用str

str(a)
List of 9
 $ statistic  : Named num -17.5
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num 35.9
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 3.5e-19
 $ conf.int   : atomic [1:2] -6.42 -5.08
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num [1:2] 0.438 6.188
  ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "difference in means"
 $ alternative: chr "two.sided"
 $ method     : chr "Welch Two Sample t-test"
 $ data.name  : chr "mtcars$vs and mtcars$cyl"
 - attr(*, "class")= chr "htest"

看到有各類的引數,儲存在$後的欄位裡,比如我要提取p值,直接輸入

a$p.value
[1] 3.500725e-19

就能看到p值為3.500725e-19。

同理,我做一個方差分析,比如就這個mtcars資料集了

fit.a<-aov(mpg~am,data=mtcars)
summary(fit.a)
Df Sum Sq Mean Sq F value   Pr(>F)    
am           1  405.2   405.2   16.86 0.000285 ***
Residuals   30  720.9    24.0                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

看這個方差分析都有什麼引數:

str(fit.a)
List of 12
 $ coefficients : Named num [1:2] 17.15 7.24
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "am"
 $ residuals    : Named num [1:32] -3.39 -3.39 -1.59 4.25 1.55 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ effects      : Named num [1:32] -113.65 -20.13 -0.64 4.33 1.63 ...
  ..- attr(*, "names")= chr [1:32] "(Intercept)" "am" "" "" ...
 $ rank         : int 2
 $ fitted.values: Named num [1:32] 24.4 24.4 24.4 17.1 17.1 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ assign       : int [1:2] 0 1
 $ qr           :List of 5
  ..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
  .. .. ..$ : chr [1:2] "(Intercept)" "am"
  .. ..- attr(*, "assign")= int [1:2] 0 1
  ..$ qraux: num [1:2] 1.18 1.18
  ..$ pivot: int [1:2] 1 2
  ..$ tol  : num 1e-07
  ..$ rank : int 2
  ..- attr(*, "class")= chr "qr"
 $ df.residual  : int 30
 $ xlevels      : Named list()
 $ call         : language aov(formula = mpg ~ am, data = mtcars)
 $ terms        :Classes 'terms', 'formula'  language mpg ~ am
  .. ..- attr(*, "variables")= language list(mpg, am)
  .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2] "mpg" "am"
  .. .. .. ..$ : chr "am"
  .. ..- attr(*, "term.labels")= chr "am"
  .. ..- attr(*, "order")= int 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(mpg, am)
  .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:2] "mpg" "am"
 $ model        :'data.frame': 32 obs. of  2 variables:
  ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
  ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
  ..- attr(*, "terms")=Classes 'terms', 'formula'  language mpg ~ am
  .. .. ..- attr(*, "variables")= language list(mpg, am)
  .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2] "mpg" "am"
  .. .. .. .. ..$ : chr "am"
  .. .. ..- attr(*, "term.labels")= chr "am"
  .. .. ..- attr(*, "order")= int 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. ..- attr(*, "predvars")= language list(mpg, am)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "am"
 - attr(*, "class")= chr [1:2] "aov" "lm"

可以看到有12個引數,比如我想看下相關係數:

fit.a$coefficients
(Intercept)          am 
  17.147368    7.244939 

而且前面的截距就是數字,還可以計算

fit.a$coefficients[1]*5
85.73684 

同理,弄一個logistic迴歸的廣義線性模型

fit.b<-glm(am~mpg+gear,data=mtcars,family=quasibinomial())
summary(fit.b)

結果為:

Call:
glm(formula = am ~ mpg + gear, family = quasibinomial(), data = mtcars)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.68311  -0.00003  -0.00002   0.04042   1.17990  

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -88.2992  7928.1434  -0.011   0.9912  
mpg            0.3366     0.1403   2.399   0.0231 *
gear          20.3062  1982.0355   0.010   0.9919  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 0.3263161)

    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 11.659  on 29  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 19

看到mpg有點顯著性,那麼我想要提取這個相關係數,看到廣義模型的引數更為複雜

str(fit.b)
List of 30
 $ coefficients     : Named num [1:3] -88.299 0.337 20.306
  ..- attr(*, "names")= chr [1:3] "(Intercept)" "mpg" "gear"
 $ residuals        : Named num [1:32] 2.01 2.01 1.55 -1 -1 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ fitted.values    : Named num [1:32] 4.99e-01 4.99e-01 6.46e-01 1.73e-09 6.96e-10 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ effects          : Named num [1:32] -0.44841 1.37014 -0.00585 0.13687 0.08687 ...
  ..- attr(*, "names")= chr [1:32] "(Intercept)" "mpg" "gear" "" ...
 $ R                : num [1:3, 1:3] -1.41 0 0 -30.88 4.07 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "(Intercept)" "mpg" "gear"
  .. ..$ : chr [1:3] "(Intercept)" "mpg" "gear"
 $ rank             : int 3
 $ qr               :List of 5
  ..$ qr   : num [1:32, 1:3] -1.41 3.56e-01 3.40e-01 4.87e-05 3.09e-05 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
  .. .. ..$ : chr [1:3] "(Intercept)" "mpg" "gear"
  ..$ rank : int 3
  ..$ qraux: num [1:3] 1.36 1.09 1
  ..$ pivot: int [1:3] 1 2 3
  ..$ tol  : num 1e-11
  ..- attr(*, "class")= chr "qr"
 $ family           :List of 11
  ..$ family    : chr "quasibinomial"
  ..$ link      : chr "logit"
  ..$ linkfun   :function (mu)  
  ..$ linkinv   :function (eta)  
  ..$ variance  :function (mu)  
  ..$ dev.resids:function (y, mu, wt)  
  ..$ aic       :function (y, n, mu, wt, dev)  
  ..$ mu.eta    :function (eta)  
  ..$ initialize:  expression({     if (NCOL(y) == 1) {         if (is.factor(y))              y <- y != levels(y)[1L]         n <- rep.int(1, nobs)         if (any(y < 0 | y > 1))              stop("y values must be 0 <= y <= 1")         mustart <- (weights * y + 0.5)/(weights + 1)     }     else if (NCOL(y) == 2) {         n <- y[, 1] + y[, 2]         y <- ifelse(n == 0, 0, y[, 1]/n)         weights <- weights * n         mustart <- (n * y + 0.5)/(n + 1)     }     else stop("for the 'quasibinomial' family, y must be a vector of 0 and 1's\nor a 2 column matrix where col 1 is no. successes and col 2 is no. failures") })
  ..$ validmu   :function (mu)  
  ..$ valideta  :function (eta)  
  ..- attr(*, "class")= chr "family"
 $ linear.predictors: Named num [1:32] -0.00586 -0.00586 0.60003 -20.1774 -21.08622 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ deviance         : num 11.7
 $ aic              : num NA
 $ null.deviance    : num 43.2
 $ iter             : int 19
 $ weights          : Named num [1:32] 2.50e-01 2.50e-01 2.29e-01 4.69e-09 1.89e-09 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ prior.weights    : Named num [1:32] 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ df.residual      : int 29
 $ df.null          : int 31
 $ y                : Named num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ converged        : logi TRUE
 $ boundary         : logi FALSE
 $ model            :'data.frame': 32 obs. of  3 variables:
  ..$ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
  ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
  ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
  ..- attr(*, "terms")=Classes 'terms', 'formula'  language am ~ mpg + gear
  .. .. ..- attr(*, "variables")= language list(am, mpg, gear)
  .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:3] "am" "mpg" "gear"
  .. .. .. .. ..$ : chr [1:2] "mpg" "gear"
  .. .. ..- attr(*, "term.labels")= chr [1:2] "mpg" "gear"
  .. .. ..- attr(*, "order")= int [1:2] 1 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. ..- attr(*, "predvars")= language list(am, mpg, gear)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:3] "am" "mpg" "gear"
 $ call             : language glm(formula = am ~ mpg + gear, family = quasibinomial(), data = mtcars)
 $ formula          :Class 'formula'  language am ~ mpg + gear
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
 $ terms            :Classes 'terms', 'formula'  language am ~ mpg + gear
  .. ..- attr(*, "variables")= language list(am, mpg, gear)
  .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:3] "am" "mpg" "gear"
  .. .. .. ..$ : chr [1:2] "mpg" "gear"
  .. ..- attr(*, "term.labels")= chr [1:2] "mpg" "gear"
  .. ..- attr(*, "order")= int [1:2] 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(am, mpg, gear)
  .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:3] "am" "mpg" "gear"
 $ data             :'data.frame': 32 obs. of  11 variables:
  ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
  ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
  ..$ disp: num [1:32] 160 160 108 258 360 ...
  ..$ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
  ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
  ..$ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
  ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
  ..$ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
  ..$ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
  ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
  ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
 $ offset           : NULL
 $ control          :List of 3
  ..$ epsilon: num 1e-08
  ..$ maxit  : num 25
  ..$ trace  : logi FALSE
 $ method           : chr "glm.fit"
 $ contrasts        : NULL
 $ xlevels          : Named list()
 - attr(*, "class")= chr [1:2] "glm" "lm"

我只想提取顯著的相關係數,則

fit.b$coefficients
(Intercept)         mpg        gear 
-88.2992383   0.3366025  20.3061829 

總結下,當我們做個檢驗、分析、迴歸、包括主成分分析、聚類等時候,str函式和summary函式可以配合的很好,自動化的進行下一步工作。

相關文章