GEO Database Basics

Posted by weixin_33922672 on 2018-12-21


  • GEO Platform (GPL): ID of the chip (array) platform
  • GEO Sample (GSM): ID of a sample
  • GEO Series (GSE): ID of a study
  • GEO Dataset (GDS): ID of a dataset

## Usage
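The four accession prefixes (GPL, GSM, GSE, GDS) can be told apart programmatically. The following is a minimal sketch in base R; the helper name `geo_accession_type` is hypothetical and is not part of GEOquery, and the rule "prefix followed only by digits" is an assumption about well-formed IDs.

```r
# Hypothetical helper (not part of GEOquery): classify a GEO accession
# by its three-letter prefix.
geo_accession_type <- function(id) {
  id <- toupper(id)
  prefix <- substr(id, 1, 3)
  types <- c(GPL = "Platform", GSM = "Sample", GSE = "Series", GDS = "Dataset")
  # Assumption: a well-formed ID is a known prefix followed only by digits
  valid <- grepl("^(GPL|GSM|GSE|GDS)[0-9]+$", id)
  unname(ifelse(valid, types[prefix], NA_character_))
}

geo_accession_type(c("GSE76275", "GDS858", "GPL570", "foo"))
```

Anything that does not match a known prefix comes back as NA, which makes it easy to catch a mistyped ID before passing it to getGEO().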

Three reference links from the experts:

1. https://mp.weixin.qq.com/s?__biz=MzAxMDkxODM1Ng==&mid=2247486063&idx=1&sn=156bee5397e979722b36b78284188538&chksm=9b484ad4ac3fc3c2d025b9e4bb1c3c8392839c08d84697754d7d95d041b539479a45f19cf5d5&scene=21#wechat_redirect

2. http://www.bio-info-trainee.com/bioconductor_China/software/GEOquery.html

3. http://www.bio-info-trainee.com/1085.html

Installing the GEOquery package:

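## biocLite() was the installer when this post was written; it has since been
## retired, so current R versions install Bioconductor packages via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("GEOquery")
options(warn = -1)                          ## silence warnings for this session
suppressMessages(library(GEOquery))         ## load GEOquery quietly
gds858 <- getGEO('GDS858', destdir = ".")   ## download GDS858 into the working directory
names(Meta(gds858))                         ## list the metadata fields
Table(gds858)[1:5, 1:5]                     ## first 5 rows and 5 columns of the data table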
library(GEOquery)
f = 'GSE76275_eSet.Rdata'            ## local cache file
if (!file.exists(f)) {
  gset <- getGEO('GSE76275', destdir = ".",
                 AnnotGPL = FALSE,   ## annotation file
                 getGPL = FALSE)     ## platform file
  save(gset, file = f)               ## save to disk
}
load(f)                              ## load the data
class(gset)
length(gset)
class(gset[[1]])
a = gset[[1]]                        ## extract the first element of the list as a
dat = exprs(a)                       ## get the expression matrix
dim(dat)
dat[1:4, 1:4]
pd = pData(a)                        ## pData() returns the sample clinical information (sex, age, tumor stage, etc.)
trait = pd[, 51:53]
head(trait)
trait$T = substring(trait[, 2], 2, 2)
trait$N = substring(trait[, 2], 4, 4)
trait$M = substring(trait[, 2], 6, 6)
colnames(trait) = c('age', 'tmn', 'bmi', 'T', 'N', 'M')   ## same order as the columns created above
head(trait)
save(trait, file = 'trait.Rdata')
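
To make the substring() step concrete, here is a small self-contained sketch on made-up values. It assumes the staging string has the fixed form "T2N0M0" (one digit per component), which is what character positions 2, 4 and 6 imply; the data frame `trait_demo` and its values are invented for illustration and are not taken from GSE76275.

```r
# Toy stand-in for the three clinical columns (values are invented)
trait_demo <- data.frame(
  age = c(45, 60),
  tmn = c("T2N0M0", "T3N1M0"),
  bmi = c(22.5, 27.1),
  stringsAsFactors = FALSE
)

# Characters 2, 4 and 6 hold the T, N and M digits respectively
trait_demo$T <- substring(trait_demo$tmn, 2, 2)
trait_demo$N <- substring(trait_demo$tmn, 4, 4)
trait_demo$M <- substring(trait_demo$tmn, 6, 6)

trait_demo
```

Fixed-position extraction only works while every string has exactly this layout, so it is worth checking the real column with head(trait) or table() before relying on it.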

group_list = ifelse(pd$characteristics_ch1.1=='triple-negative status: not TN',
   'noTNBC','TNBC')
table(group_list)
save(dat,group_list,file = 'step1-output.Rdata')
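
The grouping step can be tried on toy data before running it on the real phenotype table. In this sketch the `status` vector, the label "triple-negative status: TN" and the matrix `dat_demo` are all invented for illustration; only the "not TN" label comes from the code above.

```r
# Toy phenotype column mimicking pd$characteristics_ch1.1 (values invented)
status <- c("triple-negative status: TN",
            "triple-negative status: not TN",
            "triple-negative status: TN")

# Everything that is not exactly the "not TN" label is assigned to TNBC
group_demo <- ifelse(status == "triple-negative status: not TN", "noTNBC", "TNBC")
table(group_demo)

# Toy expression matrix: 2 probes x 3 samples
dat_demo <- matrix(rnorm(6), nrow = 2,
                   dimnames = list(c("probe1", "probe2"),
                                   c("GSM1", "GSM2", "GSM3")))

# Downstream analysis needs one group label per sample (column)
stopifnot(length(group_demo) == ncol(dat_demo))
```

Because ifelse() sends every non-matching value to 'TNBC', a misspelled or unexpected label would be silently counted as TNBC, so running table() on the raw column first is a cheap safeguard.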

dat


dat[1:4,1:4]


trait=pd[,51:53]


head(trait)


That completes step one, which produces the file "step1-output.Rdata".
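
A quick way to confirm what an .Rdata file contains is to load it into a fresh environment and inspect the returned object names. The sketch below uses toy objects and a temporary file (both invented for illustration) rather than the real output file.

```r
# Write two toy objects to a temporary .Rdata file
out_file <- file.path(tempdir(), "step1-output-demo.Rdata")
dat_demo <- matrix(1:6, nrow = 2)
group_demo <- c("TNBC", "noTNBC", "TNBC")
save(dat_demo, group_demo, file = out_file)

# load() returns the names of the restored objects; a fresh environment
# keeps the check independent of whatever is already in the workspace
env <- new.env()
restored <- load(out_file, envir = env)
restored
```

Loading into a separate environment also avoids silently overwriting objects of the same name in the current session.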
