ANNOVAR region-based annotation (Part 1)

Posted by sas??? on 2018-06-15

Original post: https://blog.csdn.net/weixin_33906657/article/details/87467563

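Gene-based annotation tells you how a variant relates to genes. Beyond that, it is often more interesting to know whether a variant falls in a particular feature region of the genome, such as a transcription factor binding site, a promoter, or an enhancer. ANNOVAR provides this through region-based annotation.

Region-based annotation relies on databases, and each kind of feature region has its own database. ANNOVAR supports many of them, including the ones below.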

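1. Conserved regions across species

The genome sequences of 46 vertebrate species, including human, mouse, and rat, are combined in a multiple sequence alignment, and phastCons is then used to identify the genomic regions that are conserved across species. phastCons assigns a score to each conserved region it identifies.

Step 1: download the phastConsElements46way database with the following command: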

annotate_variation.pl -build hg19 -downdb phastConsElements46way humandb/

NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/phastConsElements46way.txt.gz ... Done
NOTICE: Uncompressing downloaded files
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory

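The database file looks like this. Columns 2 to 4 give the position of the conserved region on the genome, column 5 is the name of the region, and column 6 is its score.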

585     chr1    12002   12085   lod=33  343
585     chr1    12170   12232   lod=123 483
585     chr1    12594   12702   lod=219 545
585     chr1    12994   13054   lod=101 462
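As a rough illustration of the column layout just described, the four sample rows can be read into named fields with a few lines of Python. This is only a sketch: the names `ConservedElement` and `parse_element` are made up for this example, and treating column 1 as the UCSC bin index is an assumption, since the post does not describe that column.

```python
from typing import NamedTuple


class ConservedElement(NamedTuple):
    bin_index: int  # column 1; assumed to be the UCSC bin index (not described in the post)
    chrom: str      # column 2
    start: int      # column 3
    end: int        # column 4
    name: str       # column 5, e.g. "lod=33"
    score: int      # column 6


def parse_element(line: str) -> ConservedElement:
    """Split one whitespace-separated database row into typed fields."""
    bin_index, chrom, start, end, name, score = line.split()
    return ConservedElement(int(bin_index), chrom, int(start), int(end), name, int(score))


# The four sample rows shown in the post.
rows = """\
585     chr1    12002   12085   lod=33  343
585     chr1    12170   12232   lod=123 483
585     chr1    12594   12702   lod=219 545
585     chr1    12994   13054   lod=101 462
"""

elements = [parse_element(line) for line in rows.splitlines()]
best = max(elements, key=lambda e: e.score)
print(best.name, best.score)  # prints: lod=219 545
```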

Step 2: run the annotation with the following command:

annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype phastConsElements46way ex1.avinput humandb/

NOTICE: Output file is written to ex1.hg19_phastConsElements46way
NOTICE: Reading annotation database humandb/hg19_phastConsElements46way.txt ... Done with 5163775 regions
NOTICE: Finished region-based annotation on 21 genetic variants

The output file has the suffix hg19_phastConsElements46way. Two columns are added in front of the columns of the input file, and they look like this:

phastConsElements46way    Score=300;Name=lod=22
phastConsElements46way    Score=387;Name=lod=50
phastConsElements46way    Score=420;Name=lod=68
phastConsElements46way    Score=385;Name=lod=49
phastConsElements46way    Score=395;Name=lod=54
phastConsElements46way    Score=545;Name=lod=218

The first column is the name of the database, and the second column holds the score and name of the conserved region.
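The second column packs key=value pairs separated by semicolons, and the name itself can contain an equals sign (for example lod=22). A minimal Python sketch for splitting such a value is shown here; `parse_annotation` is a hypothetical helper written for this post, not part of ANNOVAR, and it assumes that names never contain a semicolon.

```python
def parse_annotation(value: str) -> dict:
    """Split an annotation value such as 'Score=300;Name=lod=22' into a dict.

    Only the first '=' of each item separates key from value, so a name
    like 'lod=22' is kept intact.
    """
    fields = {}
    for item in value.split(";"):
        key, _, val = item.partition("=")
        fields[key] = val
    return fields


record = parse_annotation("Score=300;Name=lod=22")
print(record)  # prints: {'Score': '300', 'Name': 'lod=22'}
```

The values stay strings; convert the score with int() or float() when it is needed as a number.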

2. TFBS

TFBS stands for transcription factor binding site. UCSC provides a database of transcription factor binding sites.

Step 1: download the tfbsConsSites database with the following command:

annotate_variation.pl -build hg19 -downdb tfbsConsSites humandb/

NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/tfbsConsSites.txt.gz ... Done
NOTICE: Uncompressing downloaded files
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory

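The database file looks like this. Columns 2 to 4 give the position of the binding site on the genome, and column 5 is the name of the transcription factor.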

591     chr1    894640  894654  V$P300_01       842     -       1.68
591     chr1    894641  894657  V$ELK1_01       898     -       2.7
591     chr1    894644  894654  V$CETS1P54_01   971     -       2.22
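The three sample rows overlap each other, so a single variant can hit more than one binding site. The Python sketch below checks which of these rows cover a given position. It assumes that the database start coordinate is 0-based and the end coordinate is 1-based (the usual UCSC table convention) and that the variant position is 1-based; `sites_at` is a hypothetical helper, and treating column 6 as the site score is also an assumption.

```python
# (chrom, start, end, name, score) taken from the three sample rows above.
SITES = [
    ("chr1", 894640, 894654, "V$P300_01", 842),
    ("chr1", 894641, 894657, "V$ELK1_01", 898),
    ("chr1", 894644, 894654, "V$CETS1P54_01", 971),
]


def sites_at(chrom: str, pos: int) -> list:
    """Return the names of all sites that cover a 1-based position.

    Assumes a 0-based start and 1-based end, so a site covers
    positions start+1 .. end inclusive.
    """
    return [
        name
        for site_chrom, start, end, name, _score in SITES
        if site_chrom == chrom and start < pos <= end
    ]


print(sites_at("chr1", 894650))  # prints: ['V$P300_01', 'V$ELK1_01', 'V$CETS1P54_01']
print(sites_at("chr1", 894656))  # prints: ['V$ELK1_01']
```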

Step 2: run the annotation with the following command:

annotate_variation.pl -regionanno -build hg19 -dbtype tfbsConsSites ex1.avinput humandb/

NOTICE: Output file is written to ex1.avinput.hg19_tfbsConsSites
NOTICE: Reading annotation database humandb/hg19_tfbsConsSites.txt ... Done with 5797266 regions
NOTICE: Finished region-based annotation on 21 genetic variants

The output file has the suffix hg19_tfbsConsSites. Two columns are added in front of the columns of the input file, and they look like this:

tfbsConsSites   Score=767;Name=V$PAX5_02
tfbsConsSites   Score=880;Name=V$CEBPA_01
tfbsConsSites   Score=878;Name=V$FREAC3_01

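The first column is the name of the database, and the second column holds the score of the binding region and the name of the corresponding transcription factor.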

3. cytoband

UCSC provides a cytoBand database.

Step 1: download the cytoBand database with the following command:

annotate_variation.pl -build hg19 -downdb cytoBand humandb/

NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz ... Done
NOTICE: Uncompressing downloaded files
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory

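The database file looks like this. The columns are the chromosome, start, end, band name, and Giemsa stain result.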

chr1    0       2300000 p36.33  gneg
chr1    2300000 5400000 p36.32  gpos25
chr1    5400000 7200000 p36.31  gneg
chr1    7200000 9200000 p36.23  gpos25

Step 2: run the annotation with the following command:

annotate_variation.pl -regionanno -build hg19 -dbtype cytoBand ex1.avinput humandb/

NOTICE: Output file is written to ex1.avinput.hg19_cytoBand
NOTICE: Reading annotation database humandb/hg19_cytoBand.txt ... Done with 862 regions
NOTICE: Finished region-based annotation on 21 genetic variants

The output file has the suffix hg19_cytoBand. Two columns are added in front of the columns of the input file, and they look like this:

cytoBand    1p36.33
cytoBand    1p36.33
cytoBand    1p36.31
cytoBand    1q23.3
cytoBand    1p31.1

The first column is the name of the database, and the second column is the name of the cytoband that contains the variant.
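Because cytobands on one chromosome are sorted and do not overlap, the band for a position can be found with a binary search. The following Python sketch does this for the four sample rows of the database file shown earlier. It is not how ANNOVAR is implemented; `band_for` and `BANDS_BY_CHROM` are names invented for this example, and the code assumes a 0-based start, a 1-based end, and a 1-based variant position.

```python
import bisect

# Sample rows from hg19_cytoBand.txt: (start, end, band name), sorted by start.
BANDS_BY_CHROM = {
    "chr1": [
        (0, 2300000, "p36.33"),
        (2300000, 5400000, "p36.32"),
        (5400000, 7200000, "p36.31"),
        (7200000, 9200000, "p36.23"),
    ],
}


def band_for(chrom: str, pos: int):
    """Return a label such as '1p36.33' for a 1-based position, or None."""
    bands = BANDS_BY_CHROM.get(chrom, [])
    starts = [start for start, _end, _name in bands]
    # Index of the last band whose start is strictly below pos.
    i = bisect.bisect_left(starts, pos) - 1
    if i < 0 or pos > bands[i][1]:
        return None
    chrom_label = chrom[3:] if chrom.startswith("chr") else chrom
    return chrom_label + bands[i][2]


print(band_for("chr1", 1000000))  # prints: 1p36.33
print(band_for("chr1", 2300001))  # prints: 1p36.32
```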

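4. microRNA and snoRNA

UCSC provides the genomic positions of microRNAs and snoRNAs in a table called wgRna. With this database you can check whether a variant lies in a genomic region that corresponds to a microRNA or snoRNA.

Step 1: download the database with the following command: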

annotate_variation.pl -build hg19 -downdb wgRna humandb/

NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/wgRna.txt.gz ... Done
NOTICE: Uncompressing downloaded files
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory

The database file looks like this:

585     chr1    30365   30503   hsa-mir-1302-2  0       +       0       0       miRNA
593     chr1    1102483 1102578 hsa-mir-200b    0       +       0       0       miRNA
799     chr1    28160911        28161077        ACA35   0       +       0       0       scaRna
804     chr1    28833876        28834083        U17a    0       +       0       0       HAcaBox
804     chr1    28835069        28835274        U17b    0       +       0       0       HAcaBox
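The last column of these rows appears to hold the RNA type (miRNA, scaRna, HAcaBox); the post does not describe it, so treat that reading as an assumption. Under that assumption, a short Python sketch can tally the types in the five sample rows:

```python
from collections import Counter

# The five sample rows shown in the post.
rows = """\
585     chr1    30365   30503   hsa-mir-1302-2  0       +       0       0       miRNA
593     chr1    1102483 1102578 hsa-mir-200b    0       +       0       0       miRNA
799     chr1    28160911        28161077        ACA35   0       +       0       0       scaRna
804     chr1    28833876        28834083        U17a    0       +       0       0       HAcaBox
804     chr1    28835069        28835274        U17b    0       +       0       0       HAcaBox
"""

# Count how often each value of the last column occurs.
type_counts = Counter(line.split()[-1] for line in rows.splitlines())
for rna_type, count in sorted(type_counts.items()):
    print(rna_type, count)
```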

Step 2: run the annotation with the following command:

annotate_variation.pl -regionanno -build hg19 -dbtype wgRna ex1.avinput humandb/

NOTICE: Output file is written to ex1.avinput.hg19_wgRna
NOTICE: Reading annotation database humandb/hg19_wgRna.txt ... Done with 1341 regions
NOTICE: Finished region-based annotation on 21 genetic variants

The output file has the suffix hg19_wgRna. Two columns are added in front of the columns of the input file, and they look like this:

wgRna   Name=hsa-mir-1302-2
wgRna   Name=hsa-mir-1290
wgRna   Name=HBII-420

The first column is the name of the database, and the second column is the name of the microRNA or snoRNA.

5. microRNA binding sites

UCSC provides the microRNA binding sites predicted by TargetScanHuman.

Step 1: download the targetScanS database with the following command:

annotate_variation.pl -build hg19 -downdb targetScanS humandb/

NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/targetScanS.txt.gz ... Done
NOTICE: Uncompressing downloaded files
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory

The database file looks like this:

591     chr1    879822  879830  SAMD11:miR-504  90      +
591     chr1    900599  900606  KLHL17:miR-299/299-3p   26      +
591     chr1    900605  900612  KLHL17:miR-124/506      7       +
591     chr1    900933  900941  KLHL17:miR-19   82      +
591     chr1    901054  901061  KLHL17:miR-137  14      +

Step 2: run the annotation with the following command:

annotate_variation.pl -regionanno -build hg19 -dbtype targetScanS ex1.avinput humandb/

NOTICE: Output file is written to ex1.avinput.hg19_targetScanS
NOTICE: Reading annotation database humandb/hg19_targetScanS.txt ... Done with 54199 regions
NOTICE: Finished region-based annotation on 21 genetic variants

The output file has the suffix hg19_targetScanS. Two columns are added in front of the columns of the input file, and they look like this:

targetScanS     Score=90;Name=SAMD11:miR-504
targetScanS     Score=82;Name=KLHL17:miR-19

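The first column is the name of the database, and the second column holds the score of the binding region and the names of the corresponding gene and microRNA.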
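Each name joins a gene and a microRNA family with a colon, as in SAMD11:miR-504. The Python sketch below groups the names from the five database rows shown earlier by gene. `split_target` is a hypothetical helper, and the code assumes that gene symbols never contain a colon.

```python
from collections import defaultdict

# Names taken from column 5 of the sample database rows.
names = [
    "SAMD11:miR-504",
    "KLHL17:miR-299/299-3p",
    "KLHL17:miR-124/506",
    "KLHL17:miR-19",
    "KLHL17:miR-137",
]


def split_target(name: str) -> tuple:
    """Split 'GENE:miRNA' at the first colon."""
    gene, _, mirna = name.partition(":")
    return gene, mirna


mirnas_by_gene = defaultdict(list)
for name in names:
    gene, mirna = split_target(name)
    mirnas_by_gene[gene].append(mirna)

print(dict(mirnas_by_gene))
```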

6. segmental duplications

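Segmental duplications are duplicated sequence regions in the genome. Because the copies are homologous, reads from these regions may be aligned incorrectly.

Step 1: download the genomicSuperDups database with the following command: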

annotate_variation.pl -build hg19 -downdb genomicSuperDups humandb/

NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/genomicSuperDups.txt.gz ... Done
NOTICE: Uncompressing downloaded files
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory

The database file has many columns; the first five are shown here:

585     chr1    10000   87112   chr15:102446355
585     chr1    10000   20818   chr12:84886
585     chr1    10000   19844   chrY:59352887
585     chr1    10000   19844   chrX:155249881
585     chr1    10464   40733   chr2:114330297

Step 2: run the annotation with the following command:

annotate_variation.pl -regionanno -build hg19 -dbtype genomicSuperDups ex1.avinput humandb/

NOTICE: Output file is written to ex1.avinput.hg19_genomicSuperDups
NOTICE: Reading annotation database humandb/hg19_genomicSuperDups.txt ... Done with 51599 regions
NOTICE: Finished region-based annotation on 21 genetic variants

The output file has the suffix hg19_genomicSuperDups. Two columns are added in front of the columns of the input file, and they look like this:

genomicSuperDups    Score=0.905283;Name=chr1:1439902
genomicSuperDups    Score=0.99612;Name=chr1:13142561
genomicSuperDups    Score=0.991956;Name=chr15:102446355

The first column is the name of the database, and the second column holds the score and name of the duplicated region.
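One possible use of this score is to flag variants that sit in highly similar duplications, where misalignment is more likely. The Python sketch below filters the three sample annotations by score. The 0.99 cutoff is an arbitrary value chosen for illustration, `score_of` is a hypothetical helper, and reading the score as the sequence identity between the two copies is my understanding of the ANNOVAR documentation rather than something stated in this post.

```python
# The three sample annotation values shown above.
annotations = [
    "Score=0.905283;Name=chr1:1439902",
    "Score=0.99612;Name=chr1:13142561",
    "Score=0.991956;Name=chr15:102446355",
]


def score_of(annotation: str) -> float:
    """Extract the numeric Score field from an annotation value."""
    for item in annotation.split(";"):
        key, _, val = item.partition("=")
        if key == "Score":
            return float(val)
    raise ValueError("no Score field in " + annotation)


CUTOFF = 0.99  # illustrative threshold, not a recommended value
high_identity = [a for a in annotations if score_of(a) >= CUTOFF]
print(high_identity)
```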

7. structural variants

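The DGV database stores information about genomic structural variants. ANNOVAR uses it to check whether a variant lies within a published structural variant interval.

Step 1: download the dgvMerged database with the following command: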

annotate_variation.pl -build hg19 -downdb dgvMerged humandb/

NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/dgvMerged.txt.gz ... Done
NOTICE: Uncompressing downloaded files
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory

The database file has many columns; the first five are shown here:

9       chr1    0       2300000 nsv482937
585     chr1    10000   127330  nsv7879
585     chr1    10000   22118   dgv1n82
585     chr1    10190   10281   nsv958854
73      chr1    10376   1018704 esv2758911

Step 2: run the annotation with the following command:

annotate_variation.pl -regionanno -build hg19 -dbtype dgvMerged ex1.avinput humandb/

NOTICE: Output file is written to ex1.avinput.hg19_dgvMerged
NOTICE: Reading annotation database humandb/hg19_dgvMerged.txt ... Done with 392583 regions
NOTICE: Finished region-based annotation on 21 genetic variants

The output file has the suffix hg19_dgvMerged. Two columns are added in front of the columns of the input file, and they look like this:

dgvMerged    Name=nsv832536,nsv545407
dgvMerged    Name=nsv830937,dgv235n100
dgvMerged    Name=nsv1243
dgvMerged    Name=nsv584699
dgvMerged    Name=esv3638608

The first column is the name of the database, and the second column lists the IDs of the matching structural variants in DGV.
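When a variant falls inside several structural variants, the IDs are joined with commas. A small Python sketch for separating them is given here, using the five sample values above; `sv_ids` is a hypothetical helper, not an ANNOVAR function.

```python
# The five sample annotation values shown above.
annotations = [
    "Name=nsv832536,nsv545407",
    "Name=nsv830937,dgv235n100",
    "Name=nsv1243",
    "Name=nsv584699",
    "Name=esv3638608",
]


def sv_ids(annotation: str) -> list:
    """Return the list of DGV IDs in a value such as 'Name=a,b'."""
    prefix = "Name="
    if not annotation.startswith(prefix):
        raise ValueError("unexpected annotation: " + annotation)
    return annotation[len(prefix):].split(",")


hits_per_variant = [len(sv_ids(a)) for a in annotations]
unique_ids = sorted({sv for a in annotations for sv in sv_ids(a)})
print(hits_per_variant)  # prints: [2, 2, 1, 1, 1]
print(len(unique_ids))   # prints: 7
```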

8. GWAS

This annotation checks whether a variant has been reported in previous GWAS.

Step 1: download the gwasCatalog database with the following command:

annotate_variation.pl -build hg19 -downdb gwasCatalog humandb/

NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gwasCatalog.txt.gz ... Done
NOTICE: Uncompressing downloaded files
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory

The database file has many columns; the first five are shown here:

590     chr1    780396  780397  rs141175086
591     chr1    894572  894573  rs13303010
592     chr1    1005805 1005806 rs3934834
593     chr1    1079197 1079198 rs11260603
593     chr1    1173610 1173611 rs6697886

Step 2: run the annotation with the following command:

annotate_variation.pl -regionanno -build hg19 -dbtype gwasCatalog ex1.avinput humandb/

NOTICE: Output file is written to ex1.avinput.hg19_gwasCatalog
NOTICE: Reading annotation database humandb/hg19_gwasCatalog.txt ... Done with 75593 regions
NOTICE: Finished region-based annotation on 21 genetic variants

The output file has the suffix hg19_gwasCatalog. Two columns are added in front of the columns of the input file, and they look like this:

gwasCatalog    Name=Crohn's disease
gwasCatalog    Name=Chronic inflammatory diseases

The first column is the name of the database, and the second column is the name of the disease or trait associated with the variant.
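All eight annotations in this post use the same command shape and differ only in the value of -dbtype. The Python sketch below builds those command lines in a loop, using only the flags that appear in the commands above. `region_command` is a hypothetical helper; the block only prints the commands and does not run ANNOVAR.

```python
import shlex

# The database names used in this post.
DBTYPES = [
    "phastConsElements46way",
    "tfbsConsSites",
    "cytoBand",
    "wgRna",
    "targetScanS",
    "genomicSuperDups",
    "dgvMerged",
    "gwasCatalog",
]


def region_command(dbtype: str, avinput: str = "ex1.avinput",
                   dbdir: str = "humandb/", build: str = "hg19") -> list:
    """Build the argument list for one region-based annotation run."""
    return [
        "annotate_variation.pl", "-regionanno",
        "-build", build,
        "-dbtype", dbtype,
        avinput, dbdir,
    ]


for dbtype in DBTYPES:
    print(shlex.join(region_command(dbtype)))
```

To actually run the annotations, each list could be passed to subprocess.run(cmd, check=True), assuming annotate_variation.pl is on the PATH and the databases have already been downloaded.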

Region-based annotation supports many more databases. This post covers only the ones above; the rest will be introduced in later posts.
