Tidying up exonerate results: extracting the target sequence
The output of exonerate is shown below. The goal is to extract the target sequence covered by the alignment.
Command line: [./exonerate INPUT/UN029382.fa INPUT/scaffold125532.fa --model est2genome --showtargetgff TRUE --showvulgar no --showalignment yes --alignmentwidth 200 --bestn 1 --verbose 2]
Hostname: [node009]
C4 Alignment:
------------
Query: UN029382
Target: scaffold125532 [revcomp]
Model: est2genome
Raw score: 6062
Query range: 0 -> 1336
Target range: 23867182 -> 23861353
1 : ATCTGTTGCCCTCGCCCTTCGCAATGGCCTCCTCCTCCTCTGTCTCCCGTCCGCGGAAGCGTCCCGCCGCCGTCGCCTTTTCTTCCTCGCCTCCGCCGCCGTCGCCTTTTCTTCCTCGCCTCCGCCGCCGCCTCAG >>>> Target Intron 1 >>>> GGGCTAAGG : 145
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||++ 1301 bp ++|||||||||
23867182 : ATCTGTTGCCCTCGCCCTTCGCAATGGCCTCCTCCTCCTCTGTCTCCCGTCCGCGGAAGCGTCCCGCCGCCGTCGCCTTTTCTTCCTCGCCTCCGCCGCCGTCGCCTTTTCTTCCTCGCCTCCGCCGCCGCCTCAGgt.........................agGGGCTAAGG : 23865737
146 : ACTCTGAAATTGACACCAAAGAAGAATTTTCCCCTGATCTGGCGGACCTGTGATGTTCTTCAGCTTTATCTAAAGTCTTTTGGCAGG >>>> Target Intron 2 >>>> ACAGCTCGTTTGACGAGTCCAGAGGGACGTCAGCGAGACTACTTTGAGGCAGAGTTCT : 290
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||++ 83 bp ++||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
23865736 : ACTCTGAAATTGACACCAAAGAAGAATTTTCCCCTGATCTGGCGGACCTGTGATGTTCTTCAGCTTTATCTAAAGTCTTTTGGCAGGgt.........................agACAGCTCGTTTGACGAGTCCAGAGGGACGTCAGCGAGACTACTTTGAGGCAGAGTTCT : 23865509
291 : TTTTTAAAGAAGAAGCTGAAGATGCATTGCAGAACTGCAAAATCCCAAACATGACCATTGAATGGGCTGAAGCAAACATATCAGACAATCCACTTACAG >>>> Target Intron 3 >>>> GACCAGCACAAATTTCGTATGACCCACCAAGGTGTGACTACGATGA : 435
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||++ 74 bp ++||||||||||||||||||||||||||||||||||||||||||||||
23865508 : TTTTTAAAGAAGAAGCTGAAGATGCATTGCAGAACTGCAAAATCCCAAACATGACCATTGAATGGGCTGAAGCAAACATATCAGACAATCCACTTACAGgt.........................agGACCAGCACAAATTTCGTATGACCCACCAAGGTGTGACTACGATGA : 23865290
436 : TTTTAACATTCTGGTAAACAGCTCGAGCACAACTTTTAAATAATTCGCTTATTGGTCTGAAGCAAACATATCAGACAATCCACTTACAGGTAATGATAAGTATAAGTAAATCTTGAGCCTGCTTATTGGTTTCACGAGAAATAATTCGCTTCTGTCAATACAGGACCAGCACAA : 609
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
23865289 : TTTTAACATTCTGGTAAACAGCTCGAGCACAACTTTTAAATAATTCGCTTATTGGTCTGAAGCAAACATATCAGACAATCCACTTACAGGTAATGATAAGTATAAGTAAATCTTGAGCCTGCTTATTGGTTTCACGAGAAATAATTCGCTTCTGTCAATACAGGACCAGCACAA : 23865116
610 : ATTTCGTATGACCCACCAAGGTGTGACTACGATGATTTTAACATTCTGGTAAACAGCTCGAGCACAACTTTTAAATAATTCGCTTATTGGTCTGAA--GCA- >>>> Target Intron 4 >>>> AACATATCAGACAATC--------CACTTACAGGACCAGCAC- : 743
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| | | | | || | | | ||| ++ 525 bp ++||||| | | ||| | |||| | ||||| |
23865115 : ATTTCGTATGACCCACCAAGGTGTGACTACGATGATTTTAACATTCTGGTAAACAGCTCGAGCACAACTTTTAAATTAAGCACATGTTCCTTCGTATTGCATgt.........................agAACATGCATGCCCATCTTTGTAAGAAGTTACCTG-CCAGCTCT : 23864447
744 : -AAATTTCGTATGACCCACCAAGGTGTGACTACGATGATTT-TAACATTCTGCCATTAGTACCACAGCCACGAAACAATCCTTTTCACATAAAATGGGTATTACCTAAAATGCCGAAAAGACAACAAGGCCAGCCAGAAGAACCTCAATTACCAGCCGCTCGCTATTCCCCTGA : 914
|||| | | | || || | | || | | |||| | | | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
23864446 : TGAATTGC-TTTTTCC--TTAAAATTTAAC-ATTTTAATTTGTTTTGTGCAGCCATTAGTACCACAGCCACGAAACAATCCTTTTCACATAAAATGGGTATTACCTAAAATGCCGAAAAGACAACAAGGCCAGCCAGAAGAACCTCAATTACCAGCCGCTCGCTATTCCCCTGA : 23864277
915 : AAAAGTTAAGGTTGAGCCAGCAGACCCAAGAAAACCGGCCAAGCCGCGGTACTGGCCTAAGTTTCCAATATATCTGCCAATAAAATGACGCCTCGGATGAGAAAGGCTACATCGGCTCGCAGTAAG >>>> Target Intron 5 >>>> CTCCAGGAGTAGAAGAATC : 1059
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||++ 2501 bp ++|||||||||||||||||||
23864276 : AAAAGTTAAGGTTGAGCCAGCAGACCCAAGAAAACCGGCCAAGCCGCGGTACTGGCCTAAGTTTCCAATATATCTGCCAATAAAATGACGCCTCGGATGAGAAAGGCTACATCGGCTCGCAGTAAGgt.........................agCTCCAGGAGTAGAAGAATC : 23861631
1060 : TTTTGTTGAGAAACAAGACATTCAAGGCTCTCTTTCTCTTGTCGAGAAATAAGACATTCAAGGCTCTCTTTTCTTAAAAGAAAGTGCATTTTTTGTGGAATTGTGGGATTCGTCCCTTCACTACTTTTTTTTGGTAGAGCTGCTGTCTCCTAGAGCTTACTGTGCAATAGACAT : 1233
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
23861630 : TTTTGTTGAGAAACAAGACATTCAAGGCTCTCTTTCTCTTGTCGAGAAATAAGACATTCAAGGCTCTCTTTTCTTAAAAGAAAGTGCATTTTTTGTGGAATTGTGGGATTCGTCCCTTCACTACTTTTTTTTGGTAGAGCTGCTGTCTCCTAGAGCTTACTGTGCAATAGACAT : 23861457
1234 : GCATGAAGTATTCGTAGTCTTTTTTATTCAAGTTTAGATTTCCAAGCATATATGCTATAGCCTAAAAAAAAACTGGTCGAAATGCAGGTTTGGTCTGTTGTTG : 1336
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
23861456 : GCATGAAGTATTCGTAGTCTTTTTTATTCAAGTTTAGATTTCCAAGCATATATGCTATAGCCTAAAAAAAAACTGGTCGAAATGCAGGTTTGGTCTGTTGTTG : 23861354
# --- START OF GFF DUMP ---
#
#
##gff-version 2
##source-version exonerate:est2genome 2.2.0
##date 2016-06-22
##type DNA
#
#
# seqname source feature start end score strand frame attributes
#
scaffold125532 exonerate:est2genome gene 23861354 23867182 6062 - . gene_id 0 ; sequence UN029382 ; gene_orientation +
scaffold125532 exonerate:est2genome utr5 23867047 23867182 . - .
scaffold125532 exonerate:est2genome exon 23867047 23867182 . - . insertions 0 ; deletions 0
scaffold125532 exonerate:est2genome splice5 23867045 23867046 . - . intron_id 1 ; splice_site "GT"
scaffold125532 exonerate:est2genome intron 23865746 23867046 . - . intron_id 1
scaffold125532 exonerate:est2genome splice3 23865746 23865747 . - . intron_id 0 ; splice_site "AG"
scaffold125532 exonerate:est2genome utr5 23865650 23865745 . - .
scaffold125532 exonerate:est2genome exon 23865650 23865745 . - . insertions 0 ; deletions 0
scaffold125532 exonerate:est2genome splice5 23865648 23865649 . - . intron_id 2 ; splice_site "GT"
scaffold125532 exonerate:est2genome intron 23865567 23865649 . - . intron_id 2
scaffold125532 exonerate:est2genome splice3 23865567 23865568 . - . intron_id 1 ; splice_site "AG"
scaffold125532 exonerate:est2genome utr5 23865410 23865566 . - .
scaffold125532 exonerate:est2genome exon 23865410 23865566 . - . insertions 0 ; deletions 0
scaffold125532 exonerate:est2genome splice5 23865408 23865409 . - . intron_id 3 ; splice_site "GT"
scaffold125532 exonerate:est2genome intron 23865336 23865409 . - . intron_id 3
scaffold125532 exonerate:est2genome splice3 23865336 23865337 . - . intron_id 2 ; splice_site "AG"
scaffold125532 exonerate:est2genome utr5 23865014 23865335 . - .
scaffold125532 exonerate:est2genome exon 23865014 23865335 . - . insertions 3 ; deletions 0
scaffold125532 exonerate:est2genome splice5 23865012 23865013 . - . intron_id 4 ; splice_site "GT"
scaffold125532 exonerate:est2genome intron 23864489 23865013 . - . intron_id 4
scaffold125532 exonerate:est2genome splice3 23864489 23864490 . - . intron_id 3 ; splice_site "AG"
scaffold125532 exonerate:est2genome utr5 23864151 23864488 . - .
scaffold125532 exonerate:est2genome exon 23864151 23864488 . - . insertions 11 ; deletions 5
scaffold125532 exonerate:est2genome splice5 23864149 23864150 . - . intron_id 5 ; splice_site "GT"
scaffold125532 exonerate:est2genome intron 23861650 23864150 . - . intron_id 5
scaffold125532 exonerate:est2genome splice3 23861650 23861651 . - . intron_id 4 ; splice_site "AG"
scaffold125532 exonerate:est2genome exon 23861354 23861649 . - . insertions 0 ; deletions 0
scaffold125532 exonerate:est2genome similarity 23861354 23867182 6062 - . alignment_id 0 ; Query UN029382 ; Align 23867183 1 136 ; Align 23865746 137 96 ; Align 23865567 233 157 ; Align 23865336 390 316 ; Align 23865018 706 3 ; Align 23864489 709 16 ; Align 23864465 725 10 ; Align 23864455 736 7 ; Align 23864446 743 7 ; Align 23864439 751 7 ; Align 23864432 760 12 ; Align 23864420 773 10 ; Align 23864409 783 258 ; Align 23861650 1041 296
# --- END OF GFF DUMP ---
#
-- completed exonerate analysis
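The GFF dump at the end of the output (produced by --showtargetgff TRUE) already lists the exon coordinates on the target, so another route is to read those records instead of the alignment text. The sketch below assumes the column order given in the dump's own header comment (seqname source feature start end score strand frame attributes) and keeps only records whose third column is exon. The function name exon_coords is hypothetical, not part of exonerate.

```python
def exon_coords(lines):
    """Return (seqname, start, end, strand) for every 'exon' record.

    Assumes the GFF column order printed in the dump's header comment:
    seqname source feature start end score strand frame attributes.
    Coordinates are GFF-style: 1-based and inclusive.
    """
    exons = []
    for line in lines:
        if line.lstrip().startswith('#'):
            continue  # comment and header lines of the GFF dump
        fields = line.split()
        # Keep only records whose feature column (3rd) is 'exon';
        # utr5, splice5/3, intron, gene and similarity records are skipped,
        # and alignment lines never carry 'exon' in that position.
        if len(fields) >= 8 and fields[2] == 'exon':
            exons.append((fields[0], int(fields[3]), int(fields[4]), fields[6]))
    return exons
```

On the output above this should yield six exon records on the - strand, from 23867047-23867182 down to 23861354-23861649. The first one spans 136 bp, which agrees with the first Align entry of the similarity line (query positions 1 to 136). These coordinates could then be used to cut the sequence out of the scaffold FASTA, which is not shown here.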
The code is as follows:
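import re

with open('result.exonerate.txt', 'r') as f:
    target_lines = set()  # indices of lines that hold the target sequence
    for num, line in enumerate(f):
        # In each alignment block the target line comes right after the
        # match-symbol line (the one containing '|').
        if '|' in line:
            target_lines.add(num + 1)
        if 'Query:' in line:
            print('>' + line.strip().split()[1], end=' ')
        elif 'Target:' in line:
            print(line.strip().split()[1])
        elif num in target_lines:
            # Drop coordinates, ':', gaps, intron dots and the lowercase
            # gt/ag splice sites, keeping only the uppercase target bases.
            seq = re.sub(r'[^A-Z]', '', line)
            print(seq)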
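The same idea can be wrapped in a function that works on any iterable of lines, which makes it testable without a file. This is a sketch, not the original author's code: target_sequences is a hypothetical name, and instead of storing line numbers it keeps a flag recording whether the previous line contained '|'. Like the script above, it assumes target bases are printed in uppercase and that every alignment block has at least one '|'.

```python
import re


def target_sequences(lines):
    """Return (query, target, target_sequence) for each alignment in exonerate text output.

    Hypothetical helper mirroring the script above: the line right after a
    match-symbol line (one containing '|') is the target line, and deleting
    every character that is not an uppercase letter removes coordinates,
    ':' separators, '-' gaps, intron dots and lowercase gt/ag splice sites.
    """
    results = []          # one [query, target, [sequence pieces]] per alignment
    want_target = False   # True when the previous line contained '|'
    for line in lines:
        text = line.strip()
        if text.startswith('Query:'):
            results.append([text.split()[1], None, []])
        elif text.startswith('Target:'):
            results[-1][1] = text.split()[1]
        elif want_target:
            results[-1][2].append(re.sub(r'[^A-Z]', '', text))
        want_target = '|' in text
    return [(query, target, ''.join(pieces)) for query, target, pieces in results]
```

To use it, iterate over target_sequences(f) inside a `with open('result.exonerate.txt') as f:` block and print '>' + query, the target name and the sequence for each tuple. A boolean flag avoids both the growing list of line numbers and the linear `num in a` lookup of the original.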
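An enhanced version of the code, which also prints the aligned query length and the number of identical positions ('|' symbols), then the whole target sequence on a single line: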
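import re

with open('result.exonerate.txt', 'r') as f:
    target_lines = set()  # indices of lines that hold the target sequence
    for num, line in enumerate(f):
        if 'Query:' in line:
            matches = []  # number of '|' (identical positions) per block
            pieces = []   # target sequence of each block
            print('>' + line.strip().split()[1], end=' ')
        elif 'Target:' in line:
            print(line.strip().split()[1], end=' ')
        elif '|' in line:
            target_lines.add(num + 1)
            matches.append(line.count('|'))
        elif 'Query range:' in line:
            # "Query range: 0 -> 1336" gives the aligned length on the query
            fields = line.split()
            print(int(fields[-1]) - int(fields[-3]), end=' ')
        elif num in target_lines:
            pieces.append(re.sub(r'[^A-Z]', '', line))
        elif 'completed exonerate analysis' in line:
            # This line closes the whole run, so one alignment per file is assumed.
            print(sum(matches))
            print(''.join(pieces))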
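The two numbers printed by the enhanced script can be turned into a rough identity (identical positions divided by the aligned query length; gaps make this only an approximation). Also, because the target is marked [revcomp], the collected sequence reads along the minus strand of scaffold125532; if the plus-strand orientation is needed it has to be reverse-complemented. A minimal sketch, assuming the header line has exactly the four fields the script prints; summarize and reverse_complement are hypothetical helper names.

```python
# Assumed input: the two lines printed by the enhanced script, i.e.
# ">QUERY TARGET ALIGNED_QUERY_LENGTH MATCH_COUNT" and the target sequence.

# Translation table for complementing DNA (upper- and lowercase, N kept).
COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')


def reverse_complement(seq):
    """Reverse-complement a DNA string, e.g. to get the + strand of a [revcomp] hit."""
    return seq.translate(COMPLEMENT)[::-1]


def summarize(header, seq):
    """Return (query, target, rough % identity, plus-strand target sequence)."""
    query, target, span, matches = header.lstrip('>').split()
    # '|' count over the aligned query length; gaps make this approximate
    identity = 100 * int(matches) / int(span)
    return query, target, round(identity, 2), reverse_complement(seq)
```

Only reverse-complement when the scaffold's own orientation is wanted; for comparison with the EST itself, the sequence as printed is already in the query's orientation.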