Tidying exonerate output: extracting the target sequence

Posted by msw521sg on 2016-09-17

The exonerate output is shown below. The goal is to extract the part of the target sequence that was aligned to the query.

Command line: [./exonerate INPUT/UN029382.fa INPUT/scaffold125532.fa --model est2genome --showtargetgff TRUE --showvulgar no --showalignment yes --alignmentwidth 200 --bestn 1 --verbose 2]
Hostname: [node009]

C4 Alignment:
------------
         Query: UN029382
        Target: scaffold125532 [revcomp]
         Model: est2genome
     Raw score: 6062
   Query range: 0 -> 1336
  Target range: 23867182 -> 23861353

        1 : ATCTGTTGCCCTCGCCCTTCGCAATGGCCTCCTCCTCCTCTGTCTCCCGTCCGCGGAAGCGTCCCGCCGCCGTCGCCTTTTCTTCCTCGCCTCCGCCGCCGTCGCCTTTTCTTCCTCGCCTCCGCCGCCGCCTCAG  >>>> Target Intron 1 >>>>  GGGCTAAGG :      145
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||++         1301 bp         ++|||||||||
 23867182 : ATCTGTTGCCCTCGCCCTTCGCAATGGCCTCCTCCTCCTCTGTCTCCCGTCCGCGGAAGCGTCCCGCCGCCGTCGCCTTTTCTTCCTCGCCTCCGCCGCCGTCGCCTTTTCTTCCTCGCCTCCGCCGCCGCCTCAGgt.........................agGGGCTAAGG : 23865737

      146 : ACTCTGAAATTGACACCAAAGAAGAATTTTCCCCTGATCTGGCGGACCTGTGATGTTCTTCAGCTTTATCTAAAGTCTTTTGGCAGG  >>>> Target Intron 2 >>>>  ACAGCTCGTTTGACGAGTCCAGAGGGACGTCAGCGAGACTACTTTGAGGCAGAGTTCT :      290
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||++          83 bp          ++||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 23865736 : ACTCTGAAATTGACACCAAAGAAGAATTTTCCCCTGATCTGGCGGACCTGTGATGTTCTTCAGCTTTATCTAAAGTCTTTTGGCAGGgt.........................agACAGCTCGTTTGACGAGTCCAGAGGGACGTCAGCGAGACTACTTTGAGGCAGAGTTCT : 23865509

      291 : TTTTTAAAGAAGAAGCTGAAGATGCATTGCAGAACTGCAAAATCCCAAACATGACCATTGAATGGGCTGAAGCAAACATATCAGACAATCCACTTACAG  >>>> Target Intron 3 >>>>  GACCAGCACAAATTTCGTATGACCCACCAAGGTGTGACTACGATGA :      435
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||++          74 bp          ++||||||||||||||||||||||||||||||||||||||||||||||
 23865508 : TTTTTAAAGAAGAAGCTGAAGATGCATTGCAGAACTGCAAAATCCCAAACATGACCATTGAATGGGCTGAAGCAAACATATCAGACAATCCACTTACAGgt.........................agGACCAGCACAAATTTCGTATGACCCACCAAGGTGTGACTACGATGA : 23865290

      436 : TTTTAACATTCTGGTAAACAGCTCGAGCACAACTTTTAAATAATTCGCTTATTGGTCTGAAGCAAACATATCAGACAATCCACTTACAGGTAATGATAAGTATAAGTAAATCTTGAGCCTGCTTATTGGTTTCACGAGAAATAATTCGCTTCTGTCAATACAGGACCAGCACAA :      609
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 23865289 : TTTTAACATTCTGGTAAACAGCTCGAGCACAACTTTTAAATAATTCGCTTATTGGTCTGAAGCAAACATATCAGACAATCCACTTACAGGTAATGATAAGTATAAGTAAATCTTGAGCCTGCTTATTGGTTTCACGAGAAATAATTCGCTTCTGTCAATACAGGACCAGCACAA : 23865116

      610 : ATTTCGTATGACCCACCAAGGTGTGACTACGATGATTTTAACATTCTGGTAAACAGCTCGAGCACAACTTTTAAATAATTCGCTTATTGGTCTGAA--GCA-  >>>> Target Intron 4 >>>>  AACATATCAGACAATC--------CACTTACAGGACCAGCAC- :      743
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |  | | | ||  |  | |  ||| ++          525 bp         ++|||||    | | |||         | ||||  | ||||| | 
 23865115 : ATTTCGTATGACCCACCAAGGTGTGACTACGATGATTTTAACATTCTGGTAAACAGCTCGAGCACAACTTTTAAATTAAGCACATGTTCCTTCGTATTGCATgt.........................agAACATGCATGCCCATCTTTGTAAGAAGTTACCTG-CCAGCTCT : 23864447

      744 : -AAATTTCGTATGACCCACCAAGGTGTGACTACGATGATTT-TAACATTCTGCCATTAGTACCACAGCCACGAAACAATCCTTTTCACATAAAATGGGTATTACCTAAAATGCCGAAAAGACAACAAGGCCAGCCAGAAGAACCTCAATTACCAGCCGCTCGCTATTCCCCTGA :      914
              |||| | | |  ||    ||  | | || |   | |||| |    | | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 23864446 : TGAATTGC-TTTTTCC--TTAAAATTTAAC-ATTTTAATTTGTTTTGTGCAGCCATTAGTACCACAGCCACGAAACAATCCTTTTCACATAAAATGGGTATTACCTAAAATGCCGAAAAGACAACAAGGCCAGCCAGAAGAACCTCAATTACCAGCCGCTCGCTATTCCCCTGA : 23864277

      915 : AAAAGTTAAGGTTGAGCCAGCAGACCCAAGAAAACCGGCCAAGCCGCGGTACTGGCCTAAGTTTCCAATATATCTGCCAATAAAATGACGCCTCGGATGAGAAAGGCTACATCGGCTCGCAGTAAG  >>>> Target Intron 5 >>>>  CTCCAGGAGTAGAAGAATC :     1059
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||++         2501 bp         ++|||||||||||||||||||
 23864276 : AAAAGTTAAGGTTGAGCCAGCAGACCCAAGAAAACCGGCCAAGCCGCGGTACTGGCCTAAGTTTCCAATATATCTGCCAATAAAATGACGCCTCGGATGAGAAAGGCTACATCGGCTCGCAGTAAGgt.........................agCTCCAGGAGTAGAAGAATC : 23861631

     1060 : TTTTGTTGAGAAACAAGACATTCAAGGCTCTCTTTCTCTTGTCGAGAAATAAGACATTCAAGGCTCTCTTTTCTTAAAAGAAAGTGCATTTTTTGTGGAATTGTGGGATTCGTCCCTTCACTACTTTTTTTTGGTAGAGCTGCTGTCTCCTAGAGCTTACTGTGCAATAGACAT :     1233
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 23861630 : TTTTGTTGAGAAACAAGACATTCAAGGCTCTCTTTCTCTTGTCGAGAAATAAGACATTCAAGGCTCTCTTTTCTTAAAAGAAAGTGCATTTTTTGTGGAATTGTGGGATTCGTCCCTTCACTACTTTTTTTTGGTAGAGCTGCTGTCTCCTAGAGCTTACTGTGCAATAGACAT : 23861457

     1234 : GCATGAAGTATTCGTAGTCTTTTTTATTCAAGTTTAGATTTCCAAGCATATATGCTATAGCCTAAAAAAAAACTGGTCGAAATGCAGGTTTGGTCTGTTGTTG :     1336
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 23861456 : GCATGAAGTATTCGTAGTCTTTTTTATTCAAGTTTAGATTTCCAAGCATATATGCTATAGCCTAAAAAAAAACTGGTCGAAATGCAGGTTTGGTCTGTTGTTG : 23861354

# --- START OF GFF DUMP ---
#
#
##gff-version 2
##source-version exonerate:est2genome 2.2.0
##date 2016-06-22
##type DNA
#
#
# seqname source feature start end score strand frame attributes
#
scaffold125532	exonerate:est2genome	gene	23861354	23867182	6062	-	.	gene_id 0 ; sequence UN029382 ; gene_orientation +
scaffold125532	exonerate:est2genome	utr5	23867047	23867182	.	-	.	
scaffold125532	exonerate:est2genome	exon	23867047	23867182	.	-	.	insertions 0 ; deletions 0
scaffold125532	exonerate:est2genome	splice5	23867045	23867046	.	-	.	intron_id 1 ; splice_site "GT"
scaffold125532	exonerate:est2genome	intron	23865746	23867046	.	-	.	intron_id 1
scaffold125532	exonerate:est2genome	splice3	23865746	23865747	.	-	.	intron_id 0 ; splice_site "AG"
scaffold125532	exonerate:est2genome	utr5	23865650	23865745	.	-	.	
scaffold125532	exonerate:est2genome	exon	23865650	23865745	.	-	.	insertions 0 ; deletions 0
scaffold125532	exonerate:est2genome	splice5	23865648	23865649	.	-	.	intron_id 2 ; splice_site "GT"
scaffold125532	exonerate:est2genome	intron	23865567	23865649	.	-	.	intron_id 2
scaffold125532	exonerate:est2genome	splice3	23865567	23865568	.	-	.	intron_id 1 ; splice_site "AG"
scaffold125532	exonerate:est2genome	utr5	23865410	23865566	.	-	.	
scaffold125532	exonerate:est2genome	exon	23865410	23865566	.	-	.	insertions 0 ; deletions 0
scaffold125532	exonerate:est2genome	splice5	23865408	23865409	.	-	.	intron_id 3 ; splice_site "GT"
scaffold125532	exonerate:est2genome	intron	23865336	23865409	.	-	.	intron_id 3
scaffold125532	exonerate:est2genome	splice3	23865336	23865337	.	-	.	intron_id 2 ; splice_site "AG"
scaffold125532	exonerate:est2genome	utr5	23865014	23865335	.	-	.	
scaffold125532	exonerate:est2genome	exon	23865014	23865335	.	-	.	insertions 3 ; deletions 0
scaffold125532	exonerate:est2genome	splice5	23865012	23865013	.	-	.	intron_id 4 ; splice_site "GT"
scaffold125532	exonerate:est2genome	intron	23864489	23865013	.	-	.	intron_id 4
scaffold125532	exonerate:est2genome	splice3	23864489	23864490	.	-	.	intron_id 3 ; splice_site "AG"
scaffold125532	exonerate:est2genome	utr5	23864151	23864488	.	-	.	
scaffold125532	exonerate:est2genome	exon	23864151	23864488	.	-	.	insertions 11 ; deletions 5
scaffold125532	exonerate:est2genome	splice5	23864149	23864150	.	-	.	intron_id 5 ; splice_site "GT"
scaffold125532	exonerate:est2genome	intron	23861650	23864150	.	-	.	intron_id 5
scaffold125532	exonerate:est2genome	splice3	23861650	23861651	.	-	.	intron_id 4 ; splice_site "AG"
scaffold125532	exonerate:est2genome	exon	23861354	23861649	.	-	.	insertions 0 ; deletions 0
scaffold125532	exonerate:est2genome	similarity	23861354	23867182	6062	-	.	alignment_id 0 ; Query UN029382 ; Align 23867183 1 136 ; Align 23865746 137 96 ; Align 23865567 233 157 ; Align 23865336 390 316 ; Align 23865018 706 3 ; Align 23864489 709 16 ; Align 23864465 725 10 ; Align 23864455 736 7 ; Align 23864446 743 7 ; Align 23864439 751 7 ; Align 23864432 760 12 ; Align 23864420 773 10 ; Align 23864409 783 258 ; Align 23861650 1041 296
# --- END OF GFF DUMP ---
#
-- completed exonerate analysis

The code is as follows:

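import re

with open('result.exonerate.txt', 'r') as f:
    # Indices of the rows that directly follow a match ('|') row.
    target_rows = set()
    for num, line in enumerate(f):
        if '|' in line:
            # In each alignment block the target row comes right after the match row.
            target_rows.add(num + 1)
        if 'Query:' in line:
            # FASTA-style header: query name, then the target name on the same line.
            print('>' + line.split()[1], end=' ')
        elif 'Target:' in line:
            print(line.split()[1])
        elif num in target_rows:
            # Keep only uppercase bases: this drops the coordinates, gap dashes,
            # and the lowercase/dotted intron placeholders.
            print(re.sub(r'[^A-Z]', '', line))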
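The key step above is the regular expression applied to each target row. As a minimal sketch of what it removes, the helper below (the name `clean_target_row` is illustrative and not part of the original script) is applied to a shortened, made-up row in the same layout as the alignment output:

```python
import re

def clean_target_row(row):
    """Return only the uppercase bases from one target row of the alignment."""
    # Coordinates, spaces, ':' separators, gap dashes, and the lowercase
    # splice-site/dot placeholders for introns are all non-uppercase, so they go.
    return re.sub(r'[^A-Z]', '', row)

# Shortened, made-up row that mimics the layout of a real target row.
row = " 23867182 : ATCTGgt.........................agGGGCT-AAGG : 23865737\n"
print(clean_target_row(row))  # ATCTGGGGCTAAGG
```

Because everything outside `A-Z` is discarded, the intron placeholders never reach the output, so only the aligned exon bases of the target are kept.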

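An improved version, which also puts the aligned query length and the number of matched positions in the header and joins the target sequence into a single line: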

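import re

def emit(header, match_counts, target_pieces):
    # Print one record: ">query target query_length matches", then the sequence.
    if header:
        print('>' + ' '.join(header), sum(match_counts))
        print(''.join(target_pieces))

with open('result.exonerate.txt', 'r') as f:
    header, match_counts, target_pieces = [], [], []
    expect_target = False  # True right after a match ('|') row
    for line in f:
        if 'Query:' in line:
            # A new alignment starts: flush the previous one (if any) first,
            # so files with several alignments do not lose sequences.
            emit(header, match_counts, target_pieces)
            header, match_counts, target_pieces = [line.split()[1]], [], []
        elif 'Target:' in line:
            header.append(line.split()[1])
        elif '|' in line:
            match_counts.append(line.count('|'))  # identical positions in this row
            expect_target = True
        elif 'Query range:' in line:
            fields = line.split()
            # Aligned query length = range end - range start.
            header.append(str(int(fields[-1]) - int(fields[-3])))
        elif expect_target:
            # Target row: keep only the uppercase bases.
            target_pieces.append(re.sub(r'[^A-Z]', '', line))
            expect_target = False
        elif 'completed exonerate analysis' in line:
            emit(header, match_counts, target_pieces)
            header = []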
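Since the command was run with `--showtargetgff`, the GFF dump offers a way to sanity-check the extracted sequence. The sketch below assumes the dump is tab-separated, as in the output above, and sums the lengths of all `exon` features. The length of the joined target sequence would normally be expected to equal this total, so a mismatch suggests that a row was skipped. The function name `exon_total_length` is illustrative and not part of the original post.

```python
def exon_total_length(lines):
    """Sum the lengths of all 'exon' features in an exonerate GFF dump."""
    total = 0
    for line in lines:
        if line.startswith('#'):
            continue  # GFF comment or header line
        fields = line.rstrip('\n').split('\t')
        # GFF columns: seqname, source, feature, start, end, ...
        # Alignment rows and other non-GFF lines have no tabs and are skipped.
        if len(fields) >= 5 and fields[2] == 'exon':
            start, end = int(fields[3]), int(fields[4])
            total += end - start + 1  # GFF coordinates are 1-based and inclusive
    return total

# The first two exon lines copied from the GFF dump above.
sample = [
    "scaffold125532\texonerate:est2genome\texon\t23867047\t23867182\t.\t-\t.\tinsertions 0 ; deletions 0",
    "scaffold125532\texonerate:est2genome\texon\t23865650\t23865745\t.\t-\t.\tinsertions 0 ; deletions 0",
]
print(exon_total_length(sample))  # 232
```

The function accepts any iterable of lines, so the open file handle for `result.exonerate.txt` can be passed in directly.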
