Converting SRA files to FASTQ with sratoolkit on Ubuntu
Environment: Ubuntu 14.04
sratoolkit.2.5.5-ubuntu64
1. Download
Download page:
http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software#
2. Convert SRA to FASTQ:
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ../../code/sratoolkit.2.5.5-ubuntu64/bin/fastq-dump SRR003161
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ls
SRR002664.fastq SRR002664.sra SRR003161.fastq SRR003161.sra
For the data files used here, see: http://blog.csdn.net/xubo245/article/details/50507222
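The conversion above runs fastq-dump by hand for one accession at a time. With many .sra files, a small Python wrapper can issue the same commands in a loop. This is a sketch under my own assumptions: the helper names `build_command` and `convert_all` are hypothetical, the accession is taken from the .sra file name, commands are run from the data directory as in the session above, and only options that appear in this post (`--fasta`, `--split-files`, `--split-3`, `--split-spot`) are supported.

```python
import subprocess
from pathlib import Path

# fastq-dump location used in this post, relative to the data directory.
# Assumption: adjust this to wherever sratoolkit is unpacked.
FASTQ_DUMP = "../../code/sratoolkit.2.5.5-ubuntu64/bin/fastq-dump"
SPLIT_OPTIONS = ("--split-files", "--split-3", "--split-spot")


def build_command(accession, fastq_dump=FASTQ_DUMP, fasta_width=None, split=None):
    """Return a fastq-dump argument list built only from options used in this post."""
    cmd = [fastq_dump]
    if split is not None:
        if split not in SPLIT_OPTIONS:
            raise ValueError(f"unsupported split option: {split}")
        cmd.append(split)
    if fasta_width is not None:
        # "--fasta <width>" writes FASTA with <width> bases per line.
        cmd += ["--fasta", str(fasta_width)]
    cmd.append(accession)
    return cmd


def convert_all(directory=".", **options):
    """Hypothetical helper: run fastq-dump once per *.sra file in `directory`."""
    directory = Path(directory).expanduser()
    for sra in sorted(directory.glob("*.sra")):
        accession = sra.stem  # SRR003161.sra -> SRR003161
        # cwd=directory mirrors running the command from the data directory;
        # on POSIX a relative fastq-dump path is then resolved from there too.
        subprocess.run(build_command(accession, **options), cwd=directory, check=True)
```

For example, `convert_all(".", fasta_width=50)` started from the SRA directory would issue `fastq-dump --fasta 50 <accession>` for every .sra file found there. Building the argument list separately keeps it easy to inspect before anything is executed.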
3. View the FASTQ file:
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ more SRR003161.fastq
@SRR003161.1 FEKQ5UX01AS5XC length=124
TCAGATGCAATCATCGAATGGTCTCGAATGGAATCNTCTANAGAGATGGAATGTATCNCTCGCCANACGACACNCGAACAGGGNAAGGCAAGCAGNAGGNAGNNNANNNNNNNNNNNNNNNNNN
+SRR003161.1 FEKQ5UX01AS5XC length=124
AAAAAAAAAAAAAAAA:::BAAFAABAAB?>>=44!39=<!:866699888220862!08:8002!0200000!022200800!20660000600!000!06!!!6!!!!!!!!!!!!!!!!!!
@SRR003161.2 FEKQ5UX01AOE96 length=505
TCAGTTTGAGATGGAGTTTCATTCTTGTTGCCCAGGCTGGAGTGCAATGGCGCAATCTCAGCTCACAGCAACCTCCGCCTCCCGGGTTCAAGCGATTCTCCTGCCTCAGCCTCTCGAGTAGCTGGGATTACAGGCATGCACCATCACGCCCAGCTAATTTGCATTTTTTATTAGAGATGGGGTTTCTCCAC
ATTGGTCAGGCTGATCTCGAACTCCTGACCTCAGGTGATCTGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCTGAGCCCAACCTATTTACTTTCAATCCATCTTTTCAATAACTTAAATACAAGTGTCAATATATACAATCTTTTCCTCCCTGGTTATCAAGCTTTCTAATATATATG
GATGTATCTTCCAAGGTTTTTGATCCCATTTTACTTTACAGGCTCACTGCTGTGGAACCCAGAGAGCAGTCTCTTTTCAAGGNGGGCTGAGACNCGCAACAGGGGATTAGGCCAAGGCNCAGG
+SRR003161.2 FEKQ5UX01AOE96 length=505
CCCCCCCCCCCCCCCC@@@CCCFEEEFEEG888EEEFFEEEEFGGGGGGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCA<777@@CCCBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCAAACCCCCCCCCCCCCCCCCCCCCCC:93339@A>77//39AC666666C22CAAAA93333///7-0017
>9999>>A???ACCCCCCC2239322>9977<?????CCCCCCCCC877777777111111::::5555:555:::::::::;:555:;;::::0040-----***--467::::;;;;;;:::511155555:555:::;::::::7777744-------///245::;;;::::::;;;;;;;;:5555
4774----------44-----064---------6---522451115247644255-----,4---24464422---------!,,,4464224!11:::7:::111111--7777---!----
@SRR003161.3 FEKQ5UX01ARXN7 length=645
TCAGCATGCTAGACAGAAGAATTCTCAGTAACTTTCTTTGTGCTGTGTGTATTCAACTCACAGAGTTGGAACCGTTCCTTTGTCAACAGAGCTAGAATTTGAAACCNCTCTTGAGGACTACGCGAAANAGGGGANAAGGTCCAAAGGCCAGTANAGGGNTCGGANGTANAAGATNCTNAAAATAAAACNGA
NAGAATCATTCTNAAGAAACTTNTTGNATGTNTGCCCTTTCAAACTCAACAGGAGTTTACCAAACCTTTTCTTTTCTAAAGGAGACTAAGGTTTTAAGAAAACCACTTACTCGGTCTTTGGTTAATGTCTGCAAAGGTGGATTATTGGACCTTCTTGAGGTCCCTTTCGTTGCGTAAAACCGGGGTTTCTT
CCTTTCACTTAGTCGTACGTAACGTAAACGTAAAAGGTAAAGGTTACGTTACGTTAACGTTTAAACGTTTTTTTAACGTTTTGGTTTGGTTTGGTTGTTAGTTTACTTAACCTTAACCTAACCTAAACGTAAAGGTTTAACGGTTAAACCGTTAACGTTACGTTTAACGTTAAGGTAAGGAAGGACGAGTA
AGTTAAGTTAAACTAAACTACTAGTAGACGACGACAACGAAGGAGAGAGAGACGACACGAGGAGGAGNGNNN
+SRR003161.3 FEKQ5UX01ARXN7 length=645
AAAAAAAAAAAAAAAAAAAAAAIFAABA?7792222.,,:3<<<<:0222276:220::20020028662222022000002,220006666=9000669600000!0699788...4877873...!,.333.!......4447........!....!....4!...!..66.!..!....4+++*.!..
!.33333686--!---------!--3!332,!,,,,,,,,*,,,,2,,,,,,,,,2,,,,,,,,,,,,.,,((((,(,,,,,),,,,,,,,,,..000----,,(,,,,,,,,,,,,,,,,)),,,,,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,),,,1..,,,,,,,,,,,,))
,,,,,,,,,,,,,,,,,,,03330,,,,,,,)))),,0(((,,,,,100,,,,,,,,0,,,,,,-03,----)))),,'''',,(((,,))),,)),,,,,,,,,,))00,,,,,,,,000,,,,,,,,,))),,,,,)),,,)),,,,,0,,)),,11133-,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,-,,))),,),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10000,,,,,,,,,,!,!!!
@SRR003161.4 FEKQ5UX01AMUAT length=587
TCAGGTTTGGAATGTGGGCTCTGAAGCCATACAACACAGTTTCTACTCTTTATCTTACACCTCCTGACTTTGTGACATTGGTTAAATATTTTATTTATTATNNCATAACTTACTACTTTGTTAAATTAGAAGTACGACTGTCTACACTCTTAGGTAGTTGGTCTGTTGAAATTAAATAATAGNACTTTAAC
TTACTTAAATAGANATACACACGACTTAGTTAGTTGTTGGCTGGAAATTAGGTATNTGTTTTAGTTCCTACACCTTACTTAACCCTAACCTACCATNTAATACTTTTACTTGTTCTCNGANANATNATAGTNTCTACGTTGAGTATATTACTTATATTACACGGTACGACGGACCGACGTCGTACACGTCT
CGTCTTCTNCNANNATGTAGTGAGTCTNTTTATTNTTTCTTAACTACTACTACTCGTTGTAGTAAGTAATAATAANTNNTCTACACCTACGACTGTATTGTAAGTACAAGAAGGACCGACGTTTCGTTACCTTTCTTCTTCGTCCTCTACTTAACCTGTTACTACGTACGCGAACACGGACGTAGGAGGAG
GAGGACACGAACGG
+SRR003161.4 FEKQ5UX01AMUAT length=587
AAAAAAAAAAAAAAAAAAAAAAIEEAIIIIIIAAIIIA:666AAE???<<<@AA===A=>>AAAAAAAAAAAAA?@???980000040....0/**04490!!00000600.........,,.....,.....74..............33.....7.....4..............++664!.000000.
135855----*--!3------------33,,,,,,,,,2222222,,,,*,,,,,!,,,,,,3,((,,,,00,,,,,,,,,,,,,1,,)),,,,01!333001,,,,03((,,,,,,!,,!,!,,!,,3,,!,1,,,,,,,,,,,,,,,,,,,,,,,,3,,,,,433,,,,,,,,13,,,,,,,,,04,,,
,,,,,,,,!,!,!!,,,,10,,,311,!,,,1))!,,,,)),,30,,,0330,,,,,,003333,,,,,0003,,!,!!,,01,,,033,,,,,1,,,,,,,,00,,,,,,,,,1331313/.,,,)),,,,,,,)),,,,,,,,,,010,,,,,,,,,,3303,,,,0000000,,,,03,,,,,0,,,,
,,,,34333,,,,,
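Each FASTQ record above has four parts: an `@` header, the bases, a `+` line (here repeating the header), and one quality character per base. For example, each N in the first record is paired with the quality character `!`, which decodes to Phred 0. The sketch below reads such records and decodes the qualities. It assumes the Phred+33 offset (to my knowledge fastq-dump's default) and that each sequence and quality string sits on a single line in the file; the record and function names are my own.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # header text after the leading "@"
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records (one line each for bases and qualities).

    Records are split by position, not by a leading "@", because "@" is
    also a legal quality character.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record: {header.strip()}")
        if len(sequence) != len(quality):
            raise ValueError(f"length mismatch in: {header.strip()}")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


def phred33(quality: str) -> List[int]:
    """Decode Phred+33 characters (assumed offset): '!' -> 0, '5' -> 20, 'I' -> 40."""
    return [ord(ch) - 33 for ch in quality]


if __name__ == "__main__":
    demo = io.StringIO("@demo.1 length=6\nTCAGNA\n+demo.1 length=6\nAA:5!I\n")
    for rec in read_fastq(demo):
        print(rec.name, rec.sequence, phred33(rec.quality))
        # demo.1 length=6 TCAGNA [32, 32, 25, 20, 0, 40]
```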
4. Convert SRA to FASTA:
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ../../code/sratoolkit.2.5.5-ubuntu64/bin/fastq-dump --fasta 20 SRR003161
2016-01-13T05:33:42 fastq-dump.2.5.5 err: timeout exhausted while reading file within network system module - failed SRR003161
=============================================================
An error occurred during processing.
A report was generated into the file '/home/hadoop/ncbi_error_report.xml'.
If the problem persists, you may consider sending the file
to 'sra@ncbi.nlm.nih.gov' for assistance.
=============================================================
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ more SRR003161.fasta
>SRR003161.1 FEKQ5UX01AS5XC length=124
TCAGATGCAATCATCGAATG
GTCTCGAATGGAATCNTCTA
NAGAGATGGAATGTATCNCT
CGCCANACGACACNCGAACA
GGGNAAGGCAAGCAGNAGGN
AGNNNANNNNNNNNNNNNNN
NNNN
>SRR003161.2 FEKQ5UX01AOE96 length=505
TCAGTTTGAGATGGAGTTTC
ATTCTTGTTGCCCAGGCTGG
AGTGCAATGGCGCAATCTCA
GCTCACAGCAACCTCCGCCT
CCCGGGTTCAAGCGATTCTC
CTGCCTCAGCCTCTCGAGTA
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ../../code/sratoolkit.2.5.5-ubuntu64/bin/fastq-dump --fasta 50 SRR003161
2016-01-13T05:36:52 fastq-dump.2.5.5 err: timeout exhausted while reading file within network system module - failed SRR003161
=============================================================
An error occurred during processing.
A report was generated into the file '/home/hadoop/ncbi_error_report.xml'.
If the problem persists, you may consider sending the file
to 'sra@ncbi.nlm.nih.gov' for assistance.
=============================================================
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ls
SRR002664.fastq SRR002664.sra SRR003161.fasta SRR003161.fastq SRR003161.sra
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ more SRR003161.fasta
>SRR003161.1 FEKQ5UX01AS5XC length=124
TCAGATGCAATCATCGAATGGTCTCGAATGGAATCNTCTANAGAGATGGA
ATGTATCNCTCGCCANACGACACNCGAACAGGGNAAGGCAAGCAGNAGGN
AGNNNANNNNNNNNNNNNNNNNNN
>SRR003161.2 FEKQ5UX01AOE96 length=505
TCAGTTTGAGATGGAGTTTCATTCTTGTTGCCCAGGCTGGAGTGCAATGG
CGCAATCTCAGCTCACAGCAACCTCCGCCTCCCGGGTTCAAGCGATTCTC
CTGCCTCAGCCTCTCGAGTAGCTGGGATTACAGGCATGCACCATCACGCC
CAGCTAATTTGCATTTTTTATTAGAGATGGGGTTTCTCCACATTGGTCAG
GCTGATCTCGAACTCCTGACCTCAGGTGATCTGCCTGCCTTGGCCTCCCA
AAGTGCTGGGATTACAGGCATGAGCCTGAGCCCAACCTATTTACTTTCAA
TCCATCTTTTCAATAACTTAAATACAAGTGTCAATATATACAATCTTTTC
CTCCCTGGTTATCAAGCTTTCTAATATATATGGATGTATCTTCCAAGGTT
TTTGATCCCATTTTACTTTACAGGCTCACTGCTGTGGAACCCAGAGAGCA
GTCTCTTTTCAAGGNGGGCTGAGACNCGCAACAGGGGATTAGGCCAAGGC
NCAGG
>SRR003161.3 FEKQ5UX01ARXN7 length=645
TCAGCATGCTAGACAGAAGAATTCTCAGTAACTTTCTTTGTGCTGTGTGT
ATTCAACTCACAGAGTTGGAACCGTTCCTTTGTCAACAGAGCTAGAATTT
GAAACCNCTCTTGAGGACTACGCGAAANAGGGGANAAGGTCCAAAGGCCA
GTANAGGGNTCGGANGTANAAGATNCTNAAAATAAAACNGANAGAATCAT
TCTNAAGAAACTTNTTGNATGTNTGCCCTTTCAAACTCAACAGGAGTTTA
CCAAACCTTTTCTTTTCTAAAGGAGACTAAGGTTTTAAGAAAACCACTTA
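The error above is not resolved yet, although SRR003161.fasta was still written (see the ls output).
Switching to a different dataset worked.
Successful run (--fasta 50 writes 50 bases per line):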
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ../../code/sratoolkit.2.5.5-ubuntu64/bin/fastq-dump --fasta 50 SRR002664
Read 487522 spots for SRR002664
Written 487522 spots for SRR002664
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ls
back SRR002664.fasta SRR002664.sra SRR003161.fasta SRR003161.fastq SRR003161.sra
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ll -h
total 986M
drwxrwxr-x 3 hadoop hadoop 4.0K 1月 13 13:40 ./
drwxrwxr-x 5 hadoop hadoop 4.0K 1月 12 21:31 ../
drwxrwxr-x 2 hadoop hadoop 4.0K 1月 13 13:39 back/
-rw-rw-r-- 1 hadoop hadoop 150M 1月 13 13:40 SRR002664.fasta
-rw-r--r-- 1 hadoop hadoop 17M 12月 15 22:13 SRR002664.sra
-rw-rw-r-- 1 hadoop hadoop 274M 1月 13 13:36 SRR003161.fasta
-rw-rw-r-- 1 hadoop hadoop 538M 1月 13 13:00 SRR003161.fastq
-rw-r--r-- 1 hadoop hadoop 9.0M 12月 15 23:12 SRR003161.sra
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ more SRR002664.fasta
>SRR002664.1 FC20KVN01EFCX9 length=192
TCAGCTCACGTCTGTAATCCTAGCATTTTGGGAGGCTGAGACGGGCAGAT
CACTTGAGGTCATGAGTTCGAGACCAGCCTGGCAACCATGGCGAAACCCT
GTCTCTACTAAAATACAAAATTAGCCAGGCATGGTGGCGCATGCCTGTCT
GAGACACGCAACAGGGGATAGGCAAGGCACACAGGGGATAGG
>SRR002664.2 FC20KVN01ELL46 length=127
TCAGCAAAGAAAACAAATTCCTTTCTGGCACCACCTCAAAGAAGAATTTC
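The only difference between the `--fasta 20` and `--fasta 50` outputs above is the line width: the same bases are cut into rows of 20 or 50. The sketch below reproduces that layout from (name, sequence) pairs; the function name `to_fasta` is hypothetical, and treating a width of None or 0 as "one line per sequence" is my own choice, not documented fastq-dump behaviour.

```python
from typing import Iterable, Optional, Tuple


def to_fasta(records: Iterable[Tuple[str, str]], width: Optional[int] = None) -> str:
    """Format (name, sequence) pairs as FASTA text.

    width=None (or 0) keeps each sequence on one line; width=50 mimics the
    50-bases-per-line layout of `fastq-dump --fasta 50` shown above.
    """
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        if not width:
            lines.append(seq)
        else:
            # Cut the sequence into consecutive chunks of `width` bases.
            lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Joining the first five 20-base rows of SRR003161.1 and re-wrapping them at 50 gives exactly the first two rows of the 50-column output, which is a quick way to confirm that the width option only changes line breaks.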
Then verify with a FASTQ conversion of the same run:
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ../../code/sratoolkit.2.5.5-ubuntu64/bin/fastq-dump SRR002664
Read 487522 spots for SRR002664
Written 487522 spots for SRR002664
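Both conversions report "Read 487522 spots" and "Written 487522 spots". To cross-check the output files themselves, one can count records. A hedged sketch (helper names are hypothetical): FASTA records are counted by their `>` headers, while FASTQ records are counted as lines divided by four, because counting lines that start with `@` would also catch quality lines beginning with `@`. It assumes unwrapped 4-line FASTQ records.

```python
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records: each one starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count 4-line FASTQ records.

    Counting lines that start with '@' would over-count, since '@' is also a
    valid quality character, so this counts lines and divides by four.
    """
    n_lines = sum(1 for _ in lines)
    if n_lines % 4:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Passing open file objects for SRR002664.fasta and SRR002664.fastq should give matching counts, and both should agree with the "Written 487522 spots" lines as long as each spot was written as one record.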
5. Splitting
Separate the reads of paired-end sequencing runs into different files.
(1) --split-files generates two FASTQ files
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ../../code/sratoolkit.2.5.5-ubuntu64/bin/fastq-dump --split-files SRR002664
Read 487522 spots for SRR002664
Written 487522 spots for SRR002664
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ll -h
total 924M
drwxrwxr-x 3 hadoop hadoop 4.0K 1月 13 14:05 ./
drwxrwxr-x 5 hadoop hadoop 4.0K 1月 12 21:31 ../
drwxrwxr-x 2 hadoop hadoop 4.0K 1月 13 13:52 back/
-rw-rw-r-- 1 hadoop hadoop 44M 1月 13 14:05 SRR002664_1.fastq
-rw-rw-r-- 1 hadoop hadoop 291M 1月 13 14:05 SRR002664_2.fastq
-rw-rw-r-- 1 hadoop hadoop 291M 1月 13 14:02 SRR002664.fastq
-rw-r--r-- 1 hadoop hadoop 17M 12月 15 22:13 SRR002664.sra
-rw-rw-r-- 1 hadoop hadoop 274M 1月 13 13:56 SRR003161.fasta
-rw-r--r-- 1 hadoop hadoop 9.0M 12月 15 23:12 SRR003161.sra
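Roughly speaking, SRA stores each spot as one sequence plus the boundaries of the reads it contains, and --split-files writes the N-th read of every spot to <accession>_N.fastq. The simplified model below shows that grouping; the read lengths passed in, the `Read` alias, and the in-memory dict standing in for real files are all my own assumptions. Note that in the listing above SRR002664_1.fastq (44M) is far smaller than SRR002664_2.fastq (291M), so the two reads per spot in this run are clearly not equal-length mates; the short first read in the test data only mimics that asymmetry.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Read = Tuple[str, str]  # (bases, qualities)


def split_spot(seq: str, qual: str, read_lengths: Sequence[int]) -> List[Read]:
    """Cut one spot into consecutive reads of the given lengths.

    Simplified model: the caller supplies the read lengths (assumption).
    """
    if len(seq) != len(qual) or sum(read_lengths) != len(seq):
        raise ValueError("read lengths must exactly cover the spot")
    reads, start = [], 0
    for length in read_lengths:
        reads.append((seq[start:start + length], qual[start:start + length]))
        start += length
    return reads


def split_files(accession: str,
                spots: Iterable[Tuple[str, str, Sequence[int]]]) -> Dict[str, List[Read]]:
    """Collect read N of every spot under '<accession>_N.fastq'."""
    files: Dict[str, List[Read]] = {}
    for seq, qual, lengths in spots:
        for n, read in enumerate(split_spot(seq, qual, lengths), start=1):
            files.setdefault(f"{accession}_{n}.fastq", []).append(read)
    return files
```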
(2) --split-3
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ../../code/sratoolkit.2.5.5-ubuntu64/bin/fastq-dump --split-3 SRR002664
Rejected 487522 READS because of filtering out non-biological READS
Read 487522 spots for SRR002664
Written 487522 spots for SRR002664
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ll
total 1192100
drwxrwxr-x 3 hadoop hadoop 4096 1月 13 14:21 ./
drwxrwxr-x 5 hadoop hadoop 4096 1月 12 21:31 ../
drwxrwxr-x 2 hadoop hadoop 4096 1月 13 14:21 back/
-rw-rw-r-- 1 hadoop hadoop 304893796 1月 13 14:21 SRR002664.fastq
-rw-r--r-- 1 hadoop hadoop 16874064 12月 15 22:13 SRR002664.sra
-rw-rw-r-- 1 hadoop hadoop 42893052 1月 13 14:16 SRR003161_1.fastq
-rw-rw-r-- 1 hadoop hadoop 559892770 1月 13 14:16 SRR003161_2.fastq
-rw-rw-r-- 1 hadoop hadoop 286773153 1月 13 13:56 SRR003161.fasta
-rw-r--r-- 1 hadoop hadoop 9353980 12月 15 23:12 SRR003161.sra
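The documentation describes the --split-3 option as follows:
Legacy 3-file splitting for mate-pairs: First 2 biological reads satisfying dumping conditions are placed in files *_1.fastq and *_2.fastq. If only one biological read is present it is placed in *.fastq. Biological reads 3 and above are ignored.
In other words, if a spot contains only one biological read, no splitting takes place and the read goes to *.fastq. If it contains two, the mates are written separately to *_1.fastq and *_2.fastq. If a third file (*.fastq) appears next to the paired files, it holds reads whose mate is missing, possibly because the data were filtered before submission and some reads were removed.
Adapted from reference [4].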
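The --split-3 rule quoted above can be written as a small routing function. This is a sketch, not fastq-dump's code: the input lists only the biological reads left in each spot (technical reads already removed, which the "Rejected ... non-biological READS" line suggests fastq-dump does), the function names are hypothetical, and writing nothing for a spot with no biological read is my own assumption.

```python
from typing import Dict, Iterable, Sequence


def split3_targets(accession: str, biological_reads: Sequence[str]) -> Dict[str, str]:
    """Route the biological reads of one spot following the --split-3 description.

    Two or more reads: the first goes to _1.fastq, the second to _2.fastq,
    and reads 3 and above are ignored. Exactly one read: <accession>.fastq.
    """
    if len(biological_reads) >= 2:
        return {f"{accession}_1.fastq": biological_reads[0],
                f"{accession}_2.fastq": biological_reads[1]}
    if len(biological_reads) == 1:
        return {f"{accession}.fastq": biological_reads[0]}
    return {}  # assumption: a spot with no biological read writes nothing


def split3_tally(accession: str, spots: Iterable[Sequence[str]]) -> Dict[str, int]:
    """Count how many reads would land in each output file."""
    counts: Dict[str, int] = {}
    for reads in spots:
        for filename in split3_targets(accession, reads):
            counts[filename] = counts.get(filename, 0) + 1
    return counts
```

Under this rule, a run whose spots each hold a single biological read produces only <accession>.fastq, which matches the --split-3 result for SRR002664 above; a paired run with some orphaned reads produces all three files.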
(3) --split-spot
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ../../code/sratoolkit.2.5.5-ubuntu64/bin/fastq-dump --split-spot SRR002664
Read 487522 spots for SRR002664
Written 487522 spots for SRR002664
hadoop@Mcnode1:~/cloud/adam/down/data/SRA$ ll
total 1236636
drwxrwxr-x 3 hadoop hadoop 4096 1月 13 14:53 ./
drwxrwxr-x 5 hadoop hadoop 4096 1月 12 21:31 ../
drwxrwxr-x 2 hadoop hadoop 4096 1月 13 14:21 back/
-rw-rw-r-- 1 hadoop hadoop 350498654 1月 13 14:54 SRR002664.fastq
-rw-r--r-- 1 hadoop hadoop 16874064 12月 15 22:13 SRR002664.sra
-rw-rw-r-- 1 hadoop hadoop 42893052 1月 13 14:16 SRR003161_1.fastq
-rw-rw-r-- 1 hadoop hadoop 559892770 1月 13 14:16 SRR003161_2.fastq
-rw-rw-r-- 1 hadoop hadoop 286773153 1月 13 13:56 SRR003161.fasta
-rw-r--r-- 1 hadoop hadoop 9353980 12月 15 23:12 SRR003161.sra
--split-spot: Split spots into individual reads.
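The two `ll` listings make it possible to check what --split-spot added. The arithmetic below subtracts the --split-3 output (biological reads only) from the --split-spot output; the difference, about 43.5 MiB, is in line with the 44M that `ll -h` showed for SRR002664_1.fastq. That suggests, though it does not prove, that --split-spot wrote every read of each spot, including the non-biological one that --split-3 rejected, as its own record in a single file. The variable names are my own.

```python
# Byte sizes copied from the `ll` listings above.
split_spot_fastq = 350_498_654  # SRR002664.fastq written with --split-spot
split3_fastq = 304_893_796      # SRR002664.fastq written with --split-3

MIB = 1024 * 1024
extra = split_spot_fastq - split3_fastq
print(extra, f"{extra / MIB:.2f} MiB")  # 45604858 43.49 MiB
```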
References:
[1] http://www.ncbi.nlm.nih.gov/Traces/sra/?view=toolkit_doc&f=fastq-dump
[2] http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc
[3] http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software#
[4] http://www.bbioo.com/lifesciences/40-112832-1.html