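Computing dN/dS with the CODEML module in PAML: workflow and pitfalls

chestnut_egg, posted on 2020-05-15

Original article: https://www.cnblogs.com/chestnut-egg/p/12894987.html

While helping my girlfriend with her graduation project recently, I used the codeml program from the PAML package and found very little material about it online. I am sharing the pitfalls I ran into here, in the hope that it helps others facing the same difficulties.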

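I. Download and installation

PAML download link

http://abacus.gene.ucl.ac.uk/software/paml4.9j.tgz

DAMBE download link

http://dambe.bio.uottawa.ca/DAMBE/dambe_install_win.aspx

II. Usage

First, prepare your .fas (FASTA) file.

The .fas file has to be converted to another format. There are many ways to do this; I describe two here. Both produce a file with the same content; only the file extension differs.

Method 1:

Convert with a Python script

Put your *.fas file in the same directory as the script and run it; a .phy file is generated.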

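import re

# Read the FASTA file: each record is ">name description" followed by sequence lines.
with open('seven.fas', 'r') as fin:
    # Keep the first word of the header as the name; join wrapped sequence lines.
    sequences = [(m.group(1), ''.join(m.group(2).split()))
                 for m in re.finditer(r'(?m)^>([^ \n]+)[^\n]*([^>]*)', fin.read())]

# Write a "count length" header, then one "name  sequence" line per record.
with open('seven.phy', 'w') as fout:
    fout.write('%d %d\n' % (len(sequences), len(sequences[0][1])))
    for item in sequences:
        # Two spaces after the padded name, so long names stay separated from the sequence.
        fout.write('%-20s  %s\n' % item)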

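Method 2:

Convert the format with DAMBE

1. Open DAMBE and choose File -> Open standard sequence file -> set the file type to one that includes fas -> select your fas file

2. Click go

3. Choose File -> save or convert sequence format -> select the paml format

4. Manually rename the extension from *.pml to *.nuc

Either method gives you a *.phy or *.nuc file.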
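
Before going further, it can help to check the converted file mechanically. The following is a minimal sketch, assuming the one-line-per-sequence layout written by the script in method 1 (a "count length" header followed by "name  sequence" lines); the function name `check_phylip` is made up for this example, and files laid out differently may need another check.

```python
def check_phylip(text):
    """Return a list of problems found in sequential PHYLIP text (empty if none)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ['file is empty']
    try:
        n_seqs, n_sites = (int(x) for x in lines[0].split())
    except ValueError:
        return ['first line must be "<number of sequences> <sequence length>"']

    problems = []
    records = lines[1:]
    if len(records) != n_seqs:
        problems.append('header says %d sequences, found %d' % (n_seqs, len(records)))
    if n_sites % 3 != 0:
        # Codon models need complete codons.
        problems.append('length %d is not a multiple of 3' % n_sites)

    for ln in records:
        parts = ln.split(None, 1)
        name = parts[0]
        seq = ''.join(parts[1].split()) if len(parts) > 1 else ''
        if not ln.lstrip().startswith(name + '  '):
            problems.append('%s: fewer than two spaces between name and sequence' % name)
        if len(seq) != n_sites:
            problems.append('%s: length %d, expected %d' % (name, len(seq), n_sites))
    return problems
```

For example, `print(check_phylip(open('seven.phy').read()))` prints an empty list when none of these problems is found.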

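Next, the stop codons have to be removed from the sequences:

TAG, TAA, TGA

You can do this by hand with find and replace, turning TAG/TAA/TGA into ---, but only where they sit in the reading frame. A blanket text replace also changes out-of-frame matches, such as the TAG that spans the two codons ATA GCC.

Or use the Python script below, which masks in-frame stop codons only:

# Assumes the one-line-per-sequence layout written by the script in method 1.
STOP_CODONS = {'TAG', 'TAA', 'TGA'}

with open('seven.phy', 'r') as f:
    header, *records = f.read().splitlines()

with open('sevenend.phy', 'w') as f:
    f.write(header + '\n')
    for record in records:
        if not record.strip():
            continue
        name, seq = record.split(None, 1)
        seq = ''.join(seq.split())
        # Split into codons and mask only those that are stop codons.
        codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
        masked = ''.join('---' if c.upper() in STOP_CODONS else c for c in codons)
        f.write('%-20s  %s\n' % (name, masked))

This produces sevenend.phy, a copy of the file with the stop codons masked.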
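
To confirm that no in-frame stop codon is left, a small helper can list where they occur. This is a sketch that assumes the universal genetic code (icode = 0) and a sequence that starts at codon position 1; `inframe_stops` is a name invented for this example.

```python
STOP_CODONS = {'TAA', 'TAG', 'TGA'}  # stop codons of the universal code (icode = 0)


def inframe_stops(seq):
    """Return the 1-based codon positions of in-frame stop codons in seq."""
    seq = seq.upper()
    return [i // 3 + 1
            for i in range(0, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]
```

Calling it on each sequence of the processed file should give an empty list; a non-empty list points at the codons that still need attention.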

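Now put the processed sequence file (*.phy), the tree file, and the control file codeml.ctl together in the \paml4.9j\bin directory.

The contents of codeml.ctl are shown below for reference. Usually only the first three lines need to be changed; in order, they are the sequence file name, the tree file name, and the output file name. Make sure seqfile names the processed file you actually want to analyse (for example sevenend.phy).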

      seqfile = seven.nuc * sequence data filename
     treefile = Newick      * tree structure file name
      outfile = test.txt           * main result file name

        noisy = 0  * 0,1,2,3,9: how much rubbish on the screen
      verbose = 0  * 0: concise; 1: detailed, 2: too much
      runmode = -2  * 0: user tree;  1: semi-automatic;  2: automatic
                   * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise

      seqtype = 1  * 1:codons; 2:AAs; 3:codons-->AAs
    CodonFreq = 2  * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table

*        ndata = 5504
        clock = 0  * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
       aaDist = 0  * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
   aaRatefile = dat/jones.dat  * only used for aa seqs with model=empirical(_F)
                   * dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own

        model = 0
                   * models for codons:
                       * 0:one, 1:b, 2:2 or more dN/dS ratios for branches
                   * models for AAs or codon-translated AAs:
                       * 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F
                       * 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189)

      NSsites = 0  * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
                   * 5:gamma;6:2gamma;7:beta;8:beta&w;9:beta&gamma;
                   * 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;
                   * 13:3normal>0

        icode = 0  * 0:universal code; 1:mammalian mt; 2-10:see below
        Mgene = 0
                   * codon: 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff
                   * AA: 0:rates, 1:separate

    fix_kappa = 0  * 1: kappa fixed, 0: kappa to be estimated
        kappa = 2  * initial or fixed kappa
    fix_omega = 0  * 1: omega or omega_1 fixed, 0: estimate 
        omega = .4 * initial or fixed omega, for codons or codon-based AAs

    fix_alpha = 1  * 0: estimate gamma shape parameter; 1: fix it at alpha
        alpha = 0. * initial or fixed alpha, 0:infinity (constant rate)
       Malpha = 0  * different alphas for genes
        ncatG = 8  * # of categories in dG of NSsites models

        getSE = 0  * 0: don't want them, 1: want S.E.s of estimates
 RateAncestor = 1  * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)

   Small_Diff = .5e-6
    cleandata = 1  * remove sites with ambiguity data (1:yes, 0:no)?
*  fix_blength = 1  * 0: ignore, -1: random, 1: initial, 2: fixed, 3: proportional
       method = 0  * Optimization method 0: simultaneous; 1: one branch a time

* Genetic codes: 0:universal, 1:mammalian mt., 2:yeast mt., 3:mold mt.,
* 4: invertebrate mt., 5: ciliate nuclear, 6: echinoderm mt., 
* 7: euplotid mt., 8: alternative yeast nu. 9: ascidian mt., 
* 10: blepharisma nu.
* These codes correspond to transl_table 1 to 11 of GENEBANK.
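
Editing the first three lines by hand is easy to get wrong when there are many data sets. The sketch below changes option values in the control file text while leaving the trailing `* comment` untouched; `set_ctl_options` is a hypothetical helper written for this article, not part of PAML.

```python
import re


def set_ctl_options(ctl_text, **options):
    """Return ctl_text with the value of each named option replaced.

    Only the value changes; any trailing '* comment' is kept as it is.
    Lines that start with '*' are comments and are never matched.
    """
    for key, value in options.items():
        pattern = re.compile(r'(?m)^([ \t]*%s[ \t]*=[ \t]*)(\S+)' % re.escape(key))
        ctl_text, count = pattern.subn(
            lambda m: m.group(1) + str(value), ctl_text, count=1)
        if count == 0:
            raise KeyError('option not found in control file: %s' % key)
    return ctl_text
```

For instance, `set_ctl_options(text, seqfile='sevenend.phy', outfile='result.txt')` returns the control file text with those two values swapped in, ready to be written back to codeml.ctl.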

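Open a command line in this directory.

Enter the following command:

codeml

The result file test.txt and several other output files then appear in the current directory.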
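
If you prefer to start the run from a script, the same command can be wrapped in Python. This is a sketch under two assumptions: the `codeml` executable is on PATH (otherwise pass its full path as `exe`), and the control file is named on the command line. `run_codeml` is a name chosen for this example.

```python
import subprocess
from pathlib import Path


def run_codeml(workdir, ctl_name='codeml.ctl', exe='codeml'):
    """Run codeml inside workdir and return the completed process.

    exe is assumed to be on PATH or to be a full path to the codeml binary.
    """
    workdir = Path(workdir)
    ctl_path = workdir / ctl_name
    if not ctl_path.is_file():
        raise FileNotFoundError('control file not found: %s' % ctl_path)
    # check=True raises CalledProcessError if codeml exits with an error code.
    return subprocess.run([exe, ctl_name], cwd=workdir, check=True)
```

Checking for the control file first gives a clearer message than letting the program fail on a missing file.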

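I ran into quite a few error messages along the way; here they are for reference.

67 columns are converted into ??? because of stop codons

This message appears because the stop codons were not removed from the file; follow the step above to remove them.

Error: Error in sequence data file: . in 1st seq.?.

Error: check #seqs and tree: perhaps too many '('?.

Make sure to separate the sequence from its name by 2 or more spaces.

In my case, all of the errors above were caused by problems with the content or format of the sequence file. Regenerate the sequence file by following the steps above, or compare it with a file that is known to work.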
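
When a run does use the tree file, one more thing that can be checked mechanically is whether the names in the tree match the names in the sequence file. The sketch below pulls leaf labels out of a Newick string with a regular expression; it assumes simple unquoted labels, and both function names are invented for this example.

```python
import re


def newick_leaf_names(newick):
    """Return the leaf labels of a Newick string (unquoted labels only)."""
    # A leaf label directly follows '(' or ','; labels after ')' belong to internal nodes.
    return re.findall(r'[(,]\s*([^(),:;\s]+)', newick)


def compare_names(newick, seq_names):
    """Return (names only in the tree, names only in the sequence file)."""
    tree, seqs = set(newick_leaf_names(newick)), set(seq_names)
    return sorted(tree - seqs), sorted(seqs - tree)
```

Two empty lists mean both files refer to the same set of names.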
