R Statistical Plotting - Simplified Heatmaps

Posted by weixin_33806914 on 2018-04-29

You are welcome to follow the Tianxia blog: http://blog.genesino.com/2017/06/heatmap-simple/

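Heatmap plotting - pheatmap

Besides ggplot2, heatmaps can be drawn with other packages or functions, such as pheatmap::pheatmap (the pheatmap function in the pheatmap package) and gplots::heatmap.2.

Compared with building a heatmap in ggplot2, pheatmap is simpler: a single function, with different arguments, handles row and column clustering, row and column annotation, Z-score calculation, custom colors and more. Let's see how it looks.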

data_ori <- "Grp_1;Grp_2;Grp_3;Grp_4;Grp_5
a;6.6;20.9;100.1;600.0;5.2
b;20.8;99.8;700.0;3.7;19.2
c;100.0;800.0;6.2;21.4;98.6
d;900;3.3;20.3;101.1;10000"

data <- read.table(text=data_ori, header=T, row.names=1, sep=";", quote="")

  Grp_1 Grp_2 Grp_3 Grp_4   Grp_5
a   6.6  20.9 100.1 600.0     5.2
b  20.8  99.8 700.0   3.7    19.2
c 100.0 800.0   6.2  21.4    98.6
d 900.0   3.3  20.3 101.1 10000.0

pheatmap::pheatmap(data, filename="pheatmap_1.pdf")

It is a bit ugly, but it takes only one step.

[Figure: 7071112-752f96f58b5fbeab.png]

The data preprocessing methods described in the heatmap beautification post all apply to pheatmap as well. In addition, pheatmap computes Z-scores with a single argument.

pheatmap::pheatmap(data, scale="row", filename="pheatmap_1.pdf")

[Figure: 7071112-021faecc2bf7b588.png]
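As a rough check of what scale="row" computes, the base-R sketch below standardizes each row by hand: subtract the row mean, then divide by the row standard deviation. It re-creates the example matrix so that it runs on its own; the names `mat`, `z_by_row` and `z_check` are illustrative and do not come from the original post.

```r
# Re-create the example matrix (same values as `data` above).
mat <- as.matrix(read.table(text = "Grp_1;Grp_2;Grp_3;Grp_4;Grp_5
a;6.6;20.9;100.1;600.0;5.2
b;20.8;99.8;700.0;3.7;19.2
c;100.0;800.0;6.2;21.4;98.6
d;900;3.3;20.3;101.1;10000",
  header = TRUE, row.names = 1, sep = ";", quote = ""))

# Row-wise Z-score: (value - row mean) / row standard deviation.
# The length-4 vectors are recycled down the columns, so each row
# is centred and scaled with its own mean and sd.
z_by_row <- (mat - rowMeans(mat)) / apply(mat, 1, sd)

# base::scale() works column-wise, so transposing before and after
# gives the same row-wise result.
z_check <- t(scale(t(mat)))

round(z_by_row, 2)
```

After scaling, every row has mean 0 and standard deviation 1, which is why the extreme value 10000 in row d no longer dominates the color range.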

Sometimes row or column clustering is not needed, and showing the data in its original order is enough.

pheatmap::pheatmap(data, scale="row", cluster_rows=FALSE, cluster_cols=FALSE, filename="pheatmap_1.pdf")

[Figure: 7071112-25c9a5b2643fa976.png]
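To see what order the clustering would otherwise impose, the ordering can be approximated with base R. This is only a sketch: it assumes pheatmap's documented defaults (Euclidean distance, complete linkage) and that clustering runs on the row-scaled values; `row_tree` and `col_tree` are illustrative names.

```r
# Re-create the example matrix (same values as `data` above).
mat <- as.matrix(read.table(text = "Grp_1;Grp_2;Grp_3;Grp_4;Grp_5
a;6.6;20.9;100.1;600.0;5.2
b;20.8;99.8;700.0;3.7;19.2
c;100.0;800.0;6.2;21.4;98.6
d;900;3.3;20.3;101.1;10000",
  header = TRUE, row.names = 1, sep = ";", quote = ""))

# Row-wise Z-scores, as with scale="row".
z <- t(scale(t(mat)))

# Hierarchical clustering of rows and of columns
# (assumed defaults: Euclidean distance, complete linkage).
row_tree <- hclust(dist(z), method = "complete")
col_tree <- hclust(dist(t(z)), method = "complete")

# Leaf order of each dendrogram, compared with the original order.
rownames(z)[row_tree$order]
colnames(z)[col_tree$order]
```

With cluster_rows=FALSE and cluster_cols=FALSE, the heatmap simply keeps the order of rownames(data) and colnames(data).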

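Next, annotate the rows and columns of the matrix (data) with group labels. Suppose there are two files. The first is the row annotation: its first column has the same contents as the first column of the matrix (order does not matter), and the remaining columns hold different labels for those entries. In the example below (assuming rows are genes and columns are samples), columns 2 and 3 give each gene's type (TF or enzyme) and its group. The second file is the column annotation: its first column has the same contents as the first row of the matrix, and the remaining columns annotate the samples.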

row_anno = data.frame(type=c("TF","Enzyme","Enzyme","TF"), class=c("clu1","clu1","clu2","clu2"), row.names=rownames(data))
row_anno

    type class
a     TF  clu1
b Enzyme  clu1
c Enzyme  clu2
d     TF  clu2

col_anno = data.frame(grp=c("A","A","A","B","B"), size=1:5, row.names=colnames(data))
col_anno

      grp size
Grp_1   A    1
Grp_2   A    2
Grp_3   A    3
Grp_4   B    4
Grp_5   B    5

pheatmap::pheatmap(data, scale="row", 
cluster_rows=FALSE, 
annotation_col=col_anno,
annotation_row=row_anno,
filename="pheatmap_1.pdf")

[Figure: 7071112-5efc1986f1a86b23.png]
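When the annotation comes from a file rather than an inline data.frame, it helps to confirm that its first column really covers the matrix row names before plotting. The sketch below uses a made-up, deliberately shuffled annotation table in the layout described above; `row_anno_txt` and `missing_ids` are illustrative names, and the claim that annotations are matched by name rather than by position is taken from the "order does not matter" note, not verified here.

```r
# Re-create the example matrix (same values as `data` above).
mat <- as.matrix(read.table(text = "Grp_1;Grp_2;Grp_3;Grp_4;Grp_5
a;6.6;20.9;100.1;600.0;5.2
b;20.8;99.8;700.0;3.7;19.2
c;100.0;800.0;6.2;21.4;98.6
d;900;3.3;20.3;101.1;10000",
  header = TRUE, row.names = 1, sep = ";", quote = ""))

# Hypothetical tab-separated row annotation; rows are shuffled on purpose.
row_anno_txt <- "ID\ttype\tclass
d\tTF\tclu2
b\tEnzyme\tclu1
a\tTF\tclu1
c\tEnzyme\tclu2"
row_anno <- read.table(text = row_anno_txt, header = TRUE, row.names = 1,
                       sep = "\t", quote = "")

# Every matrix row should have an annotation entry.
missing_ids <- setdiff(rownames(mat), rownames(row_anno))
stopifnot(length(missing_ids) == 0)

# Optional: reorder the annotation to follow the matrix, for easier reading.
row_anno <- row_anno[rownames(mat), , drop = FALSE]
row_anno
```

The same check applies to the column annotation, with colnames(mat) in place of rownames(mat).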

Now let's customize the colors.

# <bias> values larger than 1 will give more color for high end. 
# Values between 0-1 will give more color for low end.
pheatmap::pheatmap(data, scale="row", 
cluster_rows=FALSE, 
annotation_col=col_anno,
annotation_row=row_anno,
color=colorRampPalette(c('green','yellow','red'), bias=1)(50),
filename="pheatmap_1.pdf")

[Figure: 7071112-c3cc3f963f47f158.png]
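The effect of bias is easier to judge from the palette itself than from the plot. The following sketch builds the same green-yellow-red ramp with three bias values and counts how many of the 50 colors fall on the yellow-to-red side; `n_warm` is a hypothetical helper written for this illustration.

```r
# Three 50-color palettes that differ only in bias.
cols <- c("green", "yellow", "red")
pal_low  <- colorRampPalette(cols, bias = 0.5)(50)
pal_even <- colorRampPalette(cols, bias = 1)(50)
pal_high <- colorRampPalette(cols, bias = 2)(50)

# Hypothetical helper: count palette entries past yellow, i.e. colors
# whose green channel has already started to drop below 255.
n_warm <- function(pal) sum(col2rgb(pal)["green", ] < 255)

sapply(list(bias_0.5 = pal_low, bias_1 = pal_even, bias_2 = pal_high), n_warm)
```

A larger bias assigns more of the 50 steps to the yellow-to-red half of the ramp, which is what "more color for high end" refers to; a bias below 1 does the opposite.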

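heatmap.2 is not covered here; its usage is somewhat similar to pheatmap, and plenty of tutorials exist.

Heatmap plotting without editing scripts

Two headaches come up often when plotting:

  1. Many plots are needed, and the only difference between them is the output file; nothing else has to change. With an R script, the file name must be replaced over and over, which is tedious and error-prone.

  2. Every plot needs repeated parameter tuning. After a long break you forget where the parameters are set; or after too many rounds of tuning you end up with many versions and no longer know which one to use.

To simplify plotting and keep the scripts consistent, I wrapped the R code in bash, so that different plots can be drawn simply by changing command-line arguments.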
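The idea of keeping one script and varying only the command-line arguments can also be sketched in R itself. The following is not the author's sp_pheatmap.sh, whose source is not shown in this post; `parse_flags`, its defaults and the output file name are assumptions, with flag letters chosen to mirror the options introduced below.

```r
# Hypothetical helper: turn c("-f", "a.xls", "-d", "row") into a named list.
parse_flags <- function(args,
                        defaults = list(f = NA_character_, d = "none",
                                        u = "20", v = "20")) {
  opts <- defaults
  if (length(args) == 0) return(opts)
  stopifnot(length(args) %% 2 == 0)   # flags must come as "-x value" pairs
  flags  <- sub("^-", "", args[seq(1, length(args), by = 2)])
  values <- args[seq(2, length(args), by = 2)]
  for (i in seq_along(flags)) opts[[flags[i]]] <- values[i]
  opts
}

# In a real script the arguments would come from
# commandArgs(trailingOnly = TRUE); a fixed vector is used here
# so that the example runs anywhere.
args <- c("-f", "heatmap_data.xls", "-d", "row", "-u", "8", "-v", "12")
opts <- parse_flags(args)
str(opts)

# Plot only if the input file exists and pheatmap is installed.
if (file.exists(opts$f) && requireNamespace("pheatmap", quietly = TRUE)) {
  mat <- read.table(opts$f, header = TRUE, row.names = 1, sep = "\t", quote = "")
  pheatmap::pheatmap(mat, scale = opts$d,
                     width = as.numeric(opts$u), height = as.numeric(opts$v),
                     filename = paste0(opts$f, ".pheatmap.pdf"))
}
```

Because every setting arrives as an argument, the script file never changes between plots, and the exact command line doubles as a record of the parameters used.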

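First, a look at how to use it

Start by saving the test data to files so that they are easy to call. The data matrix is saved in heatmap_data.xls, the row annotation in heatmap_row_anno.xls, and the column annotation in heatmap_col_anno.xls.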

# Tab-separated, no quotes around any column
write.table(data, file="heatmap_data.xls", sep="\t", row.names=T, col.names=T,quote=F)
# If the missing ID column in the first line bothers you, fill it in
system("sed -i '1 s/^/ID\t/' heatmap_data.xls")

write.table(row_anno, file="heatmap_row_anno.xls", sep="\t", row.names=T, col.names=T,quote=F)
write.table(col_anno, file="heatmap_col_anno.xls", sep="\t", row.names=T, col.names=T,quote=F)
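The sed call above depends on a sed whose -i option takes no argument (BSD/macOS sed expects a backup suffix). A portable alternative, sketched here with a small stand-in data.frame and a temporary file, is to move the row names into a real ID column before writing; `mat_df`, `with_id` and `out_file` are illustrative names.

```r
# Small stand-in for `data` (first two rows and columns only).
mat_df <- data.frame(Grp_1 = c(6.6, 20.8), Grp_2 = c(20.9, 99.8),
                     row.names = c("a", "b"))

# Temporary path so that no existing file is overwritten.
out_file <- tempfile(fileext = ".xls")

# Put the row names into a first column named ID, then write without row names.
with_id <- data.frame(ID = rownames(mat_df), mat_df, check.names = FALSE)
write.table(with_id, file = out_file, sep = "\t",
            row.names = FALSE, col.names = TRUE, quote = FALSE)

# The header line now starts with ID, followed by the group names.
readLines(out_file)

# Reading it back with row.names=1 restores the original layout.
read.table(out_file, header = TRUE, row.names = 1, sep = "\t", quote = "")
```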

Then plot with the program sp_pheatmap.sh.

# -f: input matrix file
# -d: whether to compute Z-scores: <none> (no), <row> (by row), <column> (by column)
# -P: row annotation file
# -Q: column annotation file
ct@ehbio:~/$ sp_pheatmap.sh -f heatmap_data.xls -d row -P heatmap_row_anno.xls -Q heatmap_col_anno.xls

One press of Enter produces the figure below.

[Figure: 7071112-8f22e9b3961be717.png]

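The text looks small because the figure is too large; try reducing the width and height.

# -f: input matrix file
# -d: whether to compute Z-scores: <none> (no), <row> (by row), <column> (by column)
# -P: row annotation file
# -Q: column annotation file
# -u: width, in inches
# -v: height, in inches
ct@ehbio:~/$ sp_pheatmap.sh -f heatmap_data.xls -d row -P heatmap_row_anno.xls -Q heatmap_col_anno.xls -u 8 -v 12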

[Figure: 7071112-cf4cdd1e3f8eae5c.png]

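Place the x-axis labels horizontally

# -A: 0, rotate the x-axis labels by 0 degrees
# -C: custom colors; mind the quoting: the outermost quotes must differ from the inner quotes, and quote pairs must not interleave
# -T: type of the given colors; if a vector is given (as in the example below), -T must be set to vector, otherwise the result looks odd, with only two colors.
# -t: title of the figure; mind the quoting: an argument containing spaces or special characters must be quoted so that it is passed as a whole.
ct@ehbio:~/$ sp_pheatmap.sh -f heatmap_data.xls -d row -P heatmap_row_anno.xls -Q heatmap_col_anno.xls -u 8 -v 12 -A 0 -C 'c("white", "blue")' -T vector -t "Heatmap of gene expression profile"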

[Figure: 7071112-28de69147d4553d8.png]
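The "only two colors" effect is easy to reproduce in plain R. According to the script's own usage text, -T vector expands the given vector with colorRampPalette(..., bias=1)(30); the sketch below compares the raw vector with that expansion. How the script treats the vector without -T vector is an assumption here, and `raw_colors` and `ramp_colors` are illustrative names.

```r
# The two colors as given on the command line.
raw_colors <- c("white", "blue")

# What the usage text says -T vector turns them into: a 30-step gradient.
ramp_colors <- colorRampPalette(raw_colors, bias = 1)(30)

length(raw_colors)    # number of colors available if the vector is used as is
length(ramp_colors)   # number of colors after interpolation
head(ramp_colors, 3)  # first few steps of the gradient, as hex codes
```

Passing the raw two-element vector as the color argument leaves the heatmap with just two bins, whereas the interpolated palette gives a smooth white-to-blue gradient.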

sp_pheatmap.sh has more parameters and can draw all the heatmaps described earlier. The details are as follows:

***CREATED BY Chen Tong (chentong_biology@163.com)***

----Matrix file--------------
Name    T0_1    T0_2    T0_3    T4_1    T4_2
TR19267|c0_g1|CYP703A2  1.431   0.77    1.309   1.247   0.485
TR19612|c1_g3|CYP707A1  0.72    0.161   0.301   2.457   2.794
TR60337|c4_g9|CYP707A1  0.056   0.09    0.038   7.643   15.379
TR19612|c0_g1|CYP707A3  2.011   0.689   1.29    0   0
TR35761|c0_g1|CYP707A4  1.946   1.575   1.892   1.019   0.999
TR58054|c0_g2|CYP707A4  12.338  10.016  9.387   0.782   0.563
TR14082|c7_g4|CYP707A4  10.505  8.709   7.212   4.395   6.103
TR60509|c0_g1|CYP707A7  3.527   3.348   2.128   3.257   2.338
TR26914|c0_g1|CYP710A1  1.899   1.54    0.998   0.255   0.427
----Matrix file--------------

----Row annotation file --------------
------1. At least two columns--------------
------2. The first column should be the same as the first column in
         matrix (order does not matter)--------------
Name    Clan    Family
TR19267|c0_g1|CYP703A2  CYP71   CYP703
TR19612|c1_g3|CYP707A1  CYP85   CYP707
TR60337|c4_g9|CYP707A1  CYP85   CYP707
TR19612|c0_g1|CYP707A3  CYP85   CYP707
TR35761|c0_g1|CYP707A4  CYP85   CYP707
TR58054|c0_g2|CYP707A4  CYP85   CYP707
TR14082|c7_g4|CYP707A4  CYP85   CYP707
TR60509|c0_g1|CYP707A7  CYP85   CYP707
TR26914|c0_g1|CYP710A1  CYP710  CYP710
----Row annotation file --------------

----Column annotation file --------------
------1. At least two columns--------------
------2. The first column should be the same as the first row in
---------matrix (order does not matter)--------------
Name    Sample
T0_1    T0
T0_2    T0
T0_3    T0
T4_1    T4
T4_2    T4
----Column annotation file --------------

Usage:

sp_pheatmap.sh options

Function:

This script is used to do heatmap using package pheatmap.

The parameters for logical variable are either TRUE or FALSE.

OPTIONS:
    -f  Data file (with header line, the first column is the
        rowname, tab separated. Colnames must be unique unless you
        know what you are doing.)[NECESSARY]
    -t  Title of picture[Default empty title]
        ["Heatmap of gene expression profile"]
    -a  Display xtics. [Default TRUE]
    -A  Rotation angle for x-axis value (anti clockwise)
        [Default 90]
    -b  Display ytics. [Default TRUE]
    -H  Hierarchical cluster for columns.
        Default FALSE, accept TRUE
    -R  Hierarchical cluster for rows.
        Default TRUE, accept FALSE
    -c  Clustering method, Default "complete". 
        Accept "ward.D", "ward.D2","single", "average" (=UPGMA), 
        "mcquitty" (=WPGMA), "median" (=WPGMC) or "centroid" (=UPGMC)
    -C  Color vector. 
        Default pheatmap_default. 
        Accept a vector containing multiple colors such as 
        <'c("white", "blue")'> will be transferred 
        to <colorRampPalette(c("white", "blue"), bias=1)(30)>
        or an R function 
        <colorRampPalette(rev(brewer.pal(n=7, name="RdYlBu")))(100)>
        generating a list of colors.

    -T  Color type, a vector which will be transferred as described in <-C> [vector] or
        a raw vector [direct vector] or a function [function (default)].
    -B  A positive number. Default 1. Values larger than 1 will give more color
        for high end. Values between 0-1 will give more color for low end.  
    -D  Clustering distance method for rows.
        Default 'correlation', accept 'euclidean', 
        "manhattan", "maximum", "canberra", "binary", "minkowski". 
    -I  Clustering distance method for cols.
        Default 'correlation', accept 'euclidean', 
        "manhattan", "maximum", "canberra", "binary", "minkowski". 
    -L  First get log-value, then do other analysis.
        Accept an R function log2 or log10. 
        [Default FALSE]
    -d  Scale the data or not for clustering and visualization.
        [Default 'none' means no scale, accept 'row', 'column' to 
        scale by row or column.]
    -m  The maximum value you want to keep, any number larger will
        be taken as this given maximum value.
        [Default Inf, Optional] 
    -s  The smallest value you want to keep, any number smaller will
        be taken as this given minimum value.
        [Default -Inf, Optional]  
    -k  Aggregate the rows using kmeans clustering. 
        This is advisable if number of rows is so big that R cannot 
        handle their hierarchical clustering anymore, roughly more than 1000.
        Instead of showing all the rows separately one can cluster the
        rows in advance and show only the cluster centers. The number
        of clusters can be tuned here.
        [Default 'NA' which means no
        cluster, other positive integer is accepted for executing
        kmeans cluster, also the parameter represents the number of
        expected clusters.]
    -P  A file to specify row-annotation with format described above.
        [Default NA]
    -Q  A file to specify col-annotation with format described above.
        [Default NA]
    -u  The width of output picture.[Default 20]
    -v  The height of output picture.[Default 20] 
    -E  The type of output figures.[Default pdf, accept
        eps/ps, tex (pictex), png, jpeg, tiff, bmp, svg and wmf)]
    -r  The resolution of output picture.[Default 300 ppi]
    -F  Font size [Default 14]
    -p  Preprocess data matrix to avoid 'STDERR 0 in cor(t(mat))'.
        Lowercase <p>.
        [Default TRUE]
    -e  Execute script (Default) or just output the script.
        [Default TRUE]
    -i  Install the required packages. Normally should be TRUE if this is 
        your first time run s-plot.[Default FALSE]
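
Several of these options correspond to short preprocessing steps in R. The sketch below illustrates -L, -m/-s, -D and -c on four rows taken from the sample matrix above (gene names shortened to g1..g4). It is an approximation under stated assumptions, not the script's actual code: the +1 pseudocount and the clamp limits are made up for the illustration.

```r
# Four rows from the sample matrix in the usage text; names shortened.
mat <- matrix(c( 1.431,  0.770, 1.309, 1.247,  0.485,
                 0.056,  0.090, 0.038, 7.643, 15.379,
                12.338, 10.016, 9.387, 0.782,  0.563,
                 2.011,  0.689, 1.290, 0.000,  0.000),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3", "g4"),
                              c("T0_1", "T0_2", "T0_3", "T4_1", "T4_2")))

# -L log2: log-transform first. The +1 pseudocount is an assumption of
# this sketch, added so that zeros stay finite.
log_mat <- log2(mat + 1)

# -m / -s: clamp values to an upper and a lower limit (illustrative limits).
upper <- 3
lower <- 0.5
clipped <- pmin(pmax(log_mat, lower), upper)

# -D correlation: distance between rows as 1 - Pearson correlation.
# A row with zero variance would yield NA here, which is presumably
# what the -p preprocessing option guards against.
row_dist <- as.dist(1 - cor(t(clipped)))

# -c complete: hierarchical clustering with complete linkage.
row_tree <- hclust(row_dist, method = "complete")
rownames(clipped)[row_tree$order]
```

Clamping before clustering keeps a few extreme values from stretching the color scale, at the cost of hiding differences beyond the limits.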

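sp_pheatmap.sh is one function of s-plot, a plotting tool I wrote. s-plot can draw a number of other plot types as well, listed below; later tutorials will cover them one by one.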

Usage:

s-plot options

Function:

This software is designed to simplify the process of plotting and help
researchers focus more on data rather than technology.

Currently, the following types of plot are supported.

#### Bars
s-plot barPlot
s-plot horizontalBar
s-plot multiBar
s-plot colorBar

#### Lines
s-plot lines

#### Dots
s-plot pca
s-plot scatterplot
s-plot scatterplot3d
s-plot scatterplot2
s-plot scatterplotColor
s-plot scatterplotContour
s-plot scatterplotLotsData
s-plot scatterplotMatrix
s-plot scatterplotDoubleVariable
s-plot contourPlot
s-plot density2d

#### Distribution
s-plot areaplot
s-plot boxplot
s-plot densityPlot
s-plot densityHistPlot
s-plot histogram

#### Cluster
s-plot hcluster_gg (latest)
s-plot hcluster
s-plot hclust (deprecated)

#### Heatmap
s-plot heatmapS
s-plot heatmapM
s-plot heatmap.2
s-plot pheatmap
s-plot pretteyHeatmap # obsolete
s-plot prettyHeatmap

#### Others
s-plot volcano
s-plot vennDiagram
s-plot upsetView

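To help spread the word and to build some enthusiasm, anyone who wants the sp_pheatmap.sh script is asked to take a moment to share this article on WeChat Moments and leave a comment requesting it.

生信寶典 (Bioinformatics Handbook): learning bioinformatics together from a different angle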

<footer class="entry-meta" style="box-sizing: border-box; display: block; font-size: 0.75rem; text-transform: uppercase; color: rgba(187, 187, 187, 0.8); margin: 50px 30px 30px; text-align: center; font-family: Lato, Calibri, Arial, sans-serif; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">RBIOINFOCHENTONG
Copyright notice: this is an original article by the blogger. Please credit the source when reposting.


</footer>
