R statistical plotting - simplified heatmaps
You are welcome to follow the blog: http://blog.genesino.com/2017/06/heatmap-simple/
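Drawing heatmaps - pheatmap
Besides ggplot2, heatmaps can be drawn with other packages or functions, such as pheatmap::pheatmap (the pheatmap function in the pheatmap package) and gplots::heatmap.2.
Compared with building a heatmap in ggplot2, pheatmap is simpler: a single function, given different arguments, handles row and column clustering, row and column annotation, Z-score calculation, custom colors, and more. Let's see how the result looks.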
data_ori <- "Grp_1;Grp_2;Grp_3;Grp_4;Grp_5
a;6.6;20.9;100.1;600.0;5.2
b;20.8;99.8;700.0;3.7;19.2
c;100.0;800.0;6.2;21.4;98.6
d;900;3.3;20.3;101.1;10000"
data <- read.table(text=data_ori, header=T, row.names=1, sep=";", quote="")
Grp_1 Grp_2 Grp_3 Grp_4 Grp_5
a 6.6 20.9 100.1 600.0 5.2
b 20.8 99.8 700.0 3.7 19.2
c 100.0 800.0 6.2 21.4 98.6
d 900.0 3.3 20.3 101.1 10000.0
pheatmap::pheatmap(data, filename="pheatmap_1.pdf")
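It is a bit ugly, but it only took one step.
All the data preprocessing approaches mentioned in the heatmap beautification post can also be used when plotting with pheatmap. In addition, computing Z-scores in pheatmap takes only a single argument.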
pheatmap::pheatmap(data, scale="row", filename="pheatmap_1.pdf")
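To make clear what scale="row" stands for, the following minimal sketch reproduces a row-wise Z-score in base R on the same example table. It is only an illustration of the calculation (mean-centering each row and dividing by its sample standard deviation), not pheatmap's own code, and the name data_z is introduced here for the example.

```r
# Rebuild the example table so this block runs on its own.
data_ori <- "Grp_1;Grp_2;Grp_3;Grp_4;Grp_5
a;6.6;20.9;100.1;600.0;5.2
b;20.8;99.8;700.0;3.7;19.2
c;100.0;800.0;6.2;21.4;98.6
d;900;3.3;20.3;101.1;10000"
data <- read.table(text=data_ori, header=TRUE, row.names=1, sep=";", quote="")

# Row-wise Z-score: subtract each row's mean, divide by its standard deviation.
# scale() works column-wise, so transpose before and after.
data_z <- t(scale(t(as.matrix(data))))

# After scaling, every row has mean 0 and standard deviation 1
# (up to floating-point error), so rows with very different ranges
# become comparable on one color scale.
round(rowMeans(data_z), 10)
round(apply(data_z, 1, sd), 10)
```

This is why the outlier value 10000 in row d no longer dominates the color scale once scaling is switched on.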
Sometimes row or column clustering is not needed, and showing the data in its original order is enough.
pheatmap::pheatmap(data, scale="row", cluster_rows=FALSE, cluster_cols=FALSE, filename="pheatmap_1.pdf")
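Next, annotate the rows and columns of the matrix (data) with different groups. Suppose there are two files. The first is the row annotation: its first column has the same content as the first column of the matrix (the order does not matter), and the remaining columns hold different labels for those entries. In the example below (assuming rows are genes and columns are samples), columns 2 and 3 give the gene type (TF or enzyme) and the group. The second file is the column annotation: its first column has the same content as the first row of the matrix, and the remaining columns annotate the samples.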
row_anno = data.frame(type=c("TF","Enzyme","Enzyme","TF"), class=c("clu1","clu1","clu2","clu2"), row.names=rownames(data))
row_anno
type class
a TF clu1
b Enzyme clu1
c Enzyme clu2
d TF clu2
col_anno = data.frame(grp=c("A","A","A","B","B"), size=1:5, row.names=colnames(data))
col_anno
grp size
Grp_1 A 1
Grp_2 A 2
Grp_3 A 3
Grp_4 B 4
Grp_5 B 5
pheatmap::pheatmap(data, scale="row",
cluster_rows=FALSE,
annotation_col=col_anno,
annotation_row=row_anno,
filename="pheatmap_1.pdf")
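Because annotations are matched to the matrix by row names rather than by position, it can be worth checking the names before plotting. The sketch below is a small, assumed sanity check (the shuffled row_anno and the name row_anno_sorted are made up for this example); it uses a reduced two-column version of the data so it runs on its own.

```r
# A reduced copy of the example matrix (two sample columns only).
data <- data.frame(Grp_1=c(6.6, 20.8, 100, 900),
                   Grp_2=c(20.9, 99.8, 800, 3.3),
                   row.names=c("a", "b", "c", "d"))

# Row annotation deliberately listed in a different order than the matrix rows.
row_anno <- data.frame(type=c("TF", "TF", "Enzyme", "Enzyme"),
                       class=c("clu2", "clu1", "clu1", "clu2"),
                       row.names=c("d", "a", "b", "c"))

# The order may differ, but the set of names has to agree.
stopifnot(setequal(rownames(row_anno), rownames(data)))

# Optional: reorder the annotation to follow the matrix, which makes
# the two tables easier to compare by eye.
row_anno_sorted <- row_anno[rownames(data), , drop=FALSE]
row_anno_sorted
```

A mismatch in names (for example a stray space or a different ID version) is a common reason for annotation bars coming out empty, so the check is cheap insurance.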
Now let's customize the colors.
# <bias> values larger than 1 will give more color for high end.
# Values between 0-1 will give more color for low end.
pheatmap::pheatmap(data, scale="row",
cluster_rows=FALSE,
annotation_col=col_anno,
annotation_row=row_anno,
color=colorRampPalette(c('green','yellow','red'), bias=1)(50),
filename="pheatmap_1.pdf")
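The effect of bias is easier to see by inspecting the palette itself. The sketch below builds the same three-color ramp with bias=1 and, as an arbitrary illustrative value, bias=3, then counts how many of the 50 colors fall in the green-to-yellow half of the ramp. The variable names are made up for this example.

```r
# Same color ramp as above, once evenly spaced and once biased.
pal_even <- colorRampPalette(c("green", "yellow", "red"), bias=1)(50)
pal_high <- colorRampPalette(c("green", "yellow", "red"), bias=3)(50)

# Colors between green and yellow have a red channel below 255;
# colors between yellow and red have the red channel at 255.
n_low_even <- sum(col2rgb(pal_even)["red", ] < 255)
n_low_high <- sum(col2rgb(pal_high)["red", ] < 255)

# With bias > 1 fewer colors are spent on the green-to-yellow part,
# leaving more distinct colors for the yellow-to-red (high) end.
c(bias_1=n_low_even, bias_3=n_low_high)

# The end points of the ramp are unchanged by bias.
pal_even[c(1, 50)]
pal_high[c(1, 50)]
```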
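The use of heatmap.2 is not covered here; it is fairly similar to pheatmap, and plenty of tutorials already exist.
Drawing heatmaps without editing the script
Two headaches come up regularly when plotting:
Many figures need to be drawn where the only difference is the output file and nothing else needs to change. With an R script, the file name has to be replaced over and over, which is tedious and error-prone.
Every figure needs repeated parameter tuning. After a long break you forget where each parameter lives, or after too many rounds of adjustment there are many versions and you no longer know which one to use.
To simplify plotting and keep the script consistent, I wrapped R with bash, so that different figures can be drawn simply by changing command-line arguments.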
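The idea of keeping the plotting code fixed and varying only the arguments can also be sketched in R itself. The block below is a hypothetical illustration, not how sp_pheatmap.sh is implemented (that tool is a bash wrapper whose source is not shown here): parse_flags is a made-up helper that turns "-flag value" pairs into a named list, with defaults borrowed from the option list of the tool (-d none, -u 20, -v 20).

```r
# Hypothetical helper: convert c("-f", "x.xls", "-d", "row") into a named
# list, starting from a list of default values.
parse_flags <- function(args, defaults) {
  opts <- defaults
  i <- 1
  while (i < length(args)) {
    key <- sub("^-", "", args[i])   # strip the leading dash
    opts[[key]] <- args[i + 1]      # the next element is the value
    i <- i + 2
  }
  opts
}

defaults <- list(f=NA, d="none", u="20", v="20")

# In a real script run with Rscript, the arguments would come from
# commandArgs(trailingOnly=TRUE); here they are written out by hand.
opts <- parse_flags(c("-f", "heatmap_data.xls", "-d", "row"), defaults)
str(opts)

# The plotting call then stays the same for every figure, for example:
# pheatmap::pheatmap(read.table(opts$f, header=TRUE, row.names=1, sep="\t"),
#                    scale=opts$d)
```

Only the command line changes between figures, so there is a single version of the plotting code to maintain.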
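First, a look at how to use it
Start by saving the test data to files so that they are easy to pass in. The data matrix is stored in heatmap_data.xls, the row annotation in heatmap_row_anno.xls, and the column annotation in heatmap_col_anno.xls. Despite the .xls extension, these are plain tab-separated text files.
# Tab-separated, no quotes around the fields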
write.table(data, file="heatmap_data.xls", sep="\t", row.names=T, col.names=T,quote=F)
# If the header line missing an ID column bothers you, it can be filled in
system("sed -i '1 s/^/ID\t/' heatmap_data.xls")
write.table(row_anno, file="heatmap_row_anno.xls", sep="\t", row.names=T, col.names=T,quote=F)
write.table(col_anno, file="heatmap_col_anno.xls", sep="\t", row.names=T, col.names=T,quote=F)
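The sed call above relies on an external tool and on the in-place option behaving as it does in GNU sed. As an alternative sketch that stays inside R, the row names can be written as an explicit ID column. The block below uses a reduced two-column copy of the data and a temporary file (both assumptions for the example) and reads the file back to confirm that nothing was lost.

```r
# A reduced copy of the example matrix (two sample columns only).
data <- data.frame(Grp_1=c(6.6, 20.8, 100, 900),
                   Grp_2=c(20.9, 99.8, 800, 3.3),
                   row.names=c("a", "b", "c", "d"))

out_file <- tempfile(fileext=".xls")

# Put the row names into a real ID column, then write without row names,
# so the header line has as many fields as every data line.
write.table(data.frame(ID=rownames(data), data, check.names=FALSE),
            file=out_file, sep="\t", row.names=FALSE, col.names=TRUE,
            quote=FALSE)

# The header line now starts with ID.
readLines(out_file, n=1)

# Read the file back, using the first column as row names again.
back <- read.table(out_file, header=TRUE, row.names=1, sep="\t",
                   quote="", check.names=FALSE)
all.equal(back, data)
```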
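Then draw the figure with the program sp_pheatmap.sh.
# -f: the input matrix file
# -d: whether to compute Z-scores: <none> (no), <row> (by row), <column> (by column)
# -P: row annotation file
# -Q: column annotation file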
ct@ehbio:~/$ sp_pheatmap.sh -f heatmap_data.xls -d row -P heatmap_row_anno.xls -Q heatmap_col_anno.xls
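One press of Enter produces the figure below.
The text is a bit small because the figure is too large; try reducing the width and height.
# -f: the input matrix file
# -d: whether to compute Z-scores: <none> (no), <row> (by row), <column> (by column)
# -P: row annotation file
# -Q: column annotation file
# -u: width, in inches
# -v: height, in inches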
ct@ehbio:~/$ sp_pheatmap.sh -f heatmap_data.xls -d row -P heatmap_row_anno.xls -Q heatmap_col_anno.xls -u 8 -v 12
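Place the x-axis labels horizontally
# -A: 0, rotate the x-axis labels by 0 degrees
# -C: custom colors; mind the quoting: the outer quotes must differ from the inner quotes, and the quote pairs must not interleave
# -T: the type of the given colors; if a vector is given (as in the example below), -T must be set to vector, otherwise the result looks odd, with only two colors.
# -t: the figure title; mind the quoting: any argument containing spaces or special characters must be quoted so it is passed as a single unit.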
ct@ehbio:~/$ sp_pheatmap.sh -f heatmap_data.xls -d row -P heatmap_row_anno.xls -Q heatmap_col_anno.xls -u 8 -v 12 -A 0 -C 'c("white", "blue")' -T vector -t "Heatmap of gene expression profile"
sp_pheatmap.sh has further parameters and can draw all of the heatmaps described earlier. The details are as follows:
***CREATED BY Chen Tong (chentong_biology@163.com)***
----Matrix file--------------
Name T0_1 T0_2 T0_3 T4_1 T4_2
TR19267|c0_g1|CYP703A2 1.431 0.77 1.309 1.247 0.485
TR19612|c1_g3|CYP707A1 0.72 0.161 0.301 2.457 2.794
TR60337|c4_g9|CYP707A1 0.056 0.09 0.038 7.643 15.379
TR19612|c0_g1|CYP707A3 2.011 0.689 1.29 0 0
TR35761|c0_g1|CYP707A4 1.946 1.575 1.892 1.019 0.999
TR58054|c0_g2|CYP707A4 12.338 10.016 9.387 0.782 0.563
TR14082|c7_g4|CYP707A4 10.505 8.709 7.212 4.395 6.103
TR60509|c0_g1|CYP707A7 3.527 3.348 2.128 3.257 2.338
TR26914|c0_g1|CYP710A1 1.899 1.54 0.998 0.255 0.427
----Matrix file--------------
----Row annotation file --------------
------1. At least two columns--------------
------2. The first column should be the same as the first column in
matrix (order does not matter)--------------
Name Clan Family
TR19267|c0_g1|CYP703A2 CYP71 CYP703
TR19612|c1_g3|CYP707A1 CYP85 CYP707
TR60337|c4_g9|CYP707A1 CYP85 CYP707
TR19612|c0_g1|CYP707A3 CYP85 CYP707
TR35761|c0_g1|CYP707A4 CYP85 CYP707
TR58054|c0_g2|CYP707A4 CYP85 CYP707
TR14082|c7_g4|CYP707A4 CYP85 CYP707
TR60509|c0_g1|CYP707A7 CYP85 CYP707
TR26914|c0_g1|CYP710A1 CYP710 CYP710
----Row annotation file --------------
----Column annotation file --------------
------1. At least two columns--------------
------2. The first column should be the same as the first row in
---------matrix (order does not matter)--------------
Name Sample
T0_1 T0
T0_2 T0
T0_3 T0
T4_1 T4
T4_2 T4
----Column annotation file --------------
Usage:
sp_pheatmap.sh options
Function:
This script is used to do heatmap using package pheatmap.
The parameters for logical variable are either TRUE or FALSE.
OPTIONS:
-f Data file (with header line, the first column is the
rowname, tab separated. Colnames must be unique unless you
know what you are doing.)[NECESSARY]
-t Title of picture[Default empty title]
["Heatmap of gene expression profile"]
-a Display xtics. [Default TRUE]
-A Rotation angle for x-axis value (anti clockwise)
[Default 90]
-b Display ytics. [Default TRUE]
-H Hierarchical cluster for columns.
Default FALSE, accept TRUE
-R Hierarchical cluster for rows.
Default TRUE, accept FALSE
-c Clustering method, Default "complete".
Accept "ward.D", "ward.D2","single", "average" (=UPGMA),
"mcquitty" (=WPGMA), "median" (=WPGMC) or "centroid" (=UPGMC)
-C Color vector.
Default pheatmap_default.
Accept a vector containing multiple colors such as
<'c("white", "blue")'> will be transferred
to <colorRampPalette(c("white", "blue"), bias=1)(30)>
or an R function
<colorRampPalette(rev(brewer.pal(n=7, name="RdYlBu")))(100)>
generating a list of colors.
-T Color type, a vector which will be transferred as described in <-C> [vector] or
a raw vector [direct vector] or a function [function (default)].
-B A positive number. Default 1. Values larger than 1 will give more color
for high end. Values between 0-1 will give more color for low end.
-D Clustering distance method for rows.
Default 'correlation', accept 'euclidean',
"manhattan", "maximum", "canberra", "binary", "minkowski".
-I Clustering distance method for cols.
Default 'correlation', accept 'euclidean',
"manhattan", "maximum", "canberra", "binary", "minkowski".
-L First get log-value, then do other analysis.
Accept an R function log2 or log10.
[Default FALSE]
-d Scale the data or not for clustering and visualization.
[Default 'none' means no scale, accept 'row', 'column' to
scale by row or column.]
-m The maximum value you want to keep, any number larger will
be taken as this given maximum value.
[Default Inf, Optional]
-s The smallest value you want to keep, any number smaller will
be taken as this given minimum value.
[Default -Inf, Optional]
-k Aggregate the rows using kmeans clustering.
This is advisable if number of rows is so big that R cannot
handle their hierarchical clustering anymore, roughly more than 1000.
Instead of showing all the rows separately one can cluster the
rows in advance and show only the cluster centers. The number
of clusters can be tuned here.
[Default 'NA' which means no
cluster, other positive integer is accepted for executing
kmeans cluster, also the parameter represents the number of
expected clusters.]
-P A file to specify row-annotation with format described above.
[Default NA]
-Q A file to specify col-annotation with format described above.
[Default NA]
-u The width of output picture.[Default 20]
-v The height of output picture.[Default 20]
-E The type of output figures.[Default pdf, accept
eps/ps, tex (pictex), png, jpeg, tiff, bmp, svg and wmf)]
-r The resolution of output picture.[Default 300 ppi]
-F Font size [Default 14]
-p Preprocess data matrix to avoid 'STDERR 0 in cor(t(mat))'.
Lowercase <p>.
[Default TRUE]
-e Execute script (Default) or just output the script.
[Default TRUE]
-i Install the required packages. Normally should be TRUE if this is
your first time run s-plot.[Default FALSE]
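Several of the preprocessing options listed above (-m, -s, -L and -p) correspond to simple matrix operations. The sketch below shows one plausible reading of them in plain R. It is an assumption for illustration only: the source of sp_pheatmap.sh is not shown, so the order of the steps and the exact behavior of -p may differ, and clip, mat and the other names are made up here.

```r
# A tiny made-up matrix: one variable row and one constant row.
mat <- matrix(c(0, 8, 1024,
                4, 4, 4),
              nrow=2, byrow=TRUE,
              dimnames=list(c("g1", "g2"), c("s1", "s2", "s3")))

# In the spirit of -s / -m: values below lo become lo, values above hi become hi.
clip <- function(x, lo=-Inf, hi=Inf) pmin(pmax(x, lo), hi)
clipped <- clip(mat, lo=1, hi=100)

# In the spirit of -L log2: log-transform before any further analysis.
logged <- log2(clipped)

# A likely purpose of -p: rows with zero standard deviation make the
# correlation distance undefined, so they are removed before clustering.
keep <- apply(logged, 1, sd) > 0
filtered <- logged[keep, , drop=FALSE]
filtered
```

Only row g1 survives in this toy case, because row g2 is constant and has no defined correlation with any other row.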
sp_pheatmap.sh is one component of s-plot, a plotting tool I wrote. s-plot can draw several other types of figure, listed below; later tutorials will cover them one by one.
Usage:
s-plot options
Function:
This software is designed to simplify the process of plotting and help
researchers focus more on data rather than technology.
Currently, the following types of plot are supported.
#### Bars
s-plot barPlot
s-plot horizontalBar
s-plot multiBar
s-plot colorBar
#### Lines
s-plot lines
#### Dots
s-plot pca
s-plot scatterplot
s-plot scatterplot3d
s-plot scatterplot2
s-plot scatterplotColor
s-plot scatterplotContour
s-plot scatterplotLotsData
s-plot scatterplotMatrix
s-plot scatterplotDoubleVariable
s-plot contourPlot
s-plot density2d
#### Distribution
s-plot areaplot
s-plot boxplot
s-plot densityPlot
s-plot densityHistPlot
s-plot histogram
#### Cluster
s-plot hcluster_gg (latest)
s-plot hcluster
s-plot hclust (deprecated)
#### Heatmap
s-plot heatmapS
s-plot heatmapM
s-plot heatmap.2
s-plot pheatmap
s-plot pretteyHeatmap # obsolete
s-plot prettyHeatmap
#### Others
s-plot volcano
s-plot vennDiagram
s-plot upsetView
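To help spread the word, and to spark everyone's enthusiasm, anyone who would like the sp_pheatmap.sh script is asked to take a moment to share this article on WeChat Moments and leave a comment requesting it.
生信寶典: let's learn bioinformatics from a different angle together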
Copyright notice: this is an original article by the blog author; please credit the source when reposting.