Matplotlib Histogram Plotting Tips

Posted by orion on 2022-02-03

Setting the Scene

In machine learning projects we often need to analyze how the samples of a dataset are distributed, and that calls for a histogram.

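In Python, a histogram is easy to draw with the hist function from matplotlib.pyplot. The function takes quite a few parameters, though, and a few plotting details deserve attention.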

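First, assume a federated learning scenario. We have an image dataset of 15 samples with 4 labels: cat, dog, car, and ship. The dataset has already been split unevenly across 3 task nodes (clients), as follows: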

N_CLIENTS = 3
num_cls, classes = 4, ['cat', 'dog', 'car', 'ship']
# Label of each sample in the dataset (an index into classes)
train_labels = [0, 3, 2, 0, 3, 2, 1, 0, 3, 3, 1, 0, 3, 2, 2]

# Which samples of the dataset each client holds
client_idcs = [slice(0, 4), slice(4, 11), slice(11, 15)]
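Before plotting, it can help to tabulate the numbers the histogram is supposed to show. The following is a minimal sketch that is not part of the original post: it counts the labels held by each client with np.bincount. The name counts_per_client is introduced here purely for illustration.

```python
import numpy as np

N_CLIENTS = 3
num_cls, classes = 4, ['cat', 'dog', 'car', 'ship']
train_labels = [0, 3, 2, 0, 3, 2, 1, 0, 3, 3, 1, 0, 3, 2, 2]
client_idcs = [slice(0, 4), slice(4, 11), slice(11, 15)]

# One row per client, one column per class.
# minlength keeps a column for classes a client does not hold at all.
counts_per_client = np.array([
    np.bincount(train_labels[idc], minlength=num_cls)
    for idc in client_idcs
])

for i, row in enumerate(counts_per_client):
    print("Client {}:".format(i), dict(zip(classes, row.tolist())))
```

These per-client counts are the bar heights a correct histogram should display, so they make a handy reference when checking the plots.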

We want to visualize how the samples are distributed across the clients. A first attempt might look like this:

import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(5, 3))
# One list of labels per client; hist draws one bar per client inside each bin
plt.hist([train_labels[idc] for idc in client_idcs], stacked=False,
         bins=num_cls,
         label=["Client {}".format(i) for i in range(N_CLIENTS)])

plt.xticks(np.arange(num_cls), classes)
plt.legend()
plt.show()

The resulting plot looks like this:

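Here we notice that the labels on the x-axis are not aligned with the bars above them (the 3 bars that belong to one image class together form 1 bin). Fixing this requires adjusting the bins parameter.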

The bins parameter

Before discussing the bins parameter, let us get familiar with what bin and bar mean in a hist plot. The diagram below illustrates them:

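Here \(x_1\) and \(x_2\) are objects on the x-axis. In this example, where the class labels are the integers 0 to 3, the first object sits at tick 0, the second at tick 1, and so on. In the diagram, a bin (literally a container or bucket) is the rectangular plotting region occupied by one x-axis object, and a bar is a strip drawn inside that region. As the diagram shows, the bin of the first object covers [-0.5, 0.5) and the bin of the second object covers [0.5, 1.5). Note that hist bins are closed on the left and open on the right, except for the last bin, which is closed on both sides. Each bin contains 3 bars.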

The matplotlib documentation [1] describes the bins parameter as follows:

bins: int or sequence or str, default: rcParams["hist.bins"] (default: 10)

If bins is an integer, it defines the number of equal-width bins in the range.

If bins is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin; in this case, bins may be unequally spaced. All but the last (righthand-most) bin is half-open. In other words, if bins is:

[1, 2, 3, 4]

then the first bin is [1, 2) (including 1, but excluding 2) and the second [2, 3). The last bin, however, is [3, 4], which includes 4.

If bins is a string, it is one of the binning strategies supported by numpy.histogram_bin_edges: 'auto', 'fd', 'doane', 'scott', 'stone', 'rice', 'sturges', or 'sqrt'.
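The half-open rule quoted above is easy to check numerically. The sketch below is not from the original post. It uses np.histogram, which follows the same edge convention, on the documentation's own example edges [1, 2, 3, 4].

```python
import numpy as np

data = [1, 2, 3, 4]
counts, edges = np.histogram(data, bins=[1, 2, 3, 4])

# 1 falls in [1, 2), 2 falls in [2, 3), and both 3 and 4 fall in the
# last bin [3, 4], because only the last bin includes its right edge.
print(counts.tolist())  # [1, 1, 2]
```

The value 4 is counted only because the last bin is closed on the right. Sitting on the right edge of any other bin, it would have been pushed into the next one.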

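To summarize: if bins is a number, it sets the number of bins, that is, how many separate plotting regions the x-axis is divided into. We have four image classes, so we need 4 regions. Their edges are not tied to the x-axis ticks, however: they are spaced evenly between the smallest and the largest data value, which is why the bars in the first plot do not line up with the tick labels.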
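To see where the default edges end up for our labels, we can ask NumPy directly. This is a small sketch, not from the original post, using np.histogram_bin_edges (the function named in the documentation excerpt above). The variable names edges and centers are mine.

```python
import numpy as np

train_labels = [0, 3, 2, 0, 3, 2, 1, 0, 3, 3, 1, 0, 3, 2, 2]

# With an integer bins, the edges split [min, max] = [0, 3] evenly.
edges = np.histogram_bin_edges(train_labels, bins=4)
centers = (edges[:-1] + edges[1:]) / 2

print(edges.tolist())    # [0.0, 0.75, 1.5, 2.25, 3.0]
print(centers.tolist())  # [0.375, 1.125, 1.875, 2.625]
```

Each label still lands in its own bin, so the bar heights are correct, but the bin centers are at 0.375, 1.125, 1.875, and 2.625 rather than at the ticks 0, 1, 2, and 3.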

If we want to control where each region sits, we need to set bins to a sequence instead.

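The values in the bins sequence are x-coordinates in the hist plot. In this task the 4 classes sit at the x-axis ticks [0, 1, 2, 3]. Setting the sequence to [0, 1, 2, 3, 4] means the first region covers [0, 1), the second [1, 2), the third [2, 3), and the last [3, 4], which is closed on both sides.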

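For a cleaner look, we want the center of each region to line up with its x-axis tick. The first region should then cover [-0.5, 0.5), the second [0.5, 1.5), and so on, which gives the bins sequence [-0.5, 0.5, 1.5, 2.5, 3.5]. We therefore change the hist call to: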

plt.hist([train_labels[idc] for idc in client_idcs], stacked=False,
         bins=np.arange(-0.5, 4, 1),
         label=["Client {}".format(i) for i in range(N_CLIENTS)])

With this change, each region is centered on its x-axis tick.
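The same idea can be wrapped in a small helper for any number of integer class labels. This is only a sketch under the assumption that labels are the integers 0 to n_classes - 1. The function name centered_bin_edges is hypothetical and not part of matplotlib or NumPy.

```python
import numpy as np

def centered_bin_edges(n_classes):
    """Return n_classes + 1 edges so that integer label k is the center of bin k."""
    return np.arange(n_classes + 1) - 0.5

edges = centered_bin_edges(4)
centers = (edges[:-1] + edges[1:]) / 2

print(edges.tolist())    # [-0.5, 0.5, 1.5, 2.5, 3.5]
print(centers.tolist())  # [0.0, 1.0, 2.0, 3.0]
```

For 4 classes this produces the same edges as np.arange(-0.5, 4, 1). Building the range from integers and shifting it afterwards is a design choice that avoids having to work out the right stop value by hand.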

The stacked parameter

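When there are many items on the x-axis, giving each of them 3 bars takes up a lot of plotting space. How can we use the space more economically? This is where the stacked parameter comes in. We set stacked to True: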

plt.hist([train_labels[idc] for idc in client_idcs], stacked=True,
         bins=np.arange(-0.5, 4, 1),
         label=["Client {}".format(i) for i in range(N_CLIENTS)])

The bars of each x-axis object are now stacked on top of one another.
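To make the stacking concrete, the sketch below (not from the original post) reproduces the arithmetic with NumPy alone: each client's segment starts where the previous client's segment ends, so the top of the last segment equals the total number of samples in that class. The names counts, tops, and bottoms are introduced here for illustration.

```python
import numpy as np

train_labels = [0, 3, 2, 0, 3, 2, 1, 0, 3, 3, 1, 0, 3, 2, 2]
client_idcs = [slice(0, 4), slice(4, 11), slice(11, 15)]
edges = np.arange(-0.5, 4, 1)

# Per-client counts in each bin (rows: clients, columns: classes).
counts = np.array([np.histogram(train_labels[idc], bins=edges)[0]
                   for idc in client_idcs])

tops = counts.cumsum(axis=0)  # upper end of each stacked segment
bottoms = tops - counts       # lower end of each stacked segment

print(tops.tolist())     # [[2, 0, 1, 1], [3, 2, 2, 4], [4, 2, 4, 5]]
print(bottoms.tolist())  # [[0, 0, 0, 0], [2, 0, 1, 1], [3, 2, 2, 4]]
```

The last row of tops, [4, 2, 4, 5], is the overall class distribution of the 15 samples, which is the outline the stacked plot traces.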

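A new problem appears, though: there is no longer any gap between the bars of neighboring x-axis objects, so the plot looks crowded. Can we change the bins parameter to put some space between the bins? No: as mentioned earlier, the bins parameter can only describe contiguous regions.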

Another approach is to leave space between the bar inside each bin and the edges of that bin. For that we use the rwidth parameter.

The rwidth parameter

The documentation describes the rwidth parameter as follows:

rwidth: float or None, default: None

The relative width of the bars as a fraction of the bin width. If None, automatically compute the width.

Ignored if histtype is 'step' or 'stepfilled'.

In other words, rwidth sets the width of the bars in each bin relative to the width of the bin. Let us set it to 0.5:

plt.hist([train_labels[idc] for idc in client_idcs], stacked=True,
         bins=np.arange(-0.5, 4, 1), rwidth=0.5,
         label=["Client {}".format(i) for i in range(N_CLIENTS)])

The plot after this change:

Each stacked bar now takes up exactly half the width of its bin.
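For reference, here is one way to assemble the final version into a single script and confirm the bar width programmatically. It is a sketch rather than the original author's code: it assumes the non-interactive Agg backend so that it runs without a display, it uses the Axes-level API instead of plt.hist, and the name bar_widths is mine.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np

N_CLIENTS = 3
num_cls, classes = 4, ['cat', 'dog', 'car', 'ship']
train_labels = [0, 3, 2, 0, 3, 2, 1, 0, 3, 3, 1, 0, 3, 2, 2]
client_idcs = [slice(0, 4), slice(4, 11), slice(11, 15)]

fig, ax = plt.subplots(figsize=(5, 3))
_, edges, containers = ax.hist(
    [train_labels[idc] for idc in client_idcs], stacked=True,
    bins=np.arange(-0.5, 4, 1), rwidth=0.5,
    label=["Client {}".format(i) for i in range(N_CLIENTS)])
ax.set_xticks(np.arange(num_cls))
ax.set_xticklabels(classes)
ax.legend()

# Collect the width of every rectangle that hist drew.
bar_widths = {round(rect.get_width(), 6)
              for container in containers for rect in container}
print(bar_widths)  # {0.5}
```

Every bin here is 1 unit wide, so a relative width of 0.5 yields bars that are 0.5 units wide. To view the figure, call plt.show() with an interactive backend or save it with fig.savefig.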

References

[1] https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html
