Machine Learning in Action, 2.2.2 Analyzing the data: creating scatter plots with Matplotlib

Posted by 王明輝 on 2018-01-27

 

import matplotlib.pyplot as plt
from kNN import file2matrix  # data loader from the book's kNN.py (assumed importable)

# Draw a scatter plot
def f():
    datingDataMat,datingLabels = file2matrix("datingTestSet3.txt")

    fig = plt.figure()
    # ax = fig.add_subplot(199,projection='polar')
    # ax = fig.add_subplot(111,projection='hammer')
    # ax = fig.add_subplot(111,projection='lambert')
    # ax = fig.add_subplot(111,projection='mollweide')
    # ax = fig.add_subplot(111,projection='aitoff')
    # ax = fig.add_subplot(111,projection='rectilinear')

    # Split the canvas into 3 rows and 4 columns and draw in the 2nd cell,
    # counting left to right, then top to bottom
    ax = fig.add_subplot(3,4,2) # fig.add_subplot(342) also works, but that form cannot express two-digit numbers
    ax.scatter(datingDataMat[:,1],datingDataMat[:,2])
    # ax1 = fig.add_subplot(221)
    # ax1.plot(datingDataMat[:,1],datingDataMat[:,2])
    plt.show()
Here is what fig.add_subplot(3,4,2) produces (the red box is mine; the original output has none):

So fig.add_subplot(3,4,12) produces:
 

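So the third argument cannot exceed the product of the first two: for fig.add_subplot(a,b,c), a*b >= c must hold (and c must be at least 1), otherwise an error is raised.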
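The row-major numbering rule can be checked with a few lines of arithmetic. The helper below, subplot_cell, is hypothetical (it is not part of matplotlib); it is a minimal sketch, assuming cells are numbered from 1, left to right and then top to bottom, as the pyplot.subplot documentation quoted later describes.

```python
def subplot_cell(nrows, ncols, index):
    """Return the 1-based (row, column) of the cell that
    add_subplot(nrows, ncols, index) would use (hypothetical helper)."""
    if not 1 <= index <= nrows * ncols:
        raise ValueError(
            f"index must be between 1 and {nrows * ncols}, got {index}")
    # Row-major order: every row holds ncols cells
    row, col = divmod(index - 1, ncols)
    return row + 1, col + 1


print(subplot_cell(3, 4, 2))   # first row, second column
print(subplot_cell(3, 4, 12))  # bottom-right cell
```

Asking for subplot_cell(3, 4, 13) raises ValueError, mirroring the a*b >= c limit.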

For fig.add_subplot(3,4,12), the explanation on the official website seems a little off. Link: https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html?highlight=add_subplot#matplotlib.figure.Figure.add_subplot

Searching for add_subplot(*args, **kwargs) gives this explanation:

*args

Either a 3-digit integer or three separate integers describing the position of the subplot. If the three integers are I, J, and K, the subplot is the Ith plot on a grid with J rows and K columns.

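That is, if the three integers are I, J and K, the subplot is the Ith plot on a grid of J rows and K columns. Read literally, this puts the index first, which contradicts how the function is actually called: in add_subplot(3,4,2) the rows and columns come first and the index comes last.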

The See also entry below it, matplotlib.pyplot.subplot, does give the correct explanation:

matplotlib.pyplot.subplot

subplot(nrows, ncols, index, **kwargs)
In the current figure, create and return an Axes, at position index of a (virtual) grid of nrows by ncols axes. Indexes go from 1 to nrows * ncols, incrementing in row-major order.

If nrows, ncols and index are all less than 10, they can also be given as a single, concatenated, three-digit number.

For example, subplot(2, 3, 3) and subplot(233) both create an Axes at the top right corner of the current figure, occupying half of the figure height and a third of the figure width.
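The three-digit shorthand is just the three arguments packed into one decimal number, one digit each, which is why (3, 4, 12) has no shorthand form. The function split_three_digit below is a hypothetical illustration of that packing, not matplotlib's own parsing code.

```python
def split_three_digit(code):
    """Split a three-digit subplot code such as 233 into
    (nrows, ncols, index), one decimal digit per field (hypothetical helper)."""
    if not 100 <= code <= 999:
        raise ValueError("a subplot code must be a three-digit number")
    nrows, rest = divmod(code, 100)
    ncols, index = divmod(rest, 10)
    if 0 in (ncols, index):
        raise ValueError("each digit must be between 1 and 9")
    return nrows, ncols, index


print(split_three_digit(233))  # same cell as subplot(2, 3, 3)
print(split_three_digit(342))  # same cell as add_subplot(3, 4, 2)
```

Since each field gets exactly one digit, an index of 12 cannot be written this way; use the three-argument form instead.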

 

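Because we haven't used the samples' class labels, it is hard to see any useful information in this plot. The scatter function in Matplotlib lets you customize how each point on the scatter plot is marked.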

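import matplotlib.pyplot as plt
from numpy import array
from kNN import file2matrix  # data loader from the book's kNN.py (assumed importable)

# Draw a scatter plot with points marked by class
def g():
    datingDataMat,datingLabels = file2matrix("datingTestSet2.txt")
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.set_title("scatter")
    #ax.scatter(datingDataMat[:,1],datingDataMat[:,2])
    #ax.scatter(datingDataMat[:,0],datingDataMat[:,1],15.0*array(datingLabels),15.0*array(datingLabels))
    print(datingLabels)
    ax.scatter(datingDataMat[:,1],datingDataMat[:,2],15.0 * array(datingLabels),15.0 * array(datingLabels))
    # The last two arguments above, 15.0 * array(datingLabels) and 15.0 * array(datingLabels),
    # are really the s (size) and c (color) parameters; they need not be equal. With keywords:
    #ax.scatter(datingDataMat[:,0],datingDataMat[:,1],s=15.0*array(datingLabels),c=15.0*array(datingLabels))
    # The factor 15 only enlarges the points so size differences are easier to see; any
    # multiplier (1000, 100000, ...) works. It does not change the colors, because numeric
    # c values are normalized to their own min..max before the colormap is applied.
    plt.show()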
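Since datingTestSet2.txt and file2matrix are not reproduced here, the sketch below inlines the five sample rows listed later in this post (three features, then the class label) so it runs on its own. It assumes the off-screen Agg backend instead of plt.show(), and also checks the claim that scaling c by 15 leaves the colors unchanged, using matplotlib.colors.Normalize.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# The five sample rows quoted later in the post: three features, then the label
samples = np.array([
    [40920, 8.326976, 0.953952, 3],
    [14488, 7.153469, 1.673904, 2],
    [26052, 1.441871, 0.805124, 1],
    [75136, 13.147394, 0.428964, 1],
    [38344, 1.669788, 0.134296, 1],
])
datingDataMat, datingLabels = samples[:, :3], samples[:, 3]

fig, ax = plt.subplots()
ax.set_title("scatter")
# Same call as g(), with s and c written as keywords
sc = ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2],
                s=15.0 * datingLabels, c=15.0 * datingLabels)
print(sc.get_sizes())  # one size per point: 15 times its label

# Numeric c is rescaled to [0, 1] over its own min..max before coloring,
# so multiplying it by a constant gives the same normalized values.
print(Normalize()(datingLabels))
print(Normalize()(15.0 * datingLabels))
```

The two Normalize lines print the same values, which is why only the point sizes, not the colors, react to the multiplier.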
Let's take a closer look at the scatter function.
Axes.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, verts=None, edgecolors=None, *, data=None, **kwargs)
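x, y: the positions of the points
s: the size of the points. The official description:
scalar or array_like, shape (n, ), optional (a single number or an array-like)
size in points^2. Default is rcParams['lines.markersize'] ** 2
The wording is terse and I didn't fully understand it at first. "Size in points^2" means s is the marker's area in square points, not its width. The results below were worked out by testing step by step.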
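Because s is an area, the visible width grows only with its square root: s=100 is 100 times the area of s=1 but just 10 times as wide. The sketch below spells out that arithmetic for the default circular 'o' marker, assuming 1 point = 1/72 inch; with default settings lines.markersize is 6, so the default s is 36. The helper names are hypothetical.

```python
import math


def marker_diameter_points(s):
    """Diameter in points of an 'o' scatter marker whose area is s points**2."""
    return math.sqrt(s)


def points_to_pixels(points, dpi):
    """Convert a length in points (1/72 inch) to pixels at the given dpi."""
    return points * dpi / 72.0


for s in (1, 36, 100):
    d = marker_diameter_points(s)
    print(f"s={s}: {d} pt wide, {points_to_pixels(d, dpi=100):.1f} px at 100 dpi")
```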
To make testing easier, I kept only the first 5 samples in datingTestSet2.txt:

40920 8.326976 0.953952 3
14488 7.153469 1.673904 2
26052 1.441871 0.805124 1
75136 13.147394 0.428964 1
38344 1.669788 0.134296 1
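file2matrix itself is not shown in this post. As a rough stand-in, the hypothetical load_dating_data below reads rows like the five above (three numeric features followed by an integer label) into a NumPy matrix and a label list; it assumes whitespace-separated fields and is not the book's implementation.

```python
import io

import numpy as np


def load_dating_data(fileobj):
    """Hypothetical stand-in for file2matrix: three float features per line,
    followed by an integer class label."""
    features, labels = [], []
    for line in fileobj:
        fields = line.split()  # splits on tabs or spaces
        if not fields:
            continue  # skip blank lines
        features.append([float(v) for v in fields[:3]])
        labels.append(int(fields[-1]))
    return np.array(features), labels


sample_text = """40920\t8.326976\t0.953952\t3
14488\t7.153469\t1.673904\t2
26052\t1.441871\t0.805124\t1
75136\t13.147394\t0.428964\t1
38344\t1.669788\t0.134296\t1
"""
datingDataMat, datingLabels = load_dating_data(io.StringIO(sample_text))
print(datingDataMat.shape, datingLabels)
```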

 

ax.scatter(datingDataMat[:,1],datingDataMat[:,2],s=1) gives this result:



ax.scatter(datingDataMat[:,1],datingDataMat[:,2],s=100)
To make the change more visible, s is increased 100-fold. The result:

 

That is the effect of a single number. The official description also allows an array_like form, so let's test that too.

ax.scatter(datingDataMat[:,1],datingDataMat[:,2],s=[1]): no screenshot for this one; it looks the same as the scalar 1, with every point the same size.

ax.scatter(datingDataMat[:,1],datingDataMat[:,2],s=[1,50]): let's see what this does:

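Some points change and some don't. What's the rule? After some testing (I'll skip the intermediate steps), it turns out the function sets each point's size from the element of s at the same position. For example: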

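The 1st sample is x=8.326976, y=0.953952, and the 1st value in s is 1, so this point has size 1.

The 2nd sample is x=7.153469, y=1.673904, and the 2nd value in s is 50, so this point has size 50.

The 3rd sample is x=1.441871, y=0.805124. s has only two values, so it wraps back to the 1st value, 1, and this point has size 1.

The remaining points follow the same pattern, cycling through s.

s=[1,50,500] behaves the same way. (This is what I observed with the matplotlib version I was using; as far as I know, newer 3.x releases instead require s to be a scalar or to have the same length as x and y, and raise a ValueError otherwise.)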
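The cycling rule observed above (point i gets s[i % len(s)]) is exactly what numpy.resize does when it enlarges an array: it fills the new length with repeated copies of the input. The helper expand_cyclically is a hypothetical name for this sketch.

```python
import numpy as np


def expand_cyclically(values, n):
    """Repeat values in order until there are n of them, so that
    element i equals values[i % len(values)] (hypothetical helper)."""
    return np.resize(np.asarray(values), n)


print(expand_cyclically([1, 50], 5).tolist())       # [1, 50, 1, 50, 1]
print(expand_cyclically([1, 50, 500], 5).tolist())  # [1, 50, 500, 1, 50]
```

Passing s=expand_cyclically([1, 50], len(x)) gives one size per point explicitly, which should reproduce the cycling result on matplotlib versions that insist len(s) == len(x).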

 

Parameter c

ax.scatter(datingDataMat[:,1],datingDataMat[:,2],s=[1,50], c='r')

Parameter c sets the color of the points.

c : color, sequence, or sequence of color, optional, default: ‘b’

c can be a single color format string, or a sequence of color specifications of length N, or a sequence of N numbers to be mapped to colors using the cmap and norm specified via kwargs (see below). Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however, including the case of a single row to specify the same color for all points.

Matplotlib recognizes the following formats to specify a color:

  • an RGB or RGBA tuple of float values in [0, 1] (e.g., (0.1, 0.2, 0.5) or (0.1, 0.2, 0.5, 0.3));
  • a hex RGB or RGBA string (e.g., '#0F0F0F' or '#0F0F0F0F');
  • a string representation of a float value in [0, 1] inclusive for gray level (e.g., '0.5');
  • one of {'b', 'g', 'r', 'c', 'm', 'y', 'k', 'w'};
  • a X11/CSS4 color name;
  • a name from the xkcd color survey; prefixed with 'xkcd:' (e.g., 'xkcd:sky blue');
  • one of {'tab:blue', 'tab:orange', 'tab:green', 'tab:red', 'tab:purple', 'tab:brown', 'tab:pink', 'tab:gray', 'tab:olive','tab:cyan'} which are the Tableau Colors from the ‘T10’ categorical palette (which is the default color cycle);
  • a “CN” color spec, i.e. 'C' followed by a single digit, which is an index into the default property cycle (matplotlib.rcParams['axes.prop_cycle']); the indexing occurs at artist creation time and defaults to black if the cycle does not include color.

All string specifications of color, other than “CN”, are case-insensitive.
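A quick way to see how these formats are interpreted is matplotlib.colors.to_rgba, which converts any single color specification to an (r, g, b, a) tuple, and to_hex, which gives the hex string. The sketch below runs a few of the listed formats through it; 'C0' resolves against the default property cycle, so its value assumes default rcParams.

```python
from matplotlib.colors import to_hex, to_rgba

specs = [
    (0.1, 0.2, 0.5),  # RGB tuple of floats
    "#0F0F0F",        # hex RGB string
    "0.5",            # gray level
    "r",              # single-letter color
    "tab:blue",       # Tableau T10 color
    "xkcd:sky blue",  # xkcd color survey name
    "C0",             # first color of the property cycle
]
for spec in specs:
    print(repr(spec), "->", to_rgba(spec), to_hex(spec))
```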

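c='r' makes every point red.

To give points different colors, pass a list or tuple, for example:
ax.scatter(datingDataMat[:,1],datingDataMat[:,2],s=[1,50], c=('r','b'))
The colors are assigned the same way as s: point by point, cycling through the sequence. (As far as I know, newer matplotlib releases raise an error instead when the number of colors differs from the number of points.)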
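To get the cycled colors without relying on version-specific behavior, one option is to build the per-point color list yourself, one entry per sample. The sketch below does that for the five sample points from this post, assuming the off-screen Agg backend; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt

# x, y of the five sample points used in this post
x = [8.326976, 7.153469, 1.441871, 13.147394, 1.669788]
y = [0.953952, 1.673904, 0.805124, 0.428964, 0.134296]

# Cycle the palette explicitly: point i gets palette[i % len(palette)]
palette = ("r", "b")
colors = [palette[i % len(palette)] for i in range(len(x))]
sizes = [(1, 50)[i % 2] for i in range(len(x))]

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=sizes, c=colors)
print(colors)
```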

Parameter marker

marker : MarkerStyle, optional, default: ‘o’

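This sets the style of the points on the plot. The default is 'o', the familiar round dot. "." (point) looks similar, but at the same s it is drawn as a noticeably smaller circle than "o".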
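The size difference between "." and "o" comes from each MarkerStyle's built-in transform, which scales the marker path before s is applied. This sketch compares that scale factor for the two markers; in the matplotlib versions I know of, "." uses half the scale of "o", but treat the exact numbers as version-dependent.

```python
from matplotlib.markers import MarkerStyle

# Built-in x-scale of each marker's path, before the size s is applied
scales = {m: MarkerStyle(m).get_transform().get_matrix()[0, 0] for m in (".", "o")}
print(scales)
print("'.' is", scales["."] / scales["o"], "times the width of 'o'")
```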

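All possible markers are listed below (from the matplotlib.markers documentation). Try them if you're curious, it's fun. The names from TICKLEFT onward are not strings: they are integer constants defined in matplotlib.markers, used as, for example, marker=matplotlib.markers.CARETUP.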

marker description
"." point
"," pixel
"o" circle
"v" triangle_down
"^" triangle_up
"<" triangle_left
">" triangle_right
"1" tri_down
"2" tri_up
"3" tri_left
"4" tri_right
"8" octagon
"s" square
"p" pentagon
"P" plus (filled)
"*" star
"h" hexagon1
"H" hexagon2
"+" plus
"x" x
"X" x (filled)
"D" diamond
"d" thin_diamond
"|" vline
"_" hline
TICKLEFT tickleft
TICKRIGHT tickright
TICKUP tickup
TICKDOWN tickdown
CARETLEFT caretleft (centered at tip)
CARETRIGHT caretright (centered at tip)
CARETUP caretup (centered at tip)
CARETDOWN caretdown (centered at tip)
CARETLEFTBASE caretleft (centered at base)
CARETRIGHTBASE caretright (centered at base)
CARETUPBASE caretup (centered at base)
CARETDOWNBASE caretdown (centered at base)
"None", " " or "" nothing
'$...$' render the string using mathtext.
verts a list of (x, y) pairs used for Path vertices. The center of the marker is located at (0,0) and the size is normalized.
path Path instance.
(numsides, style, angle) see below

The marker can also be a tuple (numsides, style, angle), which will create a custom, regular symbol.

numsides: the number of sides
style: the style of the regular symbol:
    0: a regular polygon
    1: a star-like symbol
    2: an asterisk
    3: a circle (numsides and angle is ignored)
angle: the angle of rotation of the symbol

For backward compatibility, the form (verts, 0) is also accepted, but it is equivalent to just verts for giving a raw set of vertices that define the shape.
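The sketch below tries a few entries from the table, including the integer constants from TICKLEFT onward and two (numsides, style, angle) tuples. It assumes the off-screen Agg backend and draws one scatter call per marker, since a single scatter call takes a single marker style.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt
from matplotlib import markers

# TICKLEFT ... CARETDOWNBASE are plain integers in matplotlib.markers
print(markers.TICKLEFT, markers.CARETUP, markers.CARETDOWNBASE)

styles = [
    "o",              # circle
    markers.CARETUP,  # caretup (centered at tip)
    markers.TICKDOWN, # tickdown
    (5, 1, 0),        # (numsides, style, angle): five-pointed star-like symbol
    (6, 0, 30),       # regular hexagon rotated by 30 degrees
    r"$\alpha$",      # string rendered with mathtext
]

fig, ax = plt.subplots()
for i, style in enumerate(styles):
    ax.scatter([i], [0], s=200, marker=style)  # one marker style per call
print(len(ax.collections))
```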

 

I'll leave the other parameters for now and look at them when I need them.

 
