Machine Learning (16) - A K Means Clustering Exercise

Posted by Rachel on 2019-06-15

Original URL: https://learnku.com/articles/29873

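Problem

Take the petal width and length of the Iris flowers as the dataset, group the samples with an unsupervised method, then compare the result against the target values to check whether the grouping is correct.

Solution

Import the required packages and the dataset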

import pandas as pd
from sklearn import datasets

# Load the iris data
iris = datasets.load_iris()

# Inspect the attributes of iris
dir(iris)
# Output
['DESCR', 'data', 'feature_names', 'filename', 'target', 'target_names']

Convert the iris data to a DataFrame

df = pd.DataFrame(iris.data, columns = iris.feature_names)
df.head()

Output:

[Figure: the first five rows of df, with all four feature columns]

Drop the columns we don't need

df.drop(['sepal length (cm)', 'sepal width (cm)'], axis='columns', inplace=True)
df.head()

Output:

[Figure: the first five rows of df, now with only the petal length and petal width columns]

Plot the target dataset

from matplotlib import pyplot as plt
plt.scatter(df['petal length (cm)'], df['petal width (cm)'])

Output:

[Figure: scatter plot of petal length against petal width]

Find the best value of K

from sklearn.cluster import KMeans
k_rng = range(1, 10)
sse = []
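for k in k_rng:
    # Fit K Means with k clusters on the two petal features
    km = KMeans(n_clusters=k)
    km.fit(df[['petal width (cm)', 'petal length (cm)']])
    # inertia_ is the sum of squared errors (SSE) for this k
    sse.append(km.inertia_)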

sse

# Output
[550.8953333333334,
 86.39021984551397,
 31.371358974358973,
 19.48300089968511,
 13.916908757908757,
 11.03633387775173,
 9.191170634920635,
 7.672362403043182,
 6.456494541406307]
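
The `inertia_` values collected above are sums of squared errors: for every sample, the squared distance to its nearest cluster center, added up over the whole dataset. As a minimal sketch of that quantity, the following computes it by hand on a few made-up points. The function name `sum_squared_error`, the toy points and the centroids are all assumptions for illustration; they are not the Iris data and not scikit-learn's code.

```python
def sum_squared_error(points, centroids):
    """Sum, over all points, of the squared distance to the nearest centroid."""
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centroids
        )
    return total

# Hypothetical toy data: four points at the corners of a rectangle
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 1.0), (5.0, 3.0)]

# One centroid in the middle versus one centroid per side
sse_k1 = sum_squared_error(points, [(3.0, 2.0)])
sse_k2 = sum_squared_error(points, [(1.0, 2.0), (5.0, 2.0)])

print(sse_k1, sse_k2)
```

Adding a second centroid brings every point closer to its nearest center, which is why the SSE list above only ever decreases as K grows.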

plt.xlabel('K')
plt.ylabel('SSE')
plt.plot(k_rng, sse)

Output:

[Figure: line plot of SSE against K]
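
The elbow is usually read off the plot by eye, but the same judgment can be roughed out numerically. The sketch below reuses the SSE values printed above and computes how much each additional cluster reduces the SSE relative to the previous K. The 50% cutoff and the names `relative_drop`, `threshold` and `chosen_k` are assumptions made for this illustration, not a standard rule.

```python
# SSE values copied from the output above, for K = 1..9
sse = [550.8953333333334, 86.39021984551397, 31.371358974358973,
       19.48300089968511, 13.916908757908757, 11.03633387775173,
       9.191170634920635, 7.672362403043182, 6.456494541406307]
k_values = list(range(1, 10))

# Fraction of the previous SSE removed by going from K-1 to K clusters
relative_drop = {
    k_values[i]: (sse[i - 1] - sse[i]) / sse[i - 1]
    for i in range(1, len(sse))
}

# Hypothetical cutoff: keep adding clusters while each one removes
# more than half of the remaining SSE
threshold = 0.5
chosen_k = max(k for k, drop in relative_drop.items() if drop > threshold)

for k, drop in relative_drop.items():
    print(k, round(drop, 3))
print('chosen K:', chosen_k)
```

With these numbers the reduction stays above one half up to K = 3 and falls below it from K = 4 onward, which agrees with the choice made in the next step.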

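Based on the analysis above (the SSE drops sharply up to K = 3 and flattens out afterwards), split the dataset into 3 clusters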

km = KMeans(n_clusters=3)
km
# Output
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)

cluster = km.fit_predict(df[['petal width (cm)', 'petal length (cm)']])
df['cluster'] = cluster
df.head()

Output:

[Figure: the first five rows of df, with the new cluster column]
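
To make clear what `fit_predict` is doing conceptually, here is a minimal sketch of the basic K Means loop: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. It is a simplification under stated assumptions, not scikit-learn's implementation: the `KMeans` object printed above uses `init='k-means++'` and `n_init=10`, whereas this sketch takes fixed starting centroids. The function `simple_kmeans` and the toy points are hypothetical.

```python
def simple_kmeans(points, centroids, max_iter=100):
    """Basic K Means iterations from given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster where it is
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy points shaped like (petal length, petal width) pairs;
# these are not rows taken from the Iris dataset
points = [(1.4, 0.2), (1.5, 0.2), (4.5, 1.5), (4.7, 1.4), (5.8, 2.2), (6.0, 2.5)]
labels, centroids = simple_kmeans(points, [points[0], points[2], points[4]])
print(labels)
print(centroids)
```

The returned labels play the same role as the cluster column added to df above: one integer per row saying which group the row fell into.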

df1 = df[df.cluster == 0]
df2 = df[df.cluster == 1]
df3 = df[df.cluster == 2]

plt.scatter(df1['petal length (cm)'], df1['petal width (cm)'], color = 'red')
plt.scatter(df2['petal length (cm)'], df2['petal width (cm)'], color = 'blue')
plt.scatter(df3['petal length (cm)'], df3['petal width (cm)'], color = 'green')

Output:

[Figure: the same scatter plot, with the three clusters colored red, blue and green]

Compare with the dataset's original target to check whether the grouping is correct

df_new = pd.DataFrame(iris.data, columns = iris.feature_names)

df_new['target'] = iris.target

df_new.drop(['sepal length (cm)', 'sepal width (cm)'], axis = 'columns', inplace=True)

df_new.head()

Output:

[Figure: the first five rows of df_new, with the two petal columns and the target column]

df_new1 = df_new[df_new.target == 0]
df_new2 = df_new[df_new.target == 1]
df_new3 = df_new[df_new.target == 2]

plt.scatter(df_new1['petal length (cm)'], df_new1['petal width (cm)'], color='red')
plt.scatter(df_new2['petal length (cm)'], df_new2['petal width (cm)'], color='blue')
plt.scatter(df_new3['petal length (cm)'], df_new3['petal width (cm)'], color='green')

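Below is the grouping by the dataset's original target. It is largely consistent with the result we obtained above with the unsupervised method, although the cluster numbers (and therefore the colors) assigned by K Means are arbitrary and need not line up with the target values:

[Figure: scatter plot of petal length against petal width, colored by target]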
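
Comparing two scatter plots by eye is only a rough check. One way to put a number on it is to map each cluster to the target value that occurs most often inside it and then count how many samples agree. The sketch below does this with the standard library only. The helper `cluster_agreement` and the short label lists are hypothetical stand-ins, not the article's actual results.

```python
from collections import Counter

def cluster_agreement(clusters, targets):
    """Map each cluster to its most common target; return (mapping, match rate)."""
    table = Counter(zip(clusters, targets))  # (cluster, target) -> count
    mapping = {}
    for cluster in set(clusters):
        counts = {t: n for (c, t), n in table.items() if c == cluster}
        mapping[cluster] = max(counts, key=counts.get)
    matched = sum(mapping[c] == t for c, t in zip(clusters, targets))
    return mapping, matched / len(targets)

# Hypothetical labels: cluster numbers are permuted relative to the targets,
# and one sample sits in the "wrong" cluster
clusters = [1, 1, 1, 0, 0, 2, 2, 2, 0]
targets = [0, 0, 0, 1, 1, 2, 2, 1, 1]

mapping, agreement = cluster_agreement(clusters, targets)
print(mapping)
print(agreement)
```

With the variables from this article, a call along the lines of `cluster_agreement(list(df['cluster']), list(iris.target))` should give the corresponding figure for the Iris clustering.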

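That completes the solution to the exercise. Because the previous section already explained every step in detail, I have not repeated those explanations here and have simply posted the solution. If anything is unclear, see the explanations in the previous section.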
