Machine Learning: Clustering 5-1 (K-Means Algorithm + Swiss Roll)

Posted by 橘子橘子呀 on 2022-03-15

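Grouping supermarket customers with K-Means

Main steps:

  • 1. Import packages
  • 2. Import the dataset
  • 3. Use the elbow method to choose the best value of K
  • 4. Cluster with K=5
  • 5. Visualize the clustering result
  • 6. Take action
  • 7. Generate a Swiss roll and cluster it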
 
 Dataset link: https://www.heywhale.com/mw/dataset/6230697d5f17950018ee88b5/file
 

1. Import packages

In [1]:
# Import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

 

2. Import the dataset

In [2]:
# Import the dataset
dataset = pd.read_csv('Mall_Customers.csv')
dataset
Out[2]:
   CustomerID  Genre  Age  Annual Income (k$)  Spending Score (1-100)
0 1 Male 19 15 39
1 2 Male 21 15 81
2 3 Female 20 16 6
3 4 Female 23 16 77
4 5 Female 31 17 40
... ... ... ... ... ...
195 196 Female 35 120 79
196 197 Female 45 126 28
197 198 Male 32 126 74
198 199 Male 32 137 18
199 200 Male 30 137 83

200 rows × 5 columns

To keep the clustering easy to visualize in 2D, only two columns are used: Annual Income (k$) and Spending Score (1-100).

In [3]:
X = dataset.iloc[:, [3, 4]].values
X[:3, :]
Out[3]:
array([[15, 39],
       [15, 81],
       [16,  6]], dtype=int64)
 

3. Use the elbow method to choose the best value of K

In [4]:
# Fit K-Means for K = 1..10 and record the WCSS (inertia) of each fit
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', n_init=10, max_iter=300, random_state = 0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
In [5]:
# Plot the number of clusters against WCSS
plt.figure()
plt.plot(range(1, 11), wcss, 'ro-')
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()


From K=5 onward, WCSS no longer drops noticeably, which suggests that K=5 is the best choice.
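For readers who want to see what the elbow plot is actually measuring, here is a minimal sketch of WCSS (within-cluster sum of squares) computed by hand. In scikit-learn, `kmeans.inertia_` is the sum of squared distances from each sample to its closest cluster centre, which is the same quantity. The `wcss` helper and the four toy points are made up for illustration and are not part of the original notebook.

```python
# Minimal sketch: compute WCSS by hand with the standard library only.
# The helper name `wcss` and the toy data are illustrative assumptions.

def wcss(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        # The centroid is the per-dimension mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        for p in members:
            total += sum((p[d] - centroid[d]) ** 2 for d in range(dims))
    return total


points = [(1, 1), (1, 3), (9, 9), (9, 11)]
one_cluster = wcss(points, [0, 0, 0, 0])   # every point in a single cluster
two_clusters = wcss(points, [0, 0, 1, 1])  # two tight groups
print(one_cluster, two_clusters)           # 132.0 4.0
```

Going from one cluster to two cuts WCSS sharply here because the points form two tight groups. Adding more clusters after that can only shave off the small remainder, and that flattening is the "elbow" the plot is used to find.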

 

4. Cluster with K=5

In [6]:
# Run K-Means with the chosen K
kmeans = KMeans(n_clusters = 5, init = 'k-means++', n_init=10, max_iter=300, random_state = 0)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
In [7]:
y_kmeans
Out[7]:
array([3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1,
       3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 0,
       3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 4, 2, 0, 2, 4, 2, 4, 2,
       0, 2, 4, 2, 4, 2, 4, 2, 4, 2, 0, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
       4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
       4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
       4, 2])

5. Visualize the clustering result

In [8]:
# Visualize the clustering result
plt.figure()
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'yellow', label = 'Centroids')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()

 

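6. Take action

  1. Cluster 1: medium income, medium spending.
  2. Cluster 2: low income, high spending. Look into which products this group mainly buys.
  3. Cluster 3: high income, high spending.
  4. Cluster 4: low income, low spending.
  5. Cluster 5: high income, low spending. Offer this group coupons or discount shopping cards to encourage them to spend.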
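One way to arrive at profiles like these is to average income and spending score per cluster. The sketch below does that for only the ten rows displayed in Out[2], paired with their labels from Out[7] (indices 0-4 and 195-199), so it is a partial illustration rather than the full 200-row summary. Label 0 (Cluster 1) does not occur in this sample, and the label numbers depend on `random_state`.

```python
from collections import defaultdict

# (annual income in k$, spending score, cluster label) for the ten rows
# shown in Out[2]; labels are read from Out[7] at indices 0-4 and 195-199.
rows = [
    (15, 39, 3), (15, 81, 1), (16, 6, 3), (16, 77, 1), (17, 40, 3),
    (120, 79, 2), (126, 28, 4), (126, 74, 2), (137, 18, 4), (137, 83, 2),
]

# Group the (income, score) pairs by cluster label.
groups = defaultdict(list)
for income, score, label in rows:
    groups[label].append((income, score))

# Average each feature within each cluster.
profiles = {}
for label, members in sorted(groups.items()):
    mean_income = sum(m[0] for m in members) / len(members)
    mean_score = sum(m[1] for m in members) / len(members)
    profiles[label] = (mean_income, mean_score)
    # The plot legend numbers clusters from 1, so label k is "Cluster k+1".
    print(f"Cluster {label + 1}: mean income {mean_income:.1f} k$, "
          f"mean score {mean_score:.1f}")
```

On the full dataset the same summary could be produced with a pandas `groupby` on the label column. The plain-Python version is used here only to keep the sketch dependency-free.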

 

7. Generate a Swiss roll and cluster it

In [10]:
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions
from sklearn.cluster import KMeans
from sklearn import datasets
import matplotlib.pyplot as plt

# Generate the Swiss roll dataset (noise defaults to 0.0; pass noise=... to add Gaussian noise)
X, color = datasets.make_swiss_roll(n_samples=3000)

# Partition the data into 3 K-Means clusters
clusters_swiss_roll = KMeans(n_clusters=3, random_state=1).fit_predict(X)

# Plot the points in 3D, coloured by cluster label
fig2 = plt.figure(figsize=(10, 10))
ax = fig2.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=clusters_swiss_roll, cmap='Spectral')
plt.show()


As the resulting plot shows, K-Means groups the points into 3 clusters based on distance.
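"Based on distance" here means that K-Means assigns each sample to the centroid nearest to it in straight-line (Euclidean) distance. On the Swiss roll, the clusters therefore follow spatial proximity rather than position along the rolled-up sheet. The sketch below shows only that assignment step. The function name, points, and centroids are hypothetical and chosen for illustration.

```python
import math

def assign_to_nearest(points, centroids):
    """Return, for each point, the index of the closest centroid (Euclidean)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical 3-D centroids and points, not taken from the Swiss roll run.
centroids = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
points = [(1.0, 1.0, 0.5), (9.0, -1.0, 2.0), (2.0, 8.0, 1.0)]
labels = assign_to_nearest(points, centroids)
print(labels)  # [0, 1, 2]
```

A full K-Means run alternates this assignment step with recomputing each centroid as the mean of its assigned points, repeating until the assignments stop changing or `max_iter` is reached.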
 
 
 
 
 
 
