Python Implementation of Principal Component Analysis (PCA)

Posted by fengbingchun on 2018-01-14

For an introduction to Principal Components Analysis (PCA), see: http://blog.csdn.net/fengbingchun/article/details/78977202

The Python code here is adapted from the code in http://sebastianraschka.com/Articles/2014_pca_step_by_step.html. The procedure is:

1. Randomly generate a dataset of 3 rows × 40 columns, where each column is one sample: the first 20 columns belong to class 1 and the last 20 columns to class 2; each sample has 3 features.

2. Compute the mean of each row, giving the 3-dimensional mean vector.

3. Compute the covariance matrix, a 3-row × 3-column matrix.

4. Compute the eigenvectors and eigenvalues of the covariance matrix.

5. Sort the eigenvalues, together with their eigenvectors, in descending order.

6. Choose the eigenvectors of the first and second principal components to form a new 3-row × 2-column matrix.

7. Use this 3×2 matrix to project the original dataset onto the new 2-dimensional subspace, giving a 2×40 result (summarized in matrix form below); reconstructing an approximation of the original data from this projection is sketched after the code.
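
For reference, the steps above can be written compactly. Assuming X is the 3×40 data matrix, x_k its k-th column (k = 1, ..., n, with n = 40), and m the per-row mean vector from step 2:

$$\Sigma = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - m)(x_k - m)^{T}, \qquad \Sigma v_i = \lambda_i v_i, \qquad W = [\,v_1 \;\; v_2\,], \qquad Y = W^{T}X$$

where the eigenvalues are ordered λ1 ≥ λ2 ≥ λ3, W is the 3×2 matrix of step 6, and Y is the 2×40 projection of step 7 (as in the code below, X is not mean-centered before this projection).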

The Python code is as follows:

# reference: http://sebastianraschka.com/Articles/2014_pca_step_by_step.html

import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d import proj3d
from matplotlib.patches import FancyArrowPatch

# 1. generate 40 3-dimensional samples randomly drawn from a multivariate Gaussian distribution
np.random.seed(1) # random seed for consistency

mu_vec1 = np.array([0,0,0])
cov_mat1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class1_sample = np.random.multivariate_normal(mu_vec1, cov_mat1, 20).T
assert class1_sample.shape == (3,20), "The matrix does not have dimensions 3x20"
#print("class1_sample:\n", class1_sample)

mu_vec2 = np.array([1,1,1])
cov_mat2 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class2_sample = np.random.multivariate_normal(mu_vec2, cov_mat2, 20).T
assert class2_sample.shape == (3,20), "The matrix does not have dimensions 3x20"

fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, projection='3d')
plt.rcParams['legend.fontsize'] = 10
ax.plot(class1_sample[0,:], class1_sample[1,:], class1_sample[2,:], 'o', markersize=8, color='blue', alpha=0.5, label='class1')
ax.plot(class2_sample[0,:], class2_sample[1,:], class2_sample[2,:], '^', markersize=8, alpha=0.5, color='red', label='class2')

plt.title('Samples for class 1 and class 2')
ax.legend(loc='upper right')
plt.show()

# Taking the whole dataset ignoring the class labels
all_samples = np.concatenate((class1_sample, class2_sample), axis=1)
assert all_samples.shape == (3,40), "The matrix does not have dimensions 3x40"

# 2. Computing the d-dimensional mean vector
mean_x = np.mean(all_samples[0,:])
mean_y = np.mean(all_samples[1,:])
mean_z = np.mean(all_samples[2,:])

mean_vector = np.array([[mean_x],[mean_y],[mean_z]])
print('Mean Vector:\n', mean_vector)

# 3. Computing the Covariance Matrix
cov_mat = np.cov([all_samples[0,:],all_samples[1,:],all_samples[2,:]])
print('Covariance Matrix:\n', cov_mat)

# 4. Computing eigenvectors and corresponding eigenvalues
eig_val_cov, eig_vec_cov = np.linalg.eig(cov_mat)

for i in range(len(eig_val_cov)):
	eigvec_cov = eig_vec_cov[:,i].reshape(1,3).T

	print('Eigenvector {}: \n{}'.format(i+1, eigvec_cov))
	print('Eigenvalue {} from covariance matrix: {}'.format(i+1, eig_val_cov[i]))
	print(40 * '-')

# Visualizing the eigenvectors
class Arrow3D(FancyArrowPatch):
	def __init__(self, xs, ys, zs, *args, **kwargs):
		FancyArrowPatch.__init__(self, (0,0), (0,0), *args, **kwargs)
		self._verts3d = xs, ys, zs

	def draw(self, renderer):
		xs3d, ys3d, zs3d = self._verts3d
		xs, ys, zs = proj3d.proj_transform(xs3d, ys3d, zs3d, renderer.M)
		self.set_positions((xs[0],ys[0]),(xs[1],ys[1]))
		FancyArrowPatch.draw(self, renderer)
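
	# Note: the draw() override above follows the original tutorial and works with the
	# matplotlib versions available when this post was written. On newer matplotlib
	# releases (roughly 3.5 and later, where the renderer no longer carries the M
	# attribute), the usual replacement is a do_3d_projection() method instead, e.g.:
	#
	# def do_3d_projection(self, renderer=None):
	# 	xs3d, ys3d, zs3d = self._verts3d
	# 	xs, ys, zs = proj3d.proj_transform(xs3d, ys3d, zs3d, self.axes.M)
	# 	self.set_positions((xs[0], ys[0]), (xs[1], ys[1]))
	# 	return np.min(zs)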

fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(111, projection='3d')

ax.plot(all_samples[0,:], all_samples[1,:], all_samples[2,:], 'o', markersize=8, color='green', alpha=0.2)
ax.plot([mean_x], [mean_y], [mean_z], 'o', markersize=10, color='red', alpha=0.5)
for v in eig_vec_cov.T:
	a = Arrow3D([mean_x, v[0]], [mean_y, v[1]], [mean_z, v[2]], mutation_scale=20, lw=3, arrowstyle="-|>", color="r")
	ax.add_artist(a)
ax.set_xlabel('x_values')
ax.set_ylabel('y_values')
ax.set_zlabel('z_values')

plt.title('Eigenvectors')
plt.show()

# 5 Sorting the eigenvectors by decreasing eigenvalues
# Make a list of (eigenvalue, eigenvector) tuples
eig_pairs = [(np.abs(eig_val_cov[i]), eig_vec_cov[:,i]) for i in range(len(eig_val_cov))]

# Sort the (eigenvalue, eigenvector) tuples from high to low
eig_pairs.sort(key=lambda x: x[0], reverse=True)
print("eig_pairs:\n", eig_pairs)

# Visually confirm that the list is correctly sorted by decreasing eigenvalues
print("sorted by decreasing eigenvalues:")
for i in eig_pairs:
	print(i[0])

# 6 Choosing k eigenvectors with the largest eigenvalues
matrix_w = np.hstack((eig_pairs[0][1].reshape(3,1), eig_pairs[1][1].reshape(3,1)))
print('Matrix W:\n', matrix_w)

# 7 Transforming the samples onto the new subspace
transformed = matrix_w.T.dot(all_samples)
assert transformed.shape == (2,40), "The matrix is not 2x40 dimensional."

plt.plot(transformed[0,0:20], transformed[1,0:20], 'o', markersize=7, color='blue', alpha=0.5, label='class1')
plt.plot(transformed[0,20:40], transformed[1,20:40], '^', markersize=7, color='red', alpha=0.5, label='class2')
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.xlabel('x_values')
plt.ylabel('y_values')
plt.legend()
plt.title('Transformed samples with class labels')
plt.show()
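
The transformed matrix is a projection of the samples, not a reconstruction. If an approximate reconstruction in the original 3-dimensional space is also wanted, a minimal sketch (assuming the code above has already been run, so that matrix_w and transformed are defined) is:

# Optional: map the 2D projection back into the original 3D space. The columns of
# matrix_w are unit-length eigenvectors, so this is the orthogonal projection of the
# samples onto the plane they span; it is only an approximation of the original data,
# since the third component was discarded.
reconstructed = matrix_w.dot(transformed)
assert reconstructed.shape == (3,40), "The matrix does not have dimensions 3x40"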
The results of running the code are as follows: it prints the mean vector, the covariance matrix, the eigenvalues and eigenvectors, the sorted eigenvalue/eigenvector pairs, and Matrix W, and displays three figures: 'Samples for class 1 and class 2', 'Eigenvectors', and 'Transformed samples with class labels'.
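
As a quick cross-check, the same two-component projection can also be obtained with scikit-learn. This is only a sketch and assumes scikit-learn is installed and that the code above has been run (so all_samples exists); note that sklearn.decomposition.PCA mean-centers the data and the sign of each component is arbitrary, so its output matches the manual result only up to a per-component sign flip and a constant offset:

# Optional cross-check with scikit-learn.
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
transformed_sk = pca.fit_transform(all_samples.T)  # sklearn expects samples as rows: 40x3

print(pca.explained_variance_)  # should match the two largest eigenvalues printed above
print(transformed_sk.shape)     # (40, 2)
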
GitHub: https://github.com/fengbingchun/NN_Test 
