Python Machine Learning: How to Shuffle a Dataset (ndarray)

Posted by Inside_Zhang on 2015-11-26

Shuffling can be viewed as picking one ordering at random from the set of all permutations.
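To make the "one of all permutations" view concrete, here is a small sketch. The three-element array `items` is made up for illustration; the code lists every ordering with `itertools.permutations` and checks that a NumPy shuffle result is one of them.

```python
import itertools
import numpy as np

items = np.array([10, 20, 30])

# All 3! = 6 possible orderings of the three items.
all_orderings = set(itertools.permutations(items.tolist()))

# One shuffle result; it is always a member of the set above.
shuffled = tuple(np.random.permutation(items).tolist())
print(len(all_orderings), shuffled in all_orderings)  # prints: 6 True
```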

The slightly trickier part is shuffling the feature vectors and the class labels in sync.

If the dataset is a single array that holds both the feature vectors and the label values, call np.random.shuffle() on it directly:

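>>> import numpy as np
>>> np.random.shuffle(training_data)   # shuffles the rows in place
>>> X = training_data[:, :-1]          # every column except the last: features
>>> y = training_data[:, -1]           # last column: labels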
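As a quick check that a combined array keeps each label attached to its own row, the sketch below builds a toy dataset. The array contents and the fixed seed are assumptions made for this demo: every label equals the sum of its two features, so any row that got split up would be easy to spot.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable (demo assumption)

# Toy dataset: 5 samples, 2 features, last column is the label.
# Each label equals feature0 + feature1 for that row.
features = np.arange(10).reshape(5, 2)
labels = features.sum(axis=1, keepdims=True)
training_data = np.hstack((features, labels))

np.random.shuffle(training_data)  # shuffles rows in place, returns None

X = training_data[:, :-1]
y = training_data[:, -1]

# Every label still matches its own feature row.
print(np.array_equal(X.sum(axis=1), y))  # prints: True
```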

If the feature vectors and the label column have already been separated, there are two ways to shuffle them in sync:

1. Using np.random.shuffle()

Shuffle X and y together

>>> training_data = np.hstack((X, y.reshape(-1, 1)))   # hstack takes a tuple; y becomes a column
>>> np.random.shuffle(training_data)
>>> X = training_data[:, :-1]
>>> y = training_data[:, -1]
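One side effect of stacking is worth noting: the combined array has a single dtype, so integer labels stacked next to float features come back as floats. The sketch below uses made-up example values and shows one way to cast the labels back afterwards.

```python
import numpy as np

X = np.array([[0.5, 1.5], [2.5, 3.5], [4.5, 5.5]])  # float features
y = np.array([0, 1, 2])                             # integer labels

# y is 1-D, so it must become a column before horizontal stacking.
training_data = np.hstack((X, y.reshape(-1, 1)))
np.random.shuffle(training_data)

X_shuffled = training_data[:, :-1]
y_shuffled = training_data[:, -1]

print(y.dtype, y_shuffled.dtype)  # the labels were upcast to float

# Cast back if integer labels are needed downstream.
y_shuffled = y_shuffled.astype(y.dtype)
```

If the upcast is unwanted, the index-based approaches avoid it because X and y are never merged.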

Shuffle via an index array

>>> indices = np.arange(X.shape[0])   # one index per sample
>>> np.random.shuffle(indices)        # shuffle the indices in place
>>> X = X[indices]
>>> y = y[indices]
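The index-based variant can be wrapped in a small function. The sketch below is illustrative only: `shuffle_by_indices` is a hypothetical helper name, and it uses NumPy's newer `np.random.default_rng` generator (instead of the global `np.random.shuffle`) so that a seed can be passed in for reproducibility.

```python
import numpy as np

def shuffle_by_indices(X, y, seed=None):
    """Return row-shuffled copies of X and y that stay aligned.

    Hypothetical helper for illustration; not part of NumPy.
    """
    rng = np.random.default_rng(seed)
    indices = np.arange(X.shape[0])
    rng.shuffle(indices)           # in-place shuffle of the index array
    return X[indices], y[indices]  # fancy indexing returns new arrays

X = np.arange(12).reshape(6, 2)
y = np.array(["a", "b", "c", "d", "e", "f"])  # non-numeric labels are fine

X_s, y_s = shuffle_by_indices(X, y, seed=42)
```

Because X and y are never stacked, their dtypes are preserved and the originals are left unchanged.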

2. Using np.random.permutation() to draw a random permutation of the row indices

>>> r = np.random.permutation(len(y))   # pick one permutation of the row indices at random
>>> X = X[r, :]
>>> y = y[r]
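Unlike np.random.shuffle(), np.random.permutation() returns a new array, so the permutation `r` can be kept and reused. The sketch below uses made-up data: it applies `r` to both arrays and then uses `np.argsort(r)` as the inverse permutation to restore the original order.

```python
import numpy as np

X = np.array([[1, 10], [2, 20], [3, 30], [4, 40]])
y = np.array([1, 2, 3, 4])

r = np.random.permutation(len(y))  # new array holding 0..len(y)-1 in random order

X_shuffled = X[r, :]
y_shuffled = y[r]

# argsort(r) is the inverse permutation, so indexing the shuffled
# arrays with it restores the original row order.
inverse = np.argsort(r)
X_restored = X_shuffled[inverse]
y_restored = y_shuffled[inverse]
```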

3. Shuffling columns

Everything above shuffles the rows of the matrix (each sample in the sample matrix). How, then, do we shuffle its columns (each attribute of the sample matrix)?

Again there are two approaches.

Using np.random.shuffle()

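Because np.random.shuffle() modifies the array in place (it returns None) and accepts no additional arguments, it always shuffles along the first axis. To shuffle columns, we can shuffle the transpose of the matrix and then transpose back: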

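>>> # Shuffle only the feature matrix: stacking y in first would mix the
>>> # label column in with the features, so it might not end up last.
>>> X_t = X.T                 # a view whose rows are the columns of X
>>> np.random.shuffle(X_t)    # in place: reorders the columns of X
>>> X = X_t.T                 # transpose back; y is left untouched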
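A small sketch of the transpose trick applied to the feature matrix alone, with made-up data. Each column of X holds a constant equal to its original column index, so the new column order can be read directly from any row. It relies on `X.T` being a view of X, so the in-place shuffle reorders X's columns.

```python
import numpy as np

# Column j holds the constant value j, so the column order is easy to read off.
X = np.tile(np.arange(4), (3, 1))   # shape (3, 4): every row is [0, 1, 2, 3]
y = np.array([0, 1, 0])

X_t = X.T                # a view: rows of X_t are the columns of X
np.random.shuffle(X_t)   # in place; reorders the columns of X through the view

print(X[0])              # the new column order
```

The labels in y are not involved, which is usually what is wanted when only the attribute order should change.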

Using np.random.permutation() to draw a random permutation of the column indices

# y does not take part in this shuffle
>>> r = np.random.permutation(X.shape[1])
>>> X = X[:, r]
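When columns are permuted, anything that describes the columns should be permuted the same way. In the sketch below, the `feature_names` array and its values are hypothetical, added only to show one permutation `r` being reused to keep the names aligned with the data.

```python
import numpy as np

X = np.array([[1.0, 20.0, 300.0],
              [2.0, 30.0, 400.0]])
feature_names = np.array(["age", "height", "income"])  # hypothetical names

r = np.random.permutation(X.shape[1])  # random order of the column indices

X_shuffled = X[:, r]               # new array; X itself is unchanged
names_shuffled = feature_names[r]  # same permutation keeps names aligned
```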

