【Model Evaluation and Selection】sklearn.model_selection.KFold

Datawhale, published 2018-07-03

1. Description
KFold divides all the samples into k groups of samples, called folds, of equal sizes if possible (if k = n, this is equivalent to the Leave One Out strategy). The prediction function is learned using k - 1 folds, and the fold left out is used for testing.
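To make the "learn on k - 1 folds, test on the one left out" idea concrete, here is a minimal plain-Python sketch of unshuffled k-fold splitting. `kfold_indices` is a hypothetical helper written for illustration, not scikit-learn's implementation; it only assumes the fold-size rule stated later in this article. The last lines check the remark that k = n reduces to Leave One Out.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for unshuffled k-fold splitting.

    Hypothetical helper for illustration only; not scikit-learn's implementation.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    indices = list(range(n_samples))
    # Folds are as equal as possible: the first n_samples % n_splits folds get one extra sample
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        size = base + 1 if i < extra else base
        test = indices[start:start + size]                # the fold left out for testing
        train = indices[:start] + indices[start + size:]  # union of the other k - 1 folds
        yield train, test
        start += size

# k = 3 on 10 samples
for train, test in kfold_indices(10, 3):
    print("train:", train, "test:", test)

# k = n: every test fold holds a single sample, i.e. Leave One Out
loo = list(kfold_indices(5, 5))
print([test for _, test in loo])  # [[0], [1], [2], [3], [4]]
```

Each index appears in exactly one test fold, and every training set is the complement of its test fold.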

2. Syntax
sklearn.model_selection.KFold(n_splits=3, shuffle=False, random_state=None)
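The signature above reflects the scikit-learn version current when this article was written. A small sketch, assuming a recent scikit-learn (0.22 or later, where the default `n_splits` is 5 instead of 3, and where `shuffle` and `random_state` are keyword-only), that inspects the constructor arguments stored on the object:

```python
from sklearn.model_selection import KFold

# Default construction; in scikit-learn >= 0.22 n_splits defaults to 5 (3 in older releases)
kf_default = KFold()
print(kf_default.n_splits, kf_default.shuffle, kf_default.random_state)  # 5 False None

# Explicit construction; pass shuffle and random_state by keyword
kf = KFold(n_splits=3, shuffle=True, random_state=42)
print(kf)
```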

3. Parameters:
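1. n_splits: int, default=3 (the default became 5 in scikit-learn 0.22)
Number of folds. Must be at least 2.
The dataset is divided into n_splits mutually exclusive subsets; in each round, the union of n_splits - 1 subsets is used as the training set and the remaining subset as the test set.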
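2. shuffle: boolean, optional, default=False
Whether to shuffle the data before splitting into batches.
shuffle=False: no shuffling; the folds are consecutive blocks in the original sample order, so every run gives the same result and random_state has no effect.
shuffle=True (random_state=None): the data is shuffled, and the result generally differs from run to run.
shuffle=True with an integer random_state: the data is shuffled, but every run gives the same result.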
3. random_state: int, RandomState instance or None, optional, default=None
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random. Used when shuffle == True.

4. Methods
1. get_n_splits([X, y, groups])
Returns the number of splitting iterations in the cross-validator
2. split(X[, y, groups])
Generate indices to split data into training and test set.
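A short sketch of the two methods, assuming a recent scikit-learn: `get_n_splits` simply reports the number of folds, while `split` is a generator, so the folds are produced lazily and can only be iterated once unless materialized with `list`. The array `X` is a placeholder; KFold only looks at its length.

```python
import inspect
import numpy as np
from sklearn.model_selection import KFold

X = np.zeros((7, 3))  # 7 samples; the feature values do not affect KFold
kf = KFold(n_splits=3)

n = kf.get_n_splits(X)  # for KFold, X, y and groups are optional and ignored
gen = kf.split(X)       # a generator yielding (train_index, test_index) pairs
print(inspect.isgenerator(gen))  # True

splits = list(gen)
print(n, len(splits))  # 3 3

# Every sample lands in exactly one test fold, and train/test never overlap
all_test = np.concatenate([test for _, test in splits])
print(sorted(all_test.tolist()) == list(range(7)))  # True
```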

Note: Randomized CV splitters (for KFold, shuffle=True) may return different results for each call of split when random_state is None or a RandomState instance. You can make the results identical by setting random_state to an integer.

5. Example

import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = KFold(n_splits=3)
kf.get_n_splits(X)  # returns 3, the number of folds

print(kf)

for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

# Outside the loop, these variables hold only the last fold's split
print(X_train, y_train)
print(X_test, y_test)

Output:
KFold(n_splits=3, random_state=None, shuffle=False)
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1 3] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
[[1 2]
 [3 4]
 [1 2]] [1 2 3]
[[3 4]] [4]
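Because the example prints after the loop, only the last fold is shown. A minimal variation (same toy data, assuming a recent scikit-learn) keeps every fold in a list so that each train/test pair remains available, for instance to fit and score a model per fold later:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = KFold(n_splits=3)

# Keep every fold instead of only the last one
folds = []
for train_index, test_index in kf.split(X):
    folds.append((X[train_index], y[train_index], X[test_index], y[test_index]))

for i, (X_tr, y_tr, X_te, y_te) in enumerate(folds):
    print(f"fold {i}: y_train={y_tr.tolist()} y_test={y_te.tolist()}")
```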

Note: The first n_samples % n_splits folds have size n_samples // n_splits + 1, and the other folds have size n_samples // n_splits, where n_samples is the number of samples.
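A worked instance of the fold-size rule: in the example above, n_samples = 4 and n_splits = 3, so 4 % 3 = 1 fold has size 4 // 3 + 1 = 2 and the remaining two folds have size 4 // 3 = 1, giving [2, 1, 1]. The sketch below compares the rule with the test-fold sizes KFold actually produces (assuming a recent scikit-learn; `expected_fold_sizes` and `actual_fold_sizes` are hypothetical helper names).

```python
import numpy as np
from sklearn.model_selection import KFold

def expected_fold_sizes(n_samples, n_splits):
    """Fold sizes from the rule: the first n_samples % n_splits folds get one extra sample."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]

def actual_fold_sizes(n_samples, n_splits):
    """Test-fold sizes actually produced by an unshuffled KFold."""
    X = np.zeros((n_samples, 1))
    return [len(test) for _, test in KFold(n_splits=n_splits).split(X)]

for n_samples, n_splits in [(4, 3), (10, 3), (10, 4)]:
    print(n_samples, n_splits,
          expected_fold_sizes(n_samples, n_splits),
          actual_fold_sizes(n_samples, n_splits))
```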

Extensions
StratifiedKFold
Takes class information into account to avoid building folds with imbalanced class distributions (for binary or multiclass classification tasks).
GroupKFold
K-fold iterator variant with non-overlapping groups.
RepeatedKFold
Repeats K-Fold n times.
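A compact sketch of the three related splitters, assuming a recent scikit-learn; the labels `y` and group ids `groups` are made-up toy data. StratifiedKFold keeps the class ratio in each test fold, GroupKFold never puts the same group on both sides of a split, and RepeatedKFold yields n_splits * n_repeats splits.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, RepeatedKFold, StratifiedKFold

X = np.zeros((9, 1))                            # toy features; only labels/groups matter here
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])       # imbalanced labels: 6 vs 3
groups = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4])  # hypothetical group ids

# StratifiedKFold: each test fold keeps the 2:1 class ratio
for _, test in StratifiedKFold(n_splits=3).split(X, y):
    print("stratified test labels:", y[test].tolist())

# GroupKFold: a group never appears in both the training and the test set of a split
for train, test in GroupKFold(n_splits=3).split(X, y, groups):
    print("test groups:", sorted(set(groups[test].tolist())))

# RepeatedKFold: n_splits * n_repeats splits in total
rkf = RepeatedKFold(n_splits=3, n_repeats=2, random_state=0)
print(rkf.get_n_splits(), len(list(rkf.split(X))))  # 6 6
```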
