hatunina’s blog

Notes and a diary

Cross-validation with KFold

Just a quick note to self.

Sample

First, define a couple of ndarrays.

import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 3, 4])

print(X)
print('=========')
print(y)

# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]
# =========
# [1 2 3 4]


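n_splits sets the number of folds. The default is apparently 3 (that is for scikit-learn 0.19.1, the version referenced here; it was changed to 5 in version 0.22).
You can also specify shuffle and random_state (random_state only has an effect when shuffle=True).
Passing the dataset to the split method yields pairs of index ndarrays (train indices and test indices), which you then use to slice the dataset into training and test parts.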
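To look at the index arrays on their own before using them for slicing, here is a minimal sketch on the same 4-row X. The variable name `folds` is just something I made up for illustration, and the arrays are converted to plain lists only to make the printout easier to read.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

kf = KFold(n_splits=3)

# split() is a generator; each item is a (train_index, test_index) pair of ndarrays.
# `folds` is a hypothetical helper name, not part of scikit-learn.
folds = [(train.tolist(), test.tolist()) for train, test in kf.split(X)]

for train_index, test_index in folds:
    print('train:', train_index, 'test:', test_index)

# train: [2, 3] test: [0, 1]
# train: [0, 1, 3] test: [2]
# train: [0, 1, 2] test: [3]
```

With 4 samples and 3 folds the first fold gets the extra sample, so the test sets have sizes 2, 1 and 1.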

from sklearn.model_selection import KFold

kf = KFold(n_splits=3)

for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    
    print('--- X_train ---')
    print(X_train)
    print('--- X_test ---')
    print(X_test)
    print('--- y_train ---')
    print(y_train)
    print('--- y_test ---')
    print(y_test)
    print('=== next split ===')

# --- X_train ---
# [[5 6]
#  [7 8]]
# --- X_test ---
# [[1 2]
#  [3 4]]
# --- y_train ---
# [3 4]
# --- y_test ---
# [1 2]
# === next split ===
# --- X_train ---
# [[1 2]
#  [3 4]
#  [7 8]]
# --- X_test ---
# [[5 6]]
# --- y_train ---
# [1 2 4]
# --- y_test ---
# [3]
# === next split ===
# --- X_train ---
# [[1 2]
#  [3 4]
#  [5 6]]
# --- X_test ---
# [[7 8]]
# --- y_train ---
# [1 2 3]
# --- y_test ---
# [4]
# === next split ===
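
The run above leaves shuffle at its default (False), so the folds follow the original row order. As a rough sketch of what shuffle and random_state do, the snippet below builds a shuffled 3-fold split twice with the same seed and checks that the folds match. The helper `shuffled_test_folds` and the seed value 0 are my own choices for illustration, not anything from scikit-learn. I am not listing the actual fold contents because they depend on the random number generator.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

def shuffled_test_folds(seed):
    """Return the test indices of each fold for a shuffled 3-fold split.

    Hypothetical helper for this note only.
    """
    kf = KFold(n_splits=3, shuffle=True, random_state=seed)
    return [sorted(test.tolist()) for _, test in kf.split(X)]

first = shuffled_test_folds(0)
second = shuffled_test_folds(0)

# Same random_state gives the same folds on every run.
print(first == second)
# True

# Shuffling changes which rows land in which fold,
# but every row still appears in exactly one test set.
print(sorted(i for fold in first for i in fold))
# [0, 1, 2, 3]
```

Fixing random_state is handy when you want the same split every time you rerun an experiment.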


References

sklearn.model_selection.KFold — scikit-learn 0.19.1 documentation