train_test_split

Purpose

Split arrays or matrices into random train and test subsets.

Usage

from sklearn.model_selection import train_test_split

def train_test_split(
    *arrays,
    test_size=None,
    train_size=None,
    random_state=None,
    shuffle=True,
    stratify=None,
)

Parameters

*arrays (variable number)

Indexable sequences with the same length / shape[0].
Allowed inputs are lists, numpy arrays, scipy-sparse matrices, or pandas dataframes.

test_size

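If float, it should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int, it represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it is set to 0.25.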
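The three ways of setting test_size can be compared with a minimal sketch. The 12-sample array and the variable names are illustrative assumptions, not part of the original notes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(24).reshape((12, 2))  # 12 samples, 2 features (illustrative data)

# Float: fraction of the samples placed in the test split.
X_train_f, X_test_f = train_test_split(X, test_size=0.5, random_state=0)

# Int: absolute number of test samples.
X_train_i, X_test_i = train_test_split(X, test_size=4, random_state=0)

# Both test_size and train_size left as None: test_size falls back to 0.25.
X_train_d, X_test_d = train_test_split(X, random_state=0)

print(len(X_test_f), len(X_test_i), len(X_test_d))  # 6 4 3
```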

train_size

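Analogous to test_size. If float, it should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split. If int, it represents the absolute number of train samples. If None, the value is set to the complement of the test size.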

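random_state

Controls the shuffling applied to the data before the split. Pass an int for reproducible output across multiple function calls.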
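A small sketch of how random_state affects reproducibility, assuming an illustrative 10-sample array: two calls with the same integer seed should return identical splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape((10, 2))  # illustrative data

# Same integer seed in both calls, so the shuffling is identical.
a_train, a_test = train_test_split(X, test_size=3, random_state=42)
b_train, b_test = train_test_split(X, test_size=3, random_state=42)

same_split = np.array_equal(a_train, b_train) and np.array_equal(a_test, b_test)
print(same_split)  # True
```

With random_state=None (the default), repeated calls will generally produce different splits.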

shuffle

Whether or not to shuffle the data before splitting.
If shuffle is False, then stratify must be None.
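The shuffle constraint can be checked with a short sketch. The list y is illustrative; the example relies on train_test_split raising a ValueError when shuffle=False is combined with stratify.

```python
from sklearn.model_selection import train_test_split

y = list(range(10))  # illustrative data

# Without shuffling, the leading samples go to train and the trailing ones to test.
y_train, y_test = train_test_split(y, test_size=2, shuffle=False)
print(y_train, y_test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]

# Combining shuffle=False with stratify is rejected.
try:
    train_test_split(y, test_size=2, shuffle=False, stratify=y)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```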

stratify

If not None, the data is split in a stratified fashion, using this array as the class labels.
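As a sketch of stratified splitting, assume an imbalanced label array with a 75% / 25% class ratio (the data below is made up for illustration). Passing it as stratify should keep that ratio in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape((20, 2))   # illustrative features
y = np.array([0] * 15 + [1] * 5)     # illustrative labels: 15 of class 0, 5 of class 1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Per-class sample counts in each subset.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print(train_counts, test_counts)  # [12  4] [3 1]
```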

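Returns

A list of length 2 * len(arrays) containing the train-test split of each input, in the order train, test for each array.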
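The shape of the return value can be illustrated with three inputs. The array w is a hypothetical per-sample weight vector added only for this sketch.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(16).reshape((8, 2))
y = np.arange(8)                 # labels double as row indices here
w = np.linspace(0.0, 1.0, 8)     # hypothetical per-sample weights

parts = train_test_split(X, y, w, test_size=2, random_state=0)
print(len(parts))  # 6: one (train, test) pair per input array

X_train, X_test, y_train, y_test, w_train, w_test = parts

# All inputs are split with the same indices, so rows stay aligned.
aligned = np.array_equal(X[y_train], X_train) and np.array_equal(w[y_test], w_test)
print(aligned)  # True
```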

Added in version 0.16: if the input is sparse, the output will be a
scipy.sparse.csr_matrix. Otherwise, the output type is the same as the
input type.
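The sparse behaviour can be sketched as follows, assuming scipy is installed; the COO identity matrix is an illustrative input. The check uses the format attribute rather than an exact class comparison.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split

# Illustrative sparse input stored in COO format.
X_coo = sparse.coo_matrix(np.eye(6))

X_train, X_test = train_test_split(X_coo, test_size=2, random_state=0)

# Both outputs come back in CSR format, whatever the input format was.
print(X_train.format, X_test.format)  # csr csr
```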

Examples

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]

>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
...
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
       [8, 9]])
>>> y_test
[1, 4]

>>> train_test_split(y, shuffle=False)
[[0, 1, 2], [3, 4]]
