train_test_split()


train_test_split() - (2)

carpe08 2021. 11. 21. 00:03


Splitting train / validation (test) data with train_test_split()

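With the train_test_split function, the split takes a single clean line.

The function lives in the sklearn.model_selection module.

In this usage, train_test_split requires two inputs and returns four values.

The inputs are the X and y of the original data. The four return values are, in order, the X for training, the X for validation, the y for training, and the y for validation.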
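To make the input and output order concrete, here is a minimal sketch on toy data. The arrays `X` and `y` are hypothetical stand-ins, not the competition data used in this post, and `random_state=0` is only there to make the run repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 8 samples with 2 features each, one label per sample.
X = np.arange(16).reshape(8, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# Two arrays in, four arrays out: X_train, X_valid, y_train, y_valid.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

print(X_train.shape, X_valid.shape)  # (6, 2) (2, 2)
print(y_train.shape, y_valid.shape)  # (6,) (2,)
```

Rows are shuffled before splitting, but each row of X stays paired with its own label in y.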

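# Load the library
from sklearn.model_selection import train_test_split

# train_x (the feature matrix) and train (a DataFrame with a 'category' column)
# are assumed to be defined in earlier steps.
# With no test_size given, 25% of the rows are held out for validation.
x_train, x_valid, y_train, y_valid = train_test_split(train_x, train['category'])

print('x_train data size', x_train.shape)
print('x_valid data size', x_valid.shape)
print('y_train data size', y_train.shape)
print('y_valid data size', y_valid.shape)

output:

x_train data size (30000, 697226)
x_valid data size (10000, 697226)
y_train data size (30000,)
y_valid data size (10000,)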
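The 30,000 / 10,000 split in the output above matches the default behavior of train_test_split, which holds out 25% of the rows when no size is given. As a rough sketch of how the split could be controlled explicitly, the example below sets `test_size`, `random_state`, and `stratify` on hypothetical stand-in data. The original post does not use these arguments, so treat them as optional additions, not as the author's method.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 40 samples, 3 features, 4 balanced classes.
X = np.random.RandomState(0).rand(40, 3)
y = np.repeat([0, 1, 2, 3], 10)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y,
    test_size=0.25,    # same proportion as the default
    random_state=42,   # fixed seed so the split is reproducible
    stratify=y,        # keep class proportions similar in both parts
)

print(X_train.shape, X_valid.shape)  # (30, 3) (10, 3)
print(np.bincount(y_train), np.bincount(y_valid))
```

Stratifying on the label is often worth considering for classification targets such as a category column, since a purely random split can leave rare classes under-represented in the validation set.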

