train_test_split() - (3)

carpe08 2021. 12. 6. 00:44

This post looks at the test_size and shuffle parameters of the train_test_split() function.

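test_size: the proportion of the data used for the test (validation) set. It is the complement of the train_size option, and in practice test_size is usually the one that gets specified. Setting test_size = 0.2 means that 20% of the whole data set becomes the test (validation) set. The default is 0.25, which applies when neither test_size nor train_size is given.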
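To make the relationship between test_size and train_size concrete, here is a minimal sketch on toy data. The arrays X and y are made up for illustration and are not the data set used in this post; random_state is set only so the run is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# test_size=0.2 holds out 20% of the rows for validation.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
sizes_test = (len(X_tr), len(X_val))

# train_size=0.8 describes the same split from the other side.
X_tr2, X_val2, _, _ = train_test_split(X, y, train_size=0.8, random_state=0)
sizes_train = (len(X_tr2), len(X_val2))

# With neither option given, test_size falls back to 0.25.
X_tr3, X_val3, _, _ = train_test_split(X, y, random_state=0)
sizes_default = (len(X_tr3), len(X_val3))

print(sizes_test, sizes_train, sizes_default)
# (80, 20) (80, 20) (75, 25)
```

Specifying only one of the two options is usually enough, since the other side is filled in as the remainder.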

shuffle: whether to shuffle the data before splitting. The default is True.
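The effect of shuffle is easiest to see on a tiny ordered array. The sketch below uses made-up data (X and y are hypothetical). It also passes random_state, a train_test_split parameter not covered in this post, purely to make the shuffled result repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 ordered samples, labels 0..9.
X = np.arange(10).reshape(10, 1)
y = np.arange(10)

# shuffle=False: the first 80% of the rows become the training set and
# the last 20% the validation set, in their original order.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, shuffle=False)
ordered_train = y_tr.tolist()
ordered_valid = y_val.tolist()
print(ordered_train, ordered_valid)
# [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]

# shuffle=True (the default): rows are permuted before splitting.
# Fixing random_state makes two calls return the same split.
_, _, _, y_val_a = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=42)
_, _, _, y_val_b = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=42)
print(y_val_a.tolist() == y_val_b.tolist())
# True
```

shuffle=False is mainly useful when the row order carries meaning, for example time-ordered data. Otherwise the default shuffling keeps the validation set from being just the tail of the file.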

# Load the library
from sklearn.model_selection import train_test_split

# Split the data into train/validation sets with train_test_split()
# test_size = 0.2 holds out 20% of the data set for validation
# (train_x and train are assumed to have been prepared in the earlier posts of this series)
x_train, x_valid, y_train, y_valid = train_test_split(train_x, train['category'], test_size=0.2)

print('x_train size', x_train.shape)
print('x_valid size', x_valid.shape)
print('y_train size', y_train.shape)
print('y_valid size', y_valid.shape)

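output:

x_train size (28000, 697226)
x_valid size (12000, 697226)
y_train size (28000,)
y_valid size (12000,)

Note: 28,000 and 12,000 rows correspond to a 70/30 split, so this output appears to come from a run with test_size = 0.3. With test_size = 0.2, the same 40,000 rows would split into 32,000 and 8,000.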
