
Multi-class classification problem (Big Data Analysis Engineer practical exam, Type 2)

Trevor90 2023. 11. 28. 22:38

Problem: predict the 'Segmentation' class

1. Load the data

import pandas as pd
train = pd.read_csv("../input/train.csv")
test = pd.read_csv("../input/test.csv")
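Before preprocessing, it can help to check how the target classes are distributed and whether any columns have missing values. A minimal sketch, using a small hypothetical DataFrame in place of `train.csv` (only `ID` and `Segmentation` come from the post; the other columns and values are made up):

```python
import pandas as pd

# Hypothetical stand-in for train.csv; only 'ID' and 'Segmentation' come from the post
train = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Age": [22, 38, 67, 40, None, 56],
    "Segmentation": ["A", "B", "B", "C", "D", "A"],
})

# Number of rows per target class; a skewed distribution means accuracy alone can mislead
class_counts = train["Segmentation"].value_counts()
print(class_counts)

# Number of missing values per column
missing = train.isnull().sum()
print(missing)
```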



2. Drop ID and separate the target and test_id

train = train.drop('ID', axis=1)
target = train.pop('Segmentation')
test_id = test.pop('ID')
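`DataFrame.pop` removes a column from the frame in place and returns it as a Series, which is why `target` and `test_id` can each be split off in a single line. A small illustration on a hypothetical two-row frame whose column names mirror the post:

```python
import pandas as pd

# Hypothetical two-row frame; column names mirror the post
df = pd.DataFrame({"ID": [101, 102], "Age": [30, 45], "Segmentation": ["A", "C"]})

df = df.drop("ID", axis=1)        # returns a new frame without 'ID'
target = df.pop("Segmentation")   # removes the column from df and returns it as a Series

print(list(df.columns))   # ['Age']
print(target.tolist())    # ['A', 'C']
```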

 

3. One-hot encoding

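# Split categorical (object) and numeric columns
c_train = train.select_dtypes(include='object')
c_test = test.select_dtypes(include='object')
n_train = train.select_dtypes(exclude='object')
n_test = test.select_dtypes(exclude='object')

# One-hot encode the categorical columns
c_train = pd.get_dummies(c_train)
c_test = pd.get_dummies(c_test)

train = pd.concat([n_train, c_train], axis=1)
test = pd.concat([n_test, c_test], axis=1)

# Align test columns to train: dummy columns missing from test are filled with 0
# and test-only columns are dropped, so predict() sees the same features as fit()
test = test.reindex(columns=train.columns, fill_value=0)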
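Calling `pd.get_dummies` separately on train and test can produce different column sets when a category appears in only one of them; scikit-learn then typically raises an error at `rf.predict(test)` because the number or names of features no longer match the training data. A minimal sketch of the problem and one common fix, `reindex` against the training columns, using hypothetical `Profession` values:

```python
import pandas as pd

# Hypothetical categorical column: 'Artist' appears only in train, 'Lawyer' only in test
c_train = pd.DataFrame({"Profession": ["Doctor", "Artist", "Doctor"]})
c_test = pd.DataFrame({"Profession": ["Doctor", "Lawyer"]})

d_train = pd.get_dummies(c_train)
d_test = pd.get_dummies(c_test)
print(list(d_train.columns))  # ['Profession_Artist', 'Profession_Doctor']
print(list(d_test.columns))   # ['Profession_Doctor', 'Profession_Lawyer']

# Align test to train: missing dummy columns become 0, test-only columns are dropped
d_test = d_test.reindex(columns=d_train.columns, fill_value=0)
print(list(d_test.columns))   # ['Profession_Artist', 'Profession_Doctor']
```

An alternative is to concatenate train and test before calling `get_dummies` once, then split them again; both approaches give the two frames identical columns.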

 

4. Split into training and validation data

from sklearn.model_selection import train_test_split
X_tr, X_val, y_tr, y_val = train_test_split(train, target, 
                                           test_size=0.15, random_state=788)
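The split above is purely random. For a multi-class target, passing `stratify=target` (an option the original code does not use) keeps each class's share roughly the same in both parts. A small check on a hypothetical imbalanced 4-class target of 40 rows:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical features and an imbalanced 4-class target (40 rows: 20/8/8/4)
X = pd.DataFrame({"f": range(40)})
y = pd.Series(["A"] * 20 + ["B"] * 8 + ["C"] * 8 + ["D"] * 4)

X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Class shares in the full target vs. the validation split
print(y.value_counts(normalize=True).sort_index())
print(y_val.value_counts(normalize=True).sort_index())
```

Because the class counts here divide evenly by 4, the 10-row validation set gets exactly 5/2/2/1 rows per class; with real data the shares match only approximately.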

 

5. Build the model

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=1000, max_depth=10, random_state=887)
rf.fit(X_tr, y_tr)
pred = rf.predict(X_val)              # predicted class labels
pred_proba = rf.predict_proba(X_val)  # class probabilities, columns ordered as rf.classes_
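`predict_proba` returns one column per class, in the order of `rf.classes_` (the sorted labels), and `predict` returns the class whose column has the highest probability. A minimal sketch on hypothetical one-feature data with three string classes showing that correspondence:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical one-feature data with three string classes
X = np.array([[0], [1], [2], [10], [11], [12], [20], [21], [22]])
y = np.array(["C", "C", "C", "A", "A", "A", "B", "B", "B"])

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
proba = rf.predict_proba(X)

print(rf.classes_)   # ['A' 'B' 'C']
print(proba.shape)   # (9, 3)

# predict() returns the class of the highest-probability column
assert (rf.classes_[proba.argmax(axis=1)] == rf.predict(X)).all()
```

This column order matters in the next step, where `roc_auc_score` reads `pred_proba` column by column.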

 

6. Compute validation scores

from sklearn.metrics import accuracy_score, roc_auc_score, f1_score
# accuracy, macro-averaged F1, and one-vs-rest ROC AUC (needs class probabilities)
print(accuracy_score(y_val, pred),
      f1_score(y_val, pred, average='macro'),
      roc_auc_score(y_val, pred_proba, multi_class='ovr'))
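With `average='macro'`, `f1_score` computes F1 for each class separately and takes their unweighted mean, so a small class counts as much as a large one. A minimal sketch verifying this on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical true and predicted labels
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

per_class = f1_score(y_true, y_pred, average=None)  # one F1 per class, labels sorted
macro = f1_score(y_true, y_pred, average="macro")

print(per_class)
print(macro)
assert np.isclose(macro, per_class.mean())
```

Here class A has precision 1/2 and recall 1/2 (F1 = 0.5), class B has precision 2/3 and recall 1 (F1 = 0.8), and class C has precision 1 and recall 1/2 (F1 = 2/3), so the macro F1 is their plain average.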


7. Submit

pred = rf.predict(test)
submit = pd.DataFrame({'ID': test_id, 'Segmentation': pred})
submit.to_csv('submit.csv', index=False)
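Before submitting, it is worth reading the file back to confirm it has one row per test ID and exactly the `ID` and `Segmentation` columns. A sketch using hypothetical IDs and predictions, writing to a temporary directory instead of the real output path:

```python
import os
import tempfile
import pandas as pd

# Hypothetical test IDs and predicted labels
test_id = pd.Series([1001, 1002, 1003], name="ID")
pred = ["B", "A", "D"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "submit.csv")
    pd.DataFrame({"ID": test_id, "Segmentation": pred}).to_csv(path, index=False)
    check = pd.read_csv(path)  # read back while the file still exists

print(list(check.columns))          # ['ID', 'Segmentation']
print(len(check) == len(test_id))   # True
```

Without `index=False`, `to_csv` would also write the DataFrame index as an extra unnamed first column.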

 

 

Source: based on 퇴근후딴짓's Big Data Analysis Engineer past-exam-type problems.
