Multiclass Classification Problem (Big Data Analysis Engineer Practical Exam, Type 2)

IT 2023. 11. 28. 22:38


Problem: predict the 'Segmentation' class

1. Load the data

import pandas as pd
train = pd.read_csv("../input/train.csv")
test = pd.read_csv("../input/test.csv")

2. Drop ID; separate target and test_id

train = train.drop('ID', axis=1)
target = train.pop('Segmentation')
test_id = test.pop('ID')
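
The difference between `drop` and `pop` matters in the step above: `drop` returns a new frame, while `pop` removes the column in place and returns it. A minimal sketch on a made-up frame (the `toy` names and values are illustrative, not from the exam data):

```python
import pandas as pd

# Toy frame (hypothetical) with the same kind of columns as the exam data.
toy = pd.DataFrame({"ID": [1, 2], "Age": [30, 41], "Segmentation": ["A", "B"]})

toy = toy.drop("ID", axis=1)           # drop returns a new frame without 'ID'
toy_target = toy.pop("Segmentation")   # pop mutates toy and returns the column

print(list(toy.columns), toy_target.tolist())
```

After these two calls only the feature columns remain in `toy`, which is why the target can be passed separately to the model later.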

3. One-hot encoding

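# Split columns into categorical (object) and numeric
c_train = train.select_dtypes(include='object')
c_test = test.select_dtypes(include='object')
n_train = train.select_dtypes(exclude='object')
n_test = test.select_dtypes(exclude='object')

# One-hot encode the categorical columns
c_train = pd.get_dummies(c_train)
c_test = pd.get_dummies(c_test)

train = pd.concat([n_train, c_train], axis=1)
test = pd.concat([n_test, c_test], axis=1)

# Align test to the train columns in case a category appears in only one set
test = test.reindex(columns=train.columns, fill_value=0)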
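
Encoding train and test separately can produce different dummy columns when a category appears in only one of them, and the model would then fail or misread the features at prediction time. The following is a minimal sketch on made-up frames (the `Profession` column and its values are illustrative assumptions, not taken from the exam data) showing one common way to realign the columns with `DataFrame.reindex`:

```python
import pandas as pd

# Toy data (hypothetical): 'Doctor' appears only in train, 'Artist' only in test.
toy_train = pd.DataFrame({"Age": [25, 40, 33],
                          "Profession": ["Doctor", "Engineer", "Engineer"]})
toy_test = pd.DataFrame({"Age": [51, 29],
                         "Profession": ["Engineer", "Artist"]})

enc_train = pd.get_dummies(toy_train)
enc_test = pd.get_dummies(toy_test)

# Columns present in one frame but not the other.
print(sorted(set(enc_train.columns) ^ set(enc_test.columns)))

# Reindex test to the train columns: dummies missing from test are filled
# with 0, and dummies never seen in train are dropped.
enc_test = enc_test.reindex(columns=enc_train.columns, fill_value=0)
print(list(enc_test.columns) == list(enc_train.columns))
```

Another option is to concatenate train and test before calling `pd.get_dummies` and split them again afterwards; either way the goal is an identical column layout for both frames.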

4. Split into training and validation sets

from sklearn.model_selection import train_test_split
X_tr, X_val, y_tr, y_val = train_test_split(train, target, 
                                           test_size=0.15, random_state=788)
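
The split above is purely random. For a multiclass target it can help to keep the class mix similar in both splits, which `train_test_split` supports through its `stratify` parameter. This is an optional variation, not something the original post does; a sketch on made-up labels (the `toy_X` and `toy_y` names and the class counts are assumptions):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 4 imbalanced classes, 100 rows.
toy_X = pd.DataFrame({"x": range(100)})
toy_y = pd.Series(["A"] * 40 + ["B"] * 30 + ["C"] * 20 + ["D"] * 10)

X_a, X_b, y_a, y_b = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=0, stratify=toy_y
)

# With stratify, the validation split mirrors the 40/30/20/10 class mix.
val_counts = y_b.value_counts().to_dict()
print(val_counts)
```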

5. Build the model

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=1000, max_depth=10, random_state=887)
rf.fit(X_tr, y_tr)
pred = rf.predict(X_val)              # predicted class labels
pred_proba = rf.predict_proba(X_val)  # class probabilities, needed for ROC AUC

6. Compute evaluation scores

from sklearn.metrics import accuracy_score, roc_auc_score, f1_score
print(accuracy_score(y_val, pred),
      f1_score(y_val,pred,average='macro'),
      roc_auc_score(y_val, pred_proba, multi_class='ovr'))
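
The three metrics take different inputs: accuracy and F1 use the predicted labels, while the multiclass ROC AUC uses the full probability matrix. A minimal sketch on synthetic data (generated with `make_classification` as a stand-in, since the exam dataset is not reproduced here) that shows what `average='macro'` and `multi_class='ovr'` operate on:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic 4-class data (hypothetical stand-in for the exam dataset).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=4, random_state=0)
X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.25,
                                      random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_a, y_a)
proba = clf.predict_proba(X_b)  # shape (n_samples, n_classes), columns follow clf.classes_
labels = clf.predict(X_b)

# Macro F1 is the unweighted mean of the per-class F1 scores.
per_class = f1_score(y_b, labels, average=None)
macro = f1_score(y_b, labels, average="macro")

# One-vs-rest AUC takes one probability column per class, with rows summing to 1.
auc = roc_auc_score(y_b, proba, multi_class="ovr")
print(proba.shape, round(macro, 3), round(auc, 3))
```

Because macro averaging weights every class equally, a poorly predicted minority class lowers the macro F1 more than it lowers accuracy.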

7. Submission

# Predict on the test set and write the submission file
pred = rf.predict(test)
pd.DataFrame({'ID': test_id, 'Segmentation': pred}).to_csv('submit.csv',
                                                           index=False)
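
Before submitting, it can be worth reading the file back to confirm the header and row count. A minimal sketch using made-up IDs and predictions written to a temporary directory (the `toy_ids`, `toy_pred`, and `out_path` names are illustrative assumptions):

```python
import os
import tempfile

import pandas as pd

# Toy stand-ins (hypothetical) for test_id and the model predictions.
toy_ids = pd.Series([101, 102, 103], name="ID")
toy_pred = ["A", "C", "B"]

out_path = os.path.join(tempfile.mkdtemp(), "submit.csv")
pd.DataFrame({"ID": toy_ids, "Segmentation": toy_pred}).to_csv(out_path, index=False)

# Read the file back and check the shape and header.
check = pd.read_csv(out_path)
print(check.shape, list(check.columns))
```

With `index=False` the file contains only the two named columns; without it, pandas would write the row index as an extra unnamed column.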

Source: based on 퇴근후딴짓's past-exam-style problems for the Big Data Analysis Engineer exam.


ABOUT ME

트레버의 정글 생존기
