  • Multiclass Classification Problem (Big Data Analysis Engineer (빅데이터분석기사) Practical Exam, Type 2)
    IT 2023. 11. 28. 22:38

    Problem: predict the 'Segmentation' class

    1. Load the data

    import pandas as pd
    train = pd.read_csv("../input/train.csv")
    test = pd.read_csv("../input/test.csv")



    2. Drop ID; separate out target and test_id

    train = train.drop('ID', axis=1)
    target = train.pop('Segmentation')
    test_id = test.pop('ID')

     

    3. One-hot encoding

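    # Split the columns into categorical (object) and numeric
    c_train = train.select_dtypes(include='object')
    c_test = test.select_dtypes(include='object')
    n_train = train.select_dtypes(exclude='object')
    n_test = test.select_dtypes(exclude='object')

    # One-hot encode the categorical columns
    c_train = pd.get_dummies(c_train)
    c_test = pd.get_dummies(c_test)

    train = pd.concat([n_train, c_train], axis=1)
    test = pd.concat([n_test, c_test], axis=1)

    # Align test to train's columns in case a category appears in only one file
    test = test.reindex(columns=train.columns, fill_value=0)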
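
    Encoding train and test separately can produce different columns when a category shows up in only one of the two files, and the model would then fail at prediction time. The following is a minimal sketch of that situation on made-up toy data; the `Profession` column, its values, and the `toy_*` / `enc_*` names are assumptions for illustration, not taken from the exam dataset.

```python
import pandas as pd

# Hypothetical toy data: 'Doctor' appears only in train, 'Artist' only in test.
toy_train = pd.DataFrame({"Age": [25, 40, 31],
                          "Profession": ["Doctor", "Engineer", "Engineer"]})
toy_test = pd.DataFrame({"Age": [52, 19],
                         "Profession": ["Engineer", "Artist"]})

enc_train = pd.get_dummies(toy_train)
enc_test = pd.get_dummies(toy_test)

# Columns present in one encoded frame but not the other.
mismatch = set(enc_train.columns) ^ set(enc_test.columns)
print(mismatch)

# Reindex test to train's columns: missing dummies are filled with 0,
# and dummies unseen during training are dropped.
enc_test_aligned = enc_test.reindex(columns=enc_train.columns, fill_value=0)
print(list(enc_test_aligned.columns))
```

    An alternative with the same effect is to concatenate train and test, call `pd.get_dummies` once, and split the result back apart.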

     

    4. Split into training and validation sets

    from sklearn.model_selection import train_test_split
    X_tr, X_val, y_tr, y_val = train_test_split(train, target, 
                                               test_size=0.15, random_state=788)
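
    The split above is purely random. For a multiclass target with uneven class sizes, one optional variation is to pass `stratify=` so that the validation set keeps roughly the same class proportions as the full data. This is a sketch on synthetic data, not part of the original solution; the class labels A to D and their counts are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced 4-class target (labels and counts are hypothetical).
X = pd.DataFrame({"feature": range(200)})
y = pd.Series(["A"] * 100 + ["B"] * 60 + ["C"] * 20 + ["D"] * 20,
              name="Segmentation")

# stratify=y keeps each class's share the same in both parts of the split.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.15, random_state=788, stratify=y)

print(y.value_counts(normalize=True).sort_index())
print(y_val.value_counts(normalize=True).sort_index())
```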

     

    5. Build the model

    from sklearn.ensemble import RandomForestClassifier
    rf = RandomForestClassifier(n_estimators=1000, max_depth=10, random_state=887)
    rf.fit(X_tr, y_tr)
    pred = rf.predict(X_val)
    pred_proba = rf.predict_proba(X_val)

     

    6. Compute the evaluation scores

    from sklearn.metrics import accuracy_score, roc_auc_score, f1_score
    print(accuracy_score(y_val, pred),
          f1_score(y_val,pred,average='macro'),
          roc_auc_score(y_val, pred_proba, multi_class='ovr'))
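
    Accuracy and macro F1 are computed from the predicted labels, while multiclass `roc_auc_score` needs the full probability matrix from `predict_proba`, with one column per class. The sketch below reproduces the three scores on synthetic data from `make_classification`; the dataset, the smaller `n_estimators=50`, and the `scores` dict are assumptions made so that the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic 4-class data standing in for the exam dataset (assumption).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=4, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.15, random_state=788, stratify=y)

rf = RandomForestClassifier(n_estimators=50, max_depth=10, random_state=887)
rf.fit(X_tr, y_tr)

pred = rf.predict(X_val)              # hard labels, shape (n_samples,)
pred_proba = rf.predict_proba(X_val)  # probabilities, shape (n_samples, n_classes)

scores = {
    "accuracy": accuracy_score(y_val, pred),
    "f1_macro": f1_score(y_val, pred, average="macro"),
    # One-vs-rest AUC: takes the probability matrix, not the labels.
    "roc_auc_ovr": roc_auc_score(y_val, pred_proba, multi_class="ovr"),
}
print(scores)
```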


    7. Submission

    # Predict on the test set and write the submission file
    pred = rf.predict(test)
    pd.DataFrame({'ID': test_id, 'Segmentation': pred}).to_csv('submit.csv',
                                                               index=False)
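
    Before submitting, it can help to read the file back and confirm its columns and row count. This is a minimal sketch with made-up IDs and predictions; the values, the temporary directory, and the `submission` / `check` names are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for test_id and the model's predictions.
test_id = pd.Series([101, 102, 103], name="ID")
pred = ["A", "C", "B"]

submission = pd.DataFrame({"ID": test_id, "Segmentation": pred})

# Write to a temporary folder so the sketch does not overwrite a real submit.csv.
path = os.path.join(tempfile.mkdtemp(), "submit.csv")
submission.to_csv(path, index=False)

# Read it back: expect exactly the two columns and one row per test ID.
check = pd.read_csv(path)
print(check.shape, list(check.columns))
```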

     

     

    Source: based on the Big Data Analysis Engineer (빅데이터분석기사) past-exam-style problems by 퇴근후딴짓.
