๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

Machine Learning

[Machine Learning] XGBoost (Extreme Gradient Boosting)

๋ฐ˜์‘ํ˜•

๐Ÿ“Œ Boosting

- ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์•ฝํ•œ Decision Tree๋ฅผ ์กฐํ•ฉํ•ด์„œ ์‚ฌ์šฉํ•˜๋Š” Ensemble ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜

- ์•ฝํ•œ ์—์ธก ๋ชจํ˜•๋“ค์˜ ํ•™์Šต ์—๋Ÿฌ์— ๊ฐ€์ค‘์น˜๋ฅผ ๋‘๊ณ , ์ˆœ์ฐจ์ ์œผ๋กœ ๋‹ค์Œ ํ•™์Šต ๋ชจ๋ธ์— ๋ฐ˜์˜ํ•˜์—ฌ ๊ฐ•ํ•œ ์˜ˆ์ธก๋ชจํ˜•์„ ๋งŒ๋“œ๋Š” ๊ฒƒ

[Figure: example of a tree ensemble model]

๐Ÿ“ŒGradient Boosting

- ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•(gradient descent)์„ ์‚ฌ์šฉํ•ด ์ž”์—ฌ ์˜ค์ฐจ๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฒƒ

- ์ž˜๋ชป๋œ ์˜ˆ์ธก์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์กฐ์ •ํ•˜์—ฌ ์ƒˆ๋กœ์šด ํŠธ๋ฆฌ๋ฅผ ๋งŒ๋“ฆ (๋žœ๋ค ํฌ๋ ˆ์ŠคํŠธ ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๊ฒฐ์ ์„ ๋ณด์™„ํ•˜๋Š” ๋งค๋ ฅ์ ์ธ ๋Œ€์•ˆ)

- ์ƒˆ๋กœ์šด ํŠธ๋ฆฌ๋Š” ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ์˜ˆ์ธก๋œ ๊ฐ’์—๋Š” ์˜ํ–ฅ์„ ๋ฐ›์ง€ ์•Š๋Š”๋‹ค

- ์˜ค์ฐจ์—๋งŒ ์ดˆ์ ์„ ๋งž์ถ”๋Š” ML ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋งŒ๋“œ๋ ค๋ฉด ์ •ํ™•ํ•œ ์ตœ์ข… ์˜ˆ์ธก์„ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด ์˜ค์ฐจ๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ๋ฐฉ๋ฒ• ํ•„์š”

  ๋”ฐ๋ผ์„œ, ๋ชจ๋ธ์˜ ์˜ˆ์ธก๊ณผ ์‹ค์ œ ๊ฐ’ ์‚ฌ์ด์˜ ์ฐจ์ด์ธ ์ž”์ฐจ(residual)๋ฅผ ํ™œ์šฉ

- ๊ฐ ํŠธ๋ฆฌ ์˜ˆ์ธก ๊ฐ’์„ ๋”ํ•ด ๋ชจ๋ธ ํ‰๊ฐ€์— ์‚ฌ์šฉํ•œ๋‹ค.

 

[Figure: Gradient Tree Boosting]

โœ‹ XGBoost (Extreme Gradient Boosting)

- Extreme: pushing computation to the extreme to achieve both 'accuracy' and 'speed'

- XGBoost is a major upgrade of Gradient Boosting

  Therefore, to understand XGBoost's strengths, you need to know how Gradient Boosting works.

- It turns a weak learner into a strong learner by adding trees trained on the residuals

 

๐Ÿ’ก GBM ๋Œ€๋น„ ๋น ๋ฅธ ์ˆ˜ํ–‰์‹œ๊ฐ„  -  ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ๋กœ ํ•™์Šต, ๋ถ„๋ฅ˜ ์†๋„๊ฐ€ ๋น ๋ฅด๋‹ค

๐Ÿ’ก ๊ณผ์ ํ•ฉ ๊ทœ์ œ (Regularization)

  - ํ‘œ์ค€ GBM์˜ ๊ฒฝ์šฐ ๊ณผ์ ํ•ฉ ๊ทœ์ œ๊ธฐ๋Šฅ์ด ์—†์œผ๋‚˜, XGBoost๋Š” ์ž์ฒด์— ๊ณผ์ ํ•ฉ ๊ทœ์ œ ๊ธฐ๋Šฅ์œผ๋กœ ๊ฐ•ํ•œ ๋‚ด๊ตฌ์„ฑ ์ง€๋‹Œ๋‹ค

๐Ÿ’ก ๋ถ„๋ฅ˜์™€ ํšŒ๊ท€์˜์—ญ์—์„œ ๋›ฐ์–ด๋‚œ ์˜ˆ์ธก ์„ฑ๋Šฅ ๋ฐœํœ˜

  - CART (Classification And Regression Tree) ์•™์ƒ๋ธ” ๋ชจ๋ธ ์‚ฌ์šฉ

๐Ÿ’ก ์กฐ๊ธฐ ์ข…๋ฃŒ (Early Stopping) ๊ธฐ๋Šฅ ์žˆ์Œ

๐Ÿ’ก ๋‹ค์–‘ํ•œ ์˜ต์…˜์„ ์ œ๊ณต, Customizing์ด ์šฉ์ด

โœ‹ XGBRegressor ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹ ์˜ˆ์‹œ

 XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                 colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
                 importance_type='gain', interaction_constraints='',
                 learning_rate=0.1, max_delta_step=0, max_depth=5,
                 min_child_weight=1, missing=nan, monotone_constraints='()',
                 n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=0,
                 reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
                 tree_method='exact', validate_parameters=1, verbosity=None)

โœ‹ XGBoost Parameter

๐Ÿ‘€ ์ผ๋ฐ˜ ํŒŒ๋ผ๋ฏธํ„ฐ

  -   Boosting ์ˆ˜ํ–‰ํ•  ๋•Œ ํŠธ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ• ์ง€, ์„ ํ˜• ๋ชจ๋ธ์„ ์‚ฌ์šฉํ• ์ง€ ๋“ฑ์„ ๊ณ ๋ฆ„

 

  ๐Ÿ‘‰ booster [๊ธฐ๋ณธ๊ฐ’ = gbtree]

   -  ์–ด๋–ค ๋ถ€์Šคํ„ฐ ๊ตฌ์กฐ๋ฅผ ์“ธ์ง€ ๊ฒฐ์ •ํ•œ๋‹ค.

   -  ์˜์‚ฌ๊ฒฐ์ •๊ธฐ๋ฐ˜๋ชจํ˜•(gbtree), ์„ ํ˜•๋ชจํ˜•(gblinear), dart

 

  ๐Ÿ‘‰ n_jobs

   -  XGBoost๋ฅผ ์‹คํ–‰ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๋ณ‘๋ ฌ ์Šค๋ ˆ๋“œ ์ˆ˜

 

  ๐Ÿ‘‰ verbosity [๊ธฐ๋ณธ๊ฐ’ = 1]

   -  ์œ ํšจํ•œ ๊ฐ’์€ 0 (๋ฌด์Œ), 1 (๊ฒฝ๊ณ ), 2 (์ •๋ณด), 3 (๋””๋ฒ„๊ทธ)

 

 

๐Ÿ‘€ ๋ถ€์Šคํ„ฐ ํŒŒ๋ผ๋ฏธํ„ฐ

  - ์„ ํƒํ•œ Booster์— ๋”ฐ๋ผ ์ ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ ์ข…๋ฅ˜๊ฐ€ ๋‹ค๋ฅด๋‹ค

 

  ๐Ÿ‘‰ learning_rate [ ๊ธฐ๋ณธ๊ฐ’ : 0.3 ]

   -  learning rate ๊ฐ€ ๋†’์„์ˆ˜๋ก ๊ณผ์ ํ•ฉ ํ•˜๊ธฐ ์‰ฝ๋‹ค

 

  ๐Ÿ‘‰ n_estimators [ ๊ธฐ๋ณธ๊ฐ’ : 100 ]

   -  ์ƒ์„ฑํ•  weak learner ์˜ ์ˆ˜

   -  learning_rate ๊ฐ€ ๋‚ฎ์„ ๋•, n_estimators ๋ฅผ ๋†’์—ฌ์•ผ ๊ณผ์ ํ•ฉ์ด ๋ฐฉ์ง€

 

  ๐Ÿ‘‰ max_depth [ ๊ธฐ๋ณธ๊ฐ’ : 6 ]

   -  ํŠธ๋ฆฌ์˜ maximum depth

   -  ์ ์ •ํ•œ ๊ฐ’์ด ์ œ์‹œ๋˜์–ด์•ผ ํ•จ (๋ณดํ†ต 3 ~ 10 ์‚ฌ์ด์˜ ๊ฐ’์ด ์ ์šฉ)

   -  max_depth ๊ฐ€ ๋†’์„์ˆ˜๋ก ๋ชจ๋ธ์˜ ๋ณต์žก๋„๊ฐ€ ์ปค์ ธ ๊ณผ์ ํ•ฉ ํ•˜๊ธฐ ์‰ฝ๋‹ค

 

  ๐Ÿ‘‰ min_child_weight [ ๊ธฐ๋ณธ๊ฐ’ : 1 ]

   -  ๊ด€์ธก์น˜์— ๋Œ€ํ•œ ๊ฐ€์ค‘์น˜ ํ•ฉ์˜ ์ตœ์†Œ

   -  ๊ฐ’์ด ๋†’์„์ˆ˜๋ก ๊ณผ์ ํ•ฉ์ด ๋ฐฉ์ง€

 

  ๐Ÿ‘‰ gamma [ ๊ธฐ๋ณธ๊ฐ’ : 0 ]

   -  leaf node์˜ ์ถ”๊ฐ€ ๋ถ„ํ• ์„ ๊ฒฐ์ •ํ•  ์ตœ์†Œ์†์‹ค ๊ฐ์†Œ๊ฐ’

   -  ํ•ด๋‹น๊ฐ’๋ณด๋‹ค ์†์‹ค์ด ํฌ๊ฒŒ ๊ฐ์†Œํ•  ๋•Œ ๋ถ„๋ฆฌ

   -  ๊ฐ’์ด ๋†’์„์ˆ˜๋ก ๊ณผ์ ํ•ฉ์ด ๋ฐฉ์ง€

 

  ๐Ÿ‘‰ subsample [ ๊ธฐ๋ณธ๊ฐ’ : 1 ]

   -  weak learner ๊ฐ€ ํ•™์Šต์— ์‚ฌ์šฉํ•˜๋Š” ๋ฐ์ดํ„ฐ ์ƒ˜ํ”Œ๋ง ๋น„์œจ

   -  ๋ณดํ†ต 0.5 ~ 1 ์‚ฌ์šฉ

   -  ๊ฐ’์ด ๋‚ฎ์„์ˆ˜๋ก ๊ณผ์ ํ•ฉ์ด ๋ฐฉ์ง€

 

  ๐Ÿ‘‰ colsample_bytree [ ๊ธฐ๋ณธ๊ฐ’ : 1 ]

   -  ๊ฐ tree ๋ณ„ ์‚ฌ์šฉ๋œ feature percentage

   -  ๋ณดํ†ต 0.5 ~ 1 ์‚ฌ์šฉ

   -  ๊ฐ’์ด ๋‚ฎ์„์ˆ˜๋ก ๊ณผ์ ํ•ฉ์ด ๋ฐฉ์ง€

 

  ๐Ÿ‘‰ lambda [ ๊ธฐ๋ณธ๊ฐ’ : 1, ๋ณ„์นญ : reg_lambda ]

   -  ๊ฐ€์ค‘์น˜์— ๋Œ€ํ•œ L2 Regularization ์ ์šฉ ๊ฐ’

   -  feature ๊ฐœ์ˆ˜๊ฐ€ ๋งŽ์„ ๋•Œ ์ ์šฉ ๊ฒ€ํ† 

   -  ๊ฐ’์ด ํด์ˆ˜๋ก ๊ณผ์ ํ•ฉ ๊ฐ์†Œ

 

  ๐Ÿ‘‰ alpha [ ๊ธฐ๋ณธ๊ฐ’ : 0, ๋ณ„์นญ : reg_alpha ]

   -  ๊ฐ€์ค‘์น˜์— ๋Œ€ํ•œ L1 Regularization ์ ์šฉ ๊ฐ’

   -  feature ๊ฐœ์ˆ˜๊ฐ€ ๋งŽ์„ ๋•Œ ์ ์šฉ ๊ฒ€ํ† 

   -  ๊ฐ’์ด ํด์ˆ˜๋ก ๊ณผ์ ํ•ฉ ๊ฐ์†Œ

 

 

๐Ÿ‘€ ํ•™์Šต ๊ณผ์ • ํŒŒ๋ผ๋ฏธํ„ฐ

  - ํ•™์Šต ์‹œ๋‚˜๋ฆฌ์˜ค๋ฅผ ๊ฒฐ์ •

 

  ๐Ÿ‘‰ objective [ ๊ธฐ๋ณธ๊ฐ’ : reg = squarederror ]

   -  reg : squarederror

      โœ”  ์ œ๊ณฑ ์†์‹ค์ด ์žˆ๋Š” ํšŒ๊ท€

 

   -  binary : logistic (binary-logistic classification)

      โœ”  ์ดํ•ญ ๋ถ„๋ฅ˜ ๋ฌธ์ œ ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ ๋ชจํ˜•์œผ๋กœ ๋ฐ˜ํ™˜๊ฐ’์ด ํด๋ž˜์Šค๊ฐ€ ์•„๋‹ˆ๋ผ ์˜ˆ์ธก ํ™•๋ฅ 

 

   -  multi : softmax

      โœ”  ๋‹คํ•ญ ๋ถ„๋ฅ˜ ๋ฌธ์ œ์˜ ๊ฒฝ์šฐ ์†Œํ”„ํŠธ๋งฅ์Šค(softmax)๋ฅผ ์‚ฌ์šฉํ•ด์„œ ๋ถ„๋ฅ˜

      โœ”  ๋ฐ˜ํ™˜๋˜๋Š” ๊ฐ’์ด ์˜ˆ์ธกํ™•๋ฅ ์ด ์•„๋‹ˆ๋ผ ํด๋ž˜์Šค, num_class๋„ ์ง€์ •ํ•ด์•ผ ํ•จ

 

 ๐Ÿ‘‰ eval_metric

   -  ๋ชจ๋ธ์˜ ํ‰๊ฐ€ ํ•จ์ˆ˜๋ฅผ ์กฐ์ •ํ•˜๋Š” ํ•จ์ˆ˜

   -  ์„ค์ •ํ•œ objective ๋ณ„๋กœ ๊ธฐ๋ณธ ์„ค์ •๊ฐ’์ด ์ง€์ •๋˜์–ด ์žˆ์Œ

   -  ํ•ด๋‹น ๋ฐ์ดํ„ฐ์˜ ํŠน์„ฑ์— ๋งž๊ฒŒ ํ‰๊ฐ€ ํ•จ์ˆ˜๋ฅผ ์กฐ์ •

      โœ”  rmse : root mean square error

      โœ”  mae : mean absolute error

      โœ”  logloss : negative log-likelihood

      โœ”  error : Binary classification error rate (0.5 threshold)

      โœ”  merror : Multiclass classification error rate

      โœ”  mlogloss : Multiclass logloss

      โœ”  auc : Area under the curve

      โœ”  map : mean average precision

 

 ๐Ÿ‘‰ seed [ ๊ธฐ๋ณธ๊ฐ’ : 0 ]

   -  ์žฌํ˜„ ๊ฐ€๋Šฅํ•˜๋„๋ก ๋‚œ์ˆ˜๋ฅผ ๊ณ ์ •

 

๐Ÿ“ข ๋ฏผ๊ฐํ•˜๊ฒŒ ์กฐ์ •ํ•ด์•ผ ํ•˜๋Š” ๊ฒƒ

  โœ”  Booster ๋ชจ์–‘

  โœ”  eval_metric (ํ‰๊ฐ€ํ•จ์ˆ˜)  /  objective (๋ชฉ์ ํ•จ์ˆ˜)

  โœ”  eta

  โœ”  L1 form (L1 regulariztion form ์ด L2 ๋ณด๋‹ค outlier ์— ๋ฏผ๊ฐ)

  โœ”  L2 form

 

๐Ÿ“ข ๊ณผ์ ํ•ฉ ๋ฐฉ์ง€๋ฅผ ์œ„ํ•ด ์กฐ์ •ํ•ด์•ผ ํ•˜๋Š” ๊ฒƒ

  โœ”  learning rate ๋‚ฎ์ถ”๊ธฐ  →  n_estimators ๋Š” ๋†’์—ฌ์•ผ ํ•จ

  โœ”  max_depth ๋‚ฎ์ถ”๊ธฐ

  โœ”  min_child_weight ๋†’์ด๊ธฐ

  โœ”  gamma ๋†’์ด๊ธฐ

  โœ”  subsample, colsample_bytree ๋‚ฎ์ถ”๊ธฐ

 

โœ‹ Sample code 1. XGBClassifier

import xgboost as xgb

# declare the model
model = xgb.XGBClassifier()

# train the model (X_train and y_train are assumed to be prepared beforehand)
model.fit(X_train, y_train)

# predict with the model
y_pred = model.predict(X_test)

โœ‹ Sample code 2. XGBRegressor

import xgboost as xgb

# declare the model
my_model = xgb.XGBRegressor(learning_rate=0.1, max_depth=5, n_estimators=100)

# train the model
my_model.fit(X_train, y_train, verbose=False)

# predict with the model
y_pred = my_model.predict(X_test)

โœ‹ XGBoost ๋ชจํ˜• ์‹œ๊ฐํ™”

 

๐Ÿ‘‰ ์˜์‚ฌ๊ฒฐ์ •๋‚˜๋ฌด ์‹œ๊ฐํ™” library ์„ค์น˜ (graphviz)

  pip install graphviz
  conda install graphviz

 

๐Ÿ‘‰ xgb.plot_importance() ๋ฉ”์„œ๋“œ

  import xgboost as xgb
  import matplotlib.pyplot as plt

  # plot feature importances of the trained model (my_model from Sample code 2)
  xgb.plot_importance(my_model)
  plt.show()

 

๐Ÿ‘‰ xgb.plot_tree() ๋ฉ”์„œ๋“œ

import xgboost as xgb
import matplotlib.pyplot as plt

# num_trees : index of the tree to draw when the model contains several
# rankdir : direction of the tree; the default is top-to-bottom
# rankdir="LR" : draws the tree from left to right
xgb.plot_tree(my_model, num_trees=0, rankdir='LR')

fig = plt.gcf()
fig.set_size_inches(150, 100)  # enlarge the figure for readability

# to save the image:
# fig.savefig('tree.png')

plt.show()

๋ฐ˜์‘ํ˜•

'Machine Learning' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[Machine Learning] Poisson Regression  (0) 2023.04.26
[Machine learning] scikit-learn pipeline  (0) 2023.04.21
[Machine Learning] Data Leakage  (0) 2023.04.21
[Machine Learning] Hyperparameter Tuning  (0) 2023.04.21
[Machine Learning] Feature Engineering  (0) 2023.04.21