🌟 Optimizing AdaBoost with DecisionTreeClassifier using GridSearchCV
When building machine learning models, we often want to tune hyperparameters so the model performs at its best. In this blog, we'll look at how to use GridSearchCV to optimize an AdaBoostClassifier that uses a DecisionTreeClassifier as its base estimator.
📌 The Question
We want to optimize:

- AdaBoostClassifier parameters:
  - n_estimators → number of weak learners (boosting rounds)
  - learning_rate → how much each model contributes
- DecisionTreeClassifier parameters:
  - max_depth, min_samples_split, etc.
The task asks:
👉 Which set of parameters is the MOST comprehensive in testing both AdaBoost and DecisionTree capabilities?
🔎 Step-by-Step Explanation
1. AdaBoostClassifier
AdaBoost builds multiple weak learners (usually shallow decision trees) one by one, each new learner correcting the errors of the previous ones.
Key parameters:
- n_estimators: Number of boosting rounds.
- learning_rate: Controls the contribution of each weak learner.
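As a quick illustration, here is a minimal sketch (the toy dataset is an assumption, generated with make_classification) showing how these two parameters are passed directly to AdaBoostClassifier:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy data, just so the example runs
X, y = make_classification(n_samples=500, random_state=42)

# 100 boosting rounds; each weak learner's contribution is scaled by 0.5
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5)
ada.fit(X, y)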
2. DecisionTreeClassifier (Base Estimator)
The base estimator defines the "weak learner".
Key parameters:
- max_depth: How deep each tree can grow.
- min_samples_split: Minimum samples required to split a node.
- min_samples_leaf: Minimum samples required at a leaf node.
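By default, AdaBoost uses a decision stump (a tree with max_depth=1). The sketch below, with illustrative values, shows how constraining these parameters keeps each tree "weak":

from sklearn.tree import DecisionTreeClassifier

# A deliberately shallow, constrained tree: weak on its own,
# but a useful building block for boosting
weak_tree = DecisionTreeClassifier(
    max_depth=2,
    min_samples_split=10,
    min_samples_leaf=5
)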
3. GridSearchCV with AdaBoost
We use GridSearchCV to search across all combinations of the parameter values.
⚠️ Trick:
When tuning the base estimator inside AdaBoost, we must prefix its parameter names with base_estimator__ (note the double underscore). In scikit-learn ≥ 1.2, the argument was renamed to estimator, so the prefix becomes estimator__.
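The double underscore is scikit-learn's general syntax for nested parameters. If you're unsure of the exact names, get_params() lists every tunable parameter, including the prefixed ones:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

ada = AdaBoostClassifier(base_estimator=DecisionTreeClassifier())
# Prints names like 'base_estimator__max_depth' and
# 'base_estimator__min_samples_split' alongside 'n_estimators', etc.
print(sorted(ada.get_params().keys()))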
Example:
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data and a train/test split so the example runs end to end
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base estimator (the weak learner)
tree = DecisionTreeClassifier()

# AdaBoost classifier
# (scikit-learn >= 1.2: use estimator=tree and the estimator__ prefix below)
ada = AdaBoostClassifier(base_estimator=tree)

# Parameter grid: ensemble parameters plus base-estimator parameters
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 1],
    'base_estimator__max_depth': [1, 2, 3],
    'base_estimator__min_samples_split': [2, 5, 10]
}

# 5-fold cross-validated grid search
grid = GridSearchCV(estimator=ada, param_grid=param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
✅ Correct Answer
The most comprehensive parameter grid is the one that includes both AdaBoost parameters AND DecisionTree parameters:
{
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 1],
    'base_estimator__max_depth': [1, 2, 3],
    'base_estimator__min_samples_split': [2, 5, 10]
}
🎯 Key Takeaways
- GridSearchCV helps tune both the ensemble (AdaBoost) and its base learner (DecisionTree) in a single search.
- Always use the base_estimator__ prefix (estimator__ in scikit-learn ≥ 1.2) when tuning parameters of the base model.
- A good parameter grid should test both boosting strength and tree complexity.
💡 In simple words:
We’re not just tuning AdaBoost itself, but also its inner decision trees. The best way is to test parameters of both together.