🎯 Understanding RandomizedSearchCV in Scikit-Learn
Hyperparameter tuning is crucial in machine learning for improving model performance. In scikit-learn, two main methods are used:
- `GridSearchCV` → tries all combinations (exhaustive search)
- `RandomizedSearchCV` → samples a fixed number of random combinations

Here, we'll analyze a code snippet using `RandomizedSearchCV` with `SGDRegressor`.
📌 The Code
```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform

# Tiny toy dataset: y = 1.5 * (x1 + x2)
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([3, 6, 9, 12, 15])

sgd_regressor = SGDRegressor()

# Lists are sampled uniformly; loguniform distributions are sampled continuously.
param_dist = {
    'loss': ['squared_error', 'huber', 'epsilon_insensitive'],  # note: 'squared_loss' was renamed to 'squared_error' in scikit-learn 1.0
    'alpha': loguniform(1e-4, 1e0),
    'penalty': ['l1', 'l2', 'elasticnet'],
    'epsilon': loguniform(1e-4, 1e-1),
}

random_search = RandomizedSearchCV(
    sgd_regressor,
    param_distributions=param_dist,
    n_iter=10,
    cv=3,
    scoring='neg_mean_squared_error'
)
random_search.fit(X, y)
```
🌲 Breaking It Down
1. Number of parameter settings (n_iter) ✅
- In `RandomizedSearchCV`, `n_iter` specifies how many random combinations of parameters will be tried.
- Here, `n_iter=10`, so 10 random parameter settings will be evaluated.
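We can verify this by inspecting `cv_results_` after fitting: it contains exactly one row per sampled setting. The small dataset and `max_iter` value below are illustrative assumptions, not part of the original snippet:

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import RandomizedSearchCV

# Illustrative toy data: y is a noiseless linear function of X.
X = np.linspace(0.0, 1.0, 20).reshape(10, 2)
y = X @ np.array([1.0, 2.0])

search = RandomizedSearchCV(
    SGDRegressor(max_iter=2000),
    param_distributions={'alpha': loguniform(1e-4, 1e0)},
    n_iter=10,
    cv=3,
    scoring='neg_mean_squared_error',
    random_state=0,
)
search.fit(X, y)

# cv_results_ holds one entry per sampled setting: exactly n_iter rows.
print(len(search.cv_results_['params']))  # 10
```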
2. Actual number of combinations ❌
- Unlike `GridSearchCV`, `RandomizedSearchCV` does not try all possible combinations.
- Even though the parameter space has 3 (`loss`) × 3 (`penalty`) × continuous distributions (`alpha`, `epsilon`) → a huge (effectively infinite) space,
- it only samples 10 random combinations, not 20 and not the full grid.
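The sampling step can be sketched with `ParameterSampler` from `sklearn.model_selection`, which draws parameter settings from the same kind of distribution dictionary: however large (or infinite) the space, only `n_iter` settings come out.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

param_dist = {
    'loss': ['squared_error', 'huber', 'epsilon_insensitive'],
    'penalty': ['l1', 'l2', 'elasticnet'],
    'alpha': loguniform(1e-4, 1e0),    # continuous: infinitely many values
    'epsilon': loguniform(1e-4, 1e-1),
}

# Draw exactly 10 settings from the effectively infinite space.
samples = list(ParameterSampler(param_dist, n_iter=10, random_state=0))
print(len(samples))  # 10
print(sorted(samples[0]))  # ['alpha', 'epsilon', 'loss', 'penalty']
```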
3. Search space for alpha ✅
- `alpha` is sampled from a log-uniform distribution between `1e-4` and `1.0`.
- This ensures values are spread multiplicatively (good for hyperparameters spanning orders of magnitude).
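A quick check of what "spread multiplicatively" means: under `loguniform`, each decade of the range gets roughly the same share of the draws, whereas a plain uniform draw would put almost all mass in the top decade.

```python
from scipy.stats import loguniform

dist = loguniform(1e-4, 1e0)
samples = dist.rvs(size=10_000, random_state=0)

# All draws stay inside [1e-4, 1.0].
print(samples.min() >= 1e-4 and samples.max() <= 1.0)  # True

# About a quarter of the draws fall below 1e-3, i.e. each of the four
# decades (1e-4..1e-3, ..., 1e-1..1) receives roughly an equal share.
print((samples < 1e-3).mean())  # close to 0.25
```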
4. Scoring metric ✅
- The scoring is set to `scoring='neg_mean_squared_error'`.
- This is the negative of Mean Squared Error (since scikit-learn's CV scorers follow the convention higher = better).
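The sign convention is easy to confirm: every fold score from this scorer is ≤ 0, and negating it recovers the ordinary (non-negative) MSE. A minimal sketch with `cross_val_score`; the model and data here are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=30)

scores = cross_val_score(LinearRegression(), X, y, cv=3,
                         scoring='neg_mean_squared_error')

# Each fold's score is -MSE, so every value is <= 0 ...
print((scores <= 0).all())  # True
# ... and negating gives the usual non-negative MSE per fold.
print((-scores >= 0).all())  # True
```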
🚀 Final Correct Statements
✅ The n_iter parameter controls the number of parameter settings to try.
❌ The actual number of combinations tried is 20.
✅ The hyperparameter search space for alpha follows a log-uniform distribution.
✅ The scoring metric is the negative mean squared error.
💡 Takeaway
- Use `GridSearchCV` for small, well-defined grids.
- Use `RandomizedSearchCV` for large search spaces; it samples efficiently.
- Always pick an appropriate scoring metric depending on your regression or classification goals.