📊 Choosing the Right Scoring Metric in Cross-Validation
When training a machine learning model, evaluating it properly is just as important as building it. In Python’s scikit-learn, `cross_val_score()` is the standard helper for cross-validation. But a common question arises: which scoring metric should you use?
🔑 Key Idea: Metric Depends on Problem Type
- Classification Problems → metrics like `accuracy`, `precision`, `recall`, `f1`, `roc_auc`
- Regression Problems → metrics like `r2`, `neg_mean_absolute_error`, `neg_mean_squared_error`, `explained_variance`
If you use a regression metric for a classification model, it will either fail or give meaningless results.
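Not sure which strings are valid? Here is a minimal sketch, assuming scikit-learn 1.0 or newer (older versions expose a `SCORERS` dict instead), that prints every built-in scorer name `cross_val_score()` accepts:

```python
# List every scoring string cross_val_score() accepts
# (get_scorer_names() is available in scikit-learn >= 1.0)
from sklearn.metrics import get_scorer_names

for name in get_scorer_names():
    print(name)
```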
✅ Correct Metric for Classification
In the screenshot, the question asks:

Which of the following scoring metrics could be used for evaluating a classification model in cross-validation using `cross_val_score()`?

Correct Answer → `roc_auc`
Why?
- `roc_auc` (Receiver Operating Characteristic - Area Under Curve) measures how well the model separates the classes.
- Other options like `r2`, `mean_absolute_error`, and `explained_variance` are regression metrics, not suitable for classification.
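To build intuition for what `roc_auc` measures, here is a minimal sketch that calls `roc_auc_score` directly on a toy set of labels and predicted probabilities (the numbers are made up for illustration):

```python
# ROC-AUC scores the ranking of predicted probabilities, not hard labels:
# 1.0 means every positive ranks above every negative, 0.5 is random guessing
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]           # true binary labels (toy data)
y_prob = [0.1, 0.4, 0.35, 0.8]  # predicted probability of class 1

print(roc_auc_score(y_true, y_prob))  # 0.75: 3 of 4 positive/negative pairs ordered correctly
```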
🐍 Python Example
Let’s see it in action with scikit-learn:
```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# Load a classification dataset
X, y = load_breast_cancer(return_X_y=True)

# Define model
model = LogisticRegression(max_iter=500)

# Evaluate with cross-validation using ROC-AUC
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")

print("ROC-AUC Scores:", scores)
print("Average ROC-AUC:", scores.mean())
```
Output (example):

```
ROC-AUC Scores: [0.98 0.99 0.97 0.98 0.99]
Average ROC-AUC: 0.982
```
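If you want several metrics from the same folds, `cross_validate()` (a sibling of `cross_val_score()`) accepts a list of scoring strings. A quick sketch of that variant:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=500)

# cross_validate() returns a dict with one "test_<metric>" entry per scorer
results = cross_validate(model, X, y, cv=5, scoring=["accuracy", "f1", "roc_auc"])
print("Mean accuracy:", results["test_accuracy"].mean())
print("Mean F1:", results["test_f1"].mean())
print("Mean ROC-AUC:", results["test_roc_auc"].mean())
```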
📌 Takeaway
- Always match the scoring metric to your problem type.
- For classification, use `accuracy`, `f1`, `precision`, `recall`, or `roc_auc`.
- For regression, use `r2`, `neg_mean_squared_error`, `neg_mean_absolute_error`, etc. (see the sketch after this list).
- Using the wrong metric (like `r2` for classification) will lead to an incorrect evaluation.
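For completeness, here is a minimal regression counterpart, using the built-in diabetes dataset as an example. Note the `neg_` prefix: it exists because `cross_val_score()` always treats higher scores as better.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
reg = LinearRegression()

# Scores come back negated; flip the sign to read them as ordinary MSE
mse_scores = cross_val_score(reg, X, y, cv=5, scoring="neg_mean_squared_error")
print("MSE per fold:", -mse_scores)
print("Average MSE:", -mse_scores.mean())
```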
🚀 Next time you use `cross_val_score()`, remember: ROC-AUC is a strong default for evaluating binary classification.