VotingClassifier: Soft vs. Hard Voting
The Code:

```python
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

clf1 = LogisticRegression()
clf2 = RandomForestClassifier()
clf3 = SVC(probability=True)  # probability=True is required for soft voting

eclf = VotingClassifier(
    estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)],
    voting='soft'
)
eclf.fit(X_train, y_train)
```
Step 1: Voting Types

- voting='hard' → uses majority-rule voting based on each classifier's predicted class label.
- voting='soft' → uses the predicted class probabilities from each classifier, averages them, and selects the class with the highest average probability.
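The two modes can be compared side by side. Below is a minimal sketch (not from the original post) that fits both a hard-voting and a soft-voting ensemble on synthetic data from `make_classification`; note that hard voting does not require `predict_proba`, so the SVC in the hard ensemble can be left at its default settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Hard voting: each classifier casts one vote (its predicted label),
# so no probability estimates are needed.
hard = VotingClassifier(
    estimators=[('lr', LogisticRegression()),
                ('rf', RandomForestClassifier(random_state=0)),
                ('svc', SVC())],
    voting='hard').fit(X, y)

# Soft voting: averages predict_proba outputs, so SVC needs probability=True.
soft = VotingClassifier(
    estimators=[('lr', LogisticRegression()),
                ('rf', RandomForestClassifier(random_state=0)),
                ('svc', SVC(probability=True, random_state=0))],
    voting='soft').fit(X, y)

print(hard.predict(X[:5]))
print(soft.predict(X[:5]))
```

The two ensembles often agree, but they can differ on borderline samples where a confident minority classifier outweighs an unconfident majority under soft voting.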
Step 2: Effect of Soft Voting

- Each classifier must support predict_proba.
- Logistic Regression ✅ supports probabilities.
- Random Forest ✅ supports probabilities.
- SVC ❌ doesn't by default, but here probability=True makes it compute probabilities (via Platt scaling).
- The ensemble then averages the probabilities.
- The final prediction = the class with the highest averaged probability.
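The SVC point is easy to verify directly. In this sketch (synthetic data, not from the original post), an SVC fitted without `probability=True` exposes no `predict_proba` at all, while one fitted with it returns a probability row per sample:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)

# Without probability=True, the fitted SVC has no predict_proba method.
svc_plain = SVC().fit(X, y)
print(hasattr(svc_plain, 'predict_proba'))  # False

# With probability=True, Platt scaling fits a sigmoid on the decision
# values (via internal cross-validation) to produce probabilities.
svc_prob = SVC(probability=True, random_state=0).fit(X, y)
print(svc_prob.predict_proba(X[:1]).shape)  # (1, 2): one row, one column per class
```

This is why passing a default SVC to a soft-voting ensemble fails at prediction time.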
✅ Correct Effect:
When voting='soft', the VotingClassifier makes predictions based on the average of predicted class probabilities from all base classifiers, not just majority voting.
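This averaging can be reproduced by hand. The sketch below (synthetic data, not from the original post) averages the fitted base estimators' `predict_proba` outputs, takes the argmax, and checks that the result matches the ensemble's own predictions (exact with the default equal weights):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
eclf = VotingClassifier(
    estimators=[('lr', LogisticRegression()),
                ('rf', RandomForestClassifier(random_state=0)),
                ('svc', SVC(probability=True, random_state=0))],
    voting='soft').fit(X, y)

# Soft voting by hand: average each fitted model's class probabilities,
# then pick the class with the highest mean probability.
avg_proba = np.mean([est.predict_proba(X) for est in eclf.estimators_], axis=0)
manual_pred = eclf.classes_[np.argmax(avg_proba, axis=1)]

assert np.array_equal(manual_pred, eclf.predict(X))
```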