VotingClassifier: Soft or Hard Voting?

The Code:

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

clf1 = LogisticRegression()
clf2 = RandomForestClassifier()
clf3 = SVC(probability=True)  # probability=True enables predict_proba, required for soft voting

eclf = VotingClassifier(
    estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)],
    voting='soft'
)

eclf.fit(X_train, y_train)
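The snippet above assumes X_train and y_train already exist. A self-contained sketch (using a synthetic dataset and a train/test split; all variable names here are illustrative, not from the original) might look like:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data, split into train and test sets
X, y = make_classification(n_samples=300, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

eclf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('rf', RandomForestClassifier(random_state=42)),
                ('svc', SVC(probability=True, random_state=42))],
    voting='soft'
)
eclf.fit(X_train, y_train)

# Accuracy on held-out data
acc = eclf.score(X_test, y_test)
print(f"test accuracy: {acc:.3f}")
```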

Step 1: Voting Types

  • voting='hard': each classifier predicts a class label, and the ensemble returns the label chosen by the majority of classifiers.

  • voting='soft': each classifier predicts class probabilities, and the ensemble averages them, returning the class with the highest mean probability.

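As a quick sketch of the two strategies side by side (synthetic data; names are illustrative), the same base estimators can back both a hard-voting and a soft-voting ensemble, and the two can disagree on borderline samples:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# VotingClassifier clones its estimators, so one list can back both ensembles
estimators = [('lr', LogisticRegression(max_iter=1000)),
              ('rf', RandomForestClassifier(random_state=0)),
              ('svc', SVC(probability=True, random_state=0))]

hard = VotingClassifier(estimators=estimators, voting='hard').fit(X, y)
soft = VotingClassifier(estimators=estimators, voting='soft').fit(X, y)

# Count samples where majority vote and probability averaging differ
disagreements = int(np.sum(hard.predict(X) != soft.predict(X)))
print(f"hard vs soft disagreements: {disagreements} of {len(X)}")
```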
Step 2: Effect of soft voting

  • Each classifier must support predict_proba.

    • Logistic Regression ✅ supports probability.

    • Random Forest ✅ supports probability.

    • SVC ❌ by default doesn’t, but here probability=True makes it compute probabilities (via Platt scaling).

  • The ensemble then averages probabilities:

    P_{\text{final}}(\text{class} = k) = \frac{1}{n} \sum_{i=1}^{n} P_i(\text{class} = k)
  • The final prediction = class with highest averaged probability.
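To see the averaging in action, here is a small sketch (synthetic data; variable names are illustrative) showing that averaging the fitted estimators' predict_proba outputs by hand reproduces the ensemble's soft-voting prediction:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

eclf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('rf', RandomForestClassifier(random_state=0)),
                ('svc', SVC(probability=True, random_state=0))],
    voting='soft'
).fit(X, y)

# Average the per-classifier probabilities by hand ...
avg_proba = np.mean([est.predict_proba(X) for est in eclf.estimators_], axis=0)
manual_pred = eclf.classes_[np.argmax(avg_proba, axis=1)]

# ... and confirm it matches the ensemble's own soft-voting output
assert np.array_equal(manual_pred, eclf.predict(X))
```

Note that with the default weights=None, the average is unweighted; passing weights to VotingClassifier turns this into a weighted mean.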


Correct Effect:
When voting='soft', the VotingClassifier makes predictions based on the average of predicted class probabilities from all base classifiers, not just majority voting.


