⚖️ Logistic Regression in Sklearn: Handling Class Imbalance and Regularization

 Introduction

Logistic Regression is one of the most widely used algorithms for binary classification. It is simple, powerful, and interpretable, but two aspects play a huge role in its performance:

  1. Class imbalance – when one class has far more samples than the other.

  2. Regularization – controlling model complexity to avoid overfitting.

In this blog, we’ll look at how class_weight='balanced' and the C parameter work in LogisticRegression from scikit-learn.


🔹 The Example Code

from sklearn.linear_model import LogisticRegression

# X: feature matrix, y: binary labels (assumed to be defined already)
model = LogisticRegression(class_weight='balanced', C=0.5)
model.fit(X, y)

This code trains a Logistic Regression model with:

  • class_weight='balanced'

  • C=0.5

Now let’s break down what this means.
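
If you want to run the snippet end to end, here is a minimal sketch; the imbalanced dataset is generated with make_classification purely for illustration and stands in for your own X and y.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem with roughly 90% negatives and 10% positives
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)

model = LogisticRegression(class_weight='balanced', C=0.5)
model.fit(X, y)
print(model.score(X, y))  # training accuracy, just to confirm the fit works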


🔹 Class Weight and Imbalanced Datasets

When classes are imbalanced (e.g., 90% negatives, 10% positives), the model might be biased towards the majority class.

👉 Setting class_weight='balanced':

  • Automatically adjusts weights inversely proportional to class frequencies.

  • Rare classes get higher weight so the model pays more attention to them.

✅ Correct understanding:

“The balanced mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data.”

❌ Wrong assumption:

"Equal weights are given to all classes." → This is incorrect, because weights are proportional to imbalance.


🔹 Regularization with Parameter C

Logistic Regression in sklearn applies L2 regularization by default; it is only turned off if you explicitly disable the penalty (penalty=None in recent versions, penalty='none' in older ones).

In the given code, C=0.5 is the inverse of the regularization strength.

✅ Correct understanding:

“A finite value of C means the model applies regularization; since 0.5 is smaller than the default of 1.0, the regularization here is stronger than the default.”

❌ Wrong assumption:

“No regularization is applied because C is set.” → Incorrect. C only scales the penalty; it does not switch it off.
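
One way to see this is to fit the same model with several values of C and compare the size of the learned coefficients; the sketch below uses a synthetic dataset purely for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

for C in [0.01, 0.5, 100]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    # Smaller C -> stronger L2 penalty -> coefficients shrink towards zero
    print(f"C={C:>6}: ||coef|| = {np.linalg.norm(model.coef_):.3f}")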


🔹 Final Takeaways

  • class_weight='balanced' handles imbalanced datasets by adjusting weights inversely to class frequencies.

  • The C parameter controls regularization strength (smaller C = stronger regularization).

  • Logistic Regression in sklearn applies regularization by default; it is only removed when the penalty is explicitly disabled.


👉 Pro Tip: Always check class balance in your dataset. If classes are highly skewed, use class_weight='balanced' or manually specify class weights to prevent bias.
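
For example, a quick class-balance check might look like this (the toy labels below are only a stand-in for your own y):

import numpy as np

y = np.array([0] * 90 + [1] * 10)  # stand-in for your labels
classes, counts = np.unique(y, return_counts=True)
# Class proportions; here {0: 0.9, 1: 0.1} -> heavily skewed
print(dict(zip(classes.tolist(), (counts / len(y)).tolist())))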

