⚖️ Logistic Regression in Sklearn: Handling Class Imbalance and Regularization
Introduction
Logistic Regression is one of the most widely used algorithms for binary classification. While it is simple, powerful, and interpretable, two important aspects play a huge role in its performance:
- Class imbalance – when one class has far more samples than the other.
- Regularization – controlling model complexity to avoid overfitting.
In this blog, we’ll understand how class_weight='balanced' and the parameter C work in LogisticRegression from scikit-learn.
🔹 The Example Code
```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# A small synthetic, imbalanced dataset so the snippet runs end to end
X, y = make_classification(n_samples=100, weights=[0.9, 0.1], random_state=0)

model = LogisticRegression(class_weight='balanced', C=0.5)
model.fit(X, y)
```
This code trains a Logistic Regression model with:
- class_weight='balanced'
- C=0.5

Now let’s break down what this means.
🔹 Class Weight and Imbalanced Datasets
When classes are imbalanced (e.g., 90% negatives, 10% positives), the model might be biased towards the majority class.
👉 Setting class_weight='balanced':
- Automatically adjusts weights inversely proportional to class frequencies.
- Rare classes get higher weight, so the model pays more attention to them.
✅ Correct understanding:
“The balanced mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data.”
❌ Wrong assumption:
"Equal weights are given to all classes." → Incorrect: weights are scaled inversely to class frequency, so the minority class receives the larger weight.
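To see exactly what 'balanced' computes, here is a small sketch using scikit-learn's compute_class_weight helper on a hypothetical 90/10 label split (the class counts are made up for illustration):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 90 negatives, 10 positives
y = np.array([0] * 90 + [1] * 10)

# The 'balanced' heuristic: n_samples / (n_classes * count_per_class)
weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y)
print(weights)  # ≈ [0.556, 5.0]: the rare class gets roughly 9x the weight
```

The majority class ends up down-weighted (100 / (2 × 90) ≈ 0.556) and the minority class up-weighted (100 / (2 × 10) = 5.0), which is exactly the inverse-frequency behavior described above.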
🔹 Regularization with Parameter C
Logistic Regression in sklearn always applies regularization unless the penalty is explicitly disabled (penalty=None in recent versions of scikit-learn, penalty='none' in older ones).
- The parameter C is the inverse of regularization strength.
- Lower C → stronger regularization.
- Higher C → weaker regularization.
In the given code:
- C=0.5 means the model applies moderately strong regularization, stronger than the default C=1.0.
✅ Correct understanding:
“Setting C=0.5 means the model applies regularization, with a stronger penalty than the default C=1.0.”
❌ Wrong assumption:
“No regularization is applied because C is set.” → Incorrect: C only scales the penalty; it never turns it off.
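To see the shrinkage effect of C in practice, here is a quick sketch on a synthetic dataset (generated with make_classification; the variable names are chosen for illustration) comparing coefficient magnitudes under strong and weak regularization:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Smaller C = stronger L2 penalty = coefficients shrink toward zero
strong = LogisticRegression(C=0.01).fit(X, y)
weak = LogisticRegression(C=100.0).fit(X, y)

print(np.linalg.norm(strong.coef_), np.linalg.norm(weak.coef_))
# The strongly regularized model has the smaller coefficient norm
```

Both models fit the same data; only the penalty strength differs, and the coefficient norms show it directly.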
🔹 Final Takeaways
- class_weight='balanced' handles imbalanced datasets by adjusting weights inversely to class frequencies.
- The C parameter controls regularization strength (smaller C = stronger regularization).
- Logistic Regression in sklearn applies regularization by default, unless it is explicitly disabled.
👉 Pro Tip: Always check class balance in your dataset. If classes are highly skewed, use class_weight='balanced' or manually specify class weights to prevent bias.
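A quick way to run that check before training is to count the labels; here is a minimal sketch (the 450/50 split is made up for illustration, and the ratio threshold is just a rule of thumb):

```python
import numpy as np

y = np.array([0] * 450 + [1] * 50)  # hypothetical labels

# Count samples per class
classes, counts = np.unique(y, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # {0: 450, 1: 50}

# A rough heuristic: if one class outnumbers another severely, rebalance
imbalance_ratio = counts.max() / counts.min()
if imbalance_ratio > 3:
    print("Skewed classes: consider class_weight='balanced'")
```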