Understanding Overfitting and Underfitting in Machine Learning (With Examples and How to Avoid Them)
When you build a machine learning model, two common problems you might face are overfitting and underfitting. Both can prevent your model from making good predictions on new data. Let’s dive into what these problems are, see simple examples, and learn how to avoid them.
What is Overfitting?
Overfitting happens when a model learns the training data too well, including the noise and random fluctuations. This means it performs extremely well on the training data but poorly on new, unseen data (test data or real-world data).
Example:
Imagine you’re trying to predict the price of houses based on size. If your model tries to memorize every small detail (including weird outliers) in the training data, it might create a very complex curve that fits all points exactly. This curve might look perfect on training data but will fail on new houses because it has learned noise, not just the real pattern.
Visual idea:
- Training data: scattered points
- Overfitted model: a very wiggly curve passing through every point
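To make this concrete, here is a minimal sketch using scikit-learn (assumed installed) on synthetic house data. The sizes, prices, seed, and degree-15 polynomial are all made-up illustrations, not a recipe:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic house data: price rises linearly with size, plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(0.5, 2.5, size=40).reshape(-1, 1)     # size in 1000s of sq ft
y = 200 * X.ravel() + rng.normal(0, 25, size=40)      # price in $1000s

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# A degree-15 polynomial is flexible enough to chase the noise in 20 points.
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit.fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, overfit.predict(X_train)))
print("test  MSE:", mean_squared_error(y_test, overfit.predict(X_test)))
# Expect a tiny train error and a much larger test error: the overfitting signature.
```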
What is Underfitting?
Underfitting happens when a model is too simple to capture the underlying pattern in the data. It performs poorly both on the training data and new data.
Example:
If you use a simple straight line to predict house prices but the actual relationship is more complex (say, quadratic or exponential), your model will miss the real trend. It won’t fit the training data well, and it will also fail to predict new data accurately.
Visual idea:
- Training data: scattered points
- Underfitted model: a flat or almost flat line missing the trend
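A matching sketch for underfitting, again on made-up data: the true size-price relationship below is quadratic, so a plain straight line cannot follow it no matter how long we fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data where the true size-price relationship is quadratic.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.5, size=40).reshape(-1, 1)     # size in 1000s of sq ft
y = 80 * X.ravel() ** 2 + rng.normal(0, 10, size=40)  # price in $1000s

line = LinearRegression().fit(X, y)
print("train MSE:", mean_squared_error(y, line.predict(X)))
# A straight line cannot bend to follow x^2, so even the training
# error stays well above the noise floor: the underfitting signature.
```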
How to Detect Overfitting and Underfitting?
You can check model performance on both training data and validation/test data:
| Scenario | Training Error | Test Error | Interpretation |
|---|---|---|---|
| Overfitting | Low | High | Model too complex |
| Underfitting | High | High | Model too simple |
| Good Fit (Just Right) | Low | Low | Model generalizes well |
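The table maps directly to code: sweep model complexity and compare training error against test error. A hedged sketch with synthetic quadratic data (all values illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(0.5, 2.5, size=80).reshape(-1, 1)
y = 80 * X.ravel() ** 2 + rng.normal(0, 10, size=80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Sweep complexity: degree 1 underfits, degree 2 matches the data,
# degree 15 overfits. Compare the two error columns row by row.
for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr))
    te = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree={degree:2d}  train MSE={tr:9.1f}  test MSE={te:9.1f}")
```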
How to Avoid Overfitting?
- Use More Training Data: More data can help the model learn the true patterns instead of the noise.
- Simplify the Model: Choose simpler algorithms or reduce model complexity (e.g., lower the polynomial degree).
- Regularization: Add a penalty for complexity using techniques like L1 (Lasso) or L2 (Ridge) regularization (see the sketch after this list).
- Cross-Validation: Use k-fold cross-validation to check that your model generalizes well.
- Early Stopping: When training models like neural networks, stop training once validation error starts increasing.
- Pruning (for Decision Trees): Remove branches that contribute little, to avoid overly complex trees.
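Here is a minimal sketch of two of these remedies together, L2 regularization scored by k-fold cross-validation, on the same kind of synthetic data as above (alpha values and seed are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
X = rng.uniform(0.5, 2.5, size=60).reshape(-1, 1)
y = 80 * X.ravel() ** 2 + rng.normal(0, 10, size=60)

# Same flexible degree-15 basis; Ridge's L2 penalty (alpha) shrinks the
# coefficients so the curve can no longer chase every noisy point.
for alpha in (1e-6, 1.0):
    model = make_pipeline(PolynomialFeatures(15), StandardScaler(), Ridge(alpha=alpha))
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"alpha={alpha}: 5-fold CV MSE = {-scores.mean():.1f}")
# Expect the regularized model (alpha=1.0) to score noticeably better out of fold.
```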
How to Avoid Underfitting?
- Use More Complex Models: Try models that can capture more complexity, such as decision trees, random forests, or neural networks.
- Feature Engineering: Add new features, or transformations of existing ones, that better represent the underlying data (see the sketch after this list).
- Decrease Regularization: Regularization that is too strong can force the model to be too simple.
- Train Longer or Tune Hyperparameters: Train for more epochs or tune hyperparameters for better performance.
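A short sketch of the feature-engineering fix: the quadratic data from earlier underfits a plain linear model, but adding a squared-size feature (here via PolynomialFeatures, one of several ways to do it) lets the same model capture the curve.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.uniform(0.5, 2.5, size=40).reshape(-1, 1)
y = 80 * X.ravel() ** 2 + rng.normal(0, 10, size=40)

plain = LinearRegression().fit(X, y)              # raw size only: underfits
enriched = make_pipeline(PolynomialFeatures(2),   # adds size^2 as a feature
                         LinearRegression()).fit(X, y)

print("plain    MSE:", mean_squared_error(y, plain.predict(X)))
print("enriched MSE:", mean_squared_error(y, enriched.predict(X)))
# The added squared-size feature lets the same linear model capture the curve.
```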
Summary Table
| Problem | Cause | Symptoms | Solution |
|---|---|---|---|
| Overfitting | Model too complex, memorizes noise | Low train error, high test error | Regularization, simpler model, more data |
| Underfitting | Model too simple, misses patterns | High train error, high test error | More complex model, feature engineering |
Final Thoughts
Striking the right balance between overfitting and underfitting is key to building effective machine learning models. Always validate your model on unseen data and adjust its complexity accordingly, using techniques like cross-validation, regularization, and feature engineering to help it generalize well.