The Ultimate Quick Guide to Prediction Models


(From Linear Lines to Boosted Trees)

When it comes to machine learning, models are like tools in a toolbox — each has its own purpose. You wouldn’t use a hammer to turn a screw, right? The same goes for ML models. Let’s explore the most common ones, when to use them, and which knobs (hyperparameters) you should know about.


1. Linear Regression

Purpose: Predict continuous numbers (e.g., rainfall in mm, house prices).
When to Use: Data has a roughly straight-line relationship.
Key Hyperparameters:

  • fit_intercept – Whether to calculate the intercept term.

  • normalize – Removed in scikit-learn 1.2; scale features beforehand (e.g., with StandardScaler) instead.
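A minimal sketch with scikit-learn, using made-up toy data where y is roughly 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y is approximately 2*x + 1 with a little noise
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

# fit_intercept=True (the default) estimates the intercept term
model = LinearRegression(fit_intercept=True)
model.fit(X, y)

print(round(model.coef_[0], 1))    # slope, about 2.0
print(round(model.intercept_, 1))  # intercept, about 1.0
```

Setting fit_intercept=False forces the line through the origin, which only makes sense if your data is already centered.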


2. Logistic Regression

Purpose: Predict categories (e.g., rain or no rain).
When to Use: Binary or multi-class classification with linear decision boundaries.
Key Hyperparameters:

  • C – Inverse regularization strength (smaller C → stronger regularization).

  • penalty – Type of regularization (l1, l2).
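A quick sketch in scikit-learn, with toy one-feature data where class 1 corresponds to larger values:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data: class 1 when the feature is large
X = np.array([[0.1], [0.4], [0.9], [1.4], [1.8], [2.3]])
y = np.array([0, 0, 0, 1, 1, 1])

# Smaller C means stronger L2 regularization
model = LogisticRegression(C=1.0, penalty="l2")
model.fit(X, y)

print(model.predict([[0.2], [2.0]]))  # → [0 1]
```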


3. Decision Tree

Purpose: Both classification and regression.
When to Use: Data is not linear and may have complex rules.
Key Hyperparameters:

  • max_depth – Maximum levels of splits.

  • min_samples_split – Minimum samples to split a node.
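To see "complex rules" in action, here is a sketch on XOR-style toy data, which no straight line can separate but a small tree handles easily:

```python
from sklearn.tree import DecisionTreeClassifier

# XOR-style data: label is 1 only when exactly one input is 1
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# max_depth caps how many levels of splits the tree may make;
# min_samples_split is the minimum samples needed to split a node
tree = DecisionTreeClassifier(max_depth=3, min_samples_split=2, random_state=0)
tree.fit(X, y)

print(tree.predict([[0, 1], [1, 1]]))  # → [1 0]
```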


4. Random Forest

Purpose: Ensemble of many decision trees.
When to Use: You want accuracy and robustness with minimal tuning.
Key Hyperparameters:

  • n_estimators – Number of trees.

  • max_features – Number of features to consider at each split.
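A sketch with scikit-learn on synthetic data (make_classification is just a stand-in for a real dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# n_estimators trees; each split considers a random subset of features
# (max_features="sqrt" → about sqrt(8) ≈ 3 features per split)
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)

print(forest.score(X, y))  # training accuracy; typically very high on a toy set
```

More trees rarely hurt accuracy; they just cost more time and memory, which is why Random Forest needs so little tuning.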


5. Support Vector Machine (SVM)

Purpose: Classification (and regression as SVR).
When to Use: Data is not linearly separable and needs flexible decision boundaries.
Key Hyperparameters:

  • C – Inverse regularization strength; smaller C allows a wider margin at the cost of more misclassifications.

  • kernel – Shape of the boundary (linear, rbf, poly).
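A sketch showing why the kernel matters, using made-up ring-shaped data (an inner circle of class 0 inside an outer circle of class 1) that no straight line can separate:

```python
import numpy as np
from sklearn.svm import SVC

# Two concentric rings: radius 1 → class 0, radius 3 → class 1
rng = np.random.RandomState(0)
angles = rng.uniform(0, 2 * np.pi, 100)
radii = np.concatenate([np.full(50, 1.0), np.full(50, 3.0)])
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
y = np.concatenate([np.zeros(50), np.ones(50)])

# The RBF kernel bends the decision boundary around the inner ring
clf = SVC(C=1.0, kernel="rbf")
clf.fit(X, y)

print(clf.predict([[0.5, 0.5], [3.0, 0.0]]))  # inner point → 0, outer point → 1
```

With kernel="linear" the same model would fail here, since no single line separates the rings.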


6. K-Nearest Neighbors (KNN)

Purpose: Classification or regression.
When to Use: Small to medium datasets where similarity matters.
Key Hyperparameters:

  • n_neighbors – Number of nearest neighbors.

  • weights – How neighbors are weighted (uniform or distance).
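A minimal sketch with two made-up clusters, showing how KNN classifies by proximity:

```python
from sklearn.neighbors import KNeighborsClassifier

# Two toy clusters: one near (1, 1), one near (8, 8)
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

# weights="distance" lets closer neighbors vote more strongly than far ones
knn = KNeighborsClassifier(n_neighbors=3, weights="distance")
knn.fit(X, y)

print(knn.predict([[2, 2], [8, 7]]))  # → [0 1]
```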


7. Naive Bayes

Purpose: Classification.
When to Use: Text classification, spam detection.
Key Hyperparameters: (Varies by type: GaussianNB, MultinomialNB)

  • var_smoothing (Gaussian) – Adds a small value to feature variances for numerical stability.

  • alpha (Multinomial) – Smoothing parameter.
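A spam-detection sketch with a handful of made-up messages, using MultinomialNB on word counts:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus
texts = ["win money now", "cheap pills offer", "meeting at noon", "see you at lunch"]
labels = ["spam", "spam", "ham", "ham"]

# alpha is the Laplace smoothing term that avoids zero word probabilities
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(texts, labels)

print(model.predict(["cheap money offer"]))  # → ['spam']
```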


8. Gradient Boosting (GBM)

Purpose: Both regression and classification.
When to Use: When you want strong predictive performance with moderate tuning.
Key Hyperparameters:

  • n_estimators – Number of boosting rounds.

  • learning_rate – Step size at each stage.

  • max_depth – Depth of each tree.
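A sketch tying the three knobs together on synthetic regression data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Shallow trees (max_depth) are added one at a time (n_estimators rounds);
# learning_rate shrinks each tree's contribution to the running prediction
gbm = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
gbm.fit(X, y)

print(round(gbm.score(X, y), 2))  # training R², close to 1.0 on this toy set
```

Lowering learning_rate usually means you need more n_estimators; the two are tuned together.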


9. XGBoost (Extreme Gradient Boosting)

Purpose: High-performance regression and classification.
When to Use: Large datasets, competitions, or when you need top accuracy.
Key Hyperparameters:

  • n_estimators – Number of trees.

  • learning_rate – Shrinks the contribution of each tree.

  • max_depth – Depth of each tree.

  • subsample – Fraction of rows sampled for each tree.

  • colsample_bytree – Fraction of features sampled for each tree.


10. Extra Trees (Extremely Randomized Trees)

Purpose: Classification and regression.
When to Use: Similar to Random Forest, but faster to train, with extra randomness in how splits are chosen.
Key Hyperparameters:

  • n_estimators – Number of trees.

  • max_depth – Maximum depth.
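A sketch on synthetic data; the API mirrors Random Forest, only the underlying split rule differs:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Extra Trees picks split thresholds at random instead of searching for
# the best one, which is what makes it faster than Random Forest
extra = ExtraTreesClassifier(n_estimators=100, max_depth=None, random_state=0)
extra.fit(X, y)

print(extra.score(X, y))  # training accuracy; typically very high here
```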


11. AdaBoost

Purpose: Boosting weak models (often decision trees).
When to Use: Binary classification with simple base models.
Key Hyperparameters:

  • n_estimators – Number of weak learners.

  • learning_rate – Contribution of each learner.
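A sketch with scikit-learn's AdaBoostClassifier, whose default base model is a depth-1 decision tree (a "stump"):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Each round reweights the samples the previous stumps got wrong,
# so later learners focus on the hard cases
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
ada.fit(X, y)

print(round(ada.score(X, y), 2))  # training accuracy
```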


12. Lasso & Ridge Regression

Purpose: Regression with regularization.
When to Use: To avoid overfitting in linear regression.
Key Hyperparameters:

  • alpha – Regularization strength.
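A sketch contrasting the two penalties on made-up data where only two of five features actually matter:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
# Only the first two features influence y; the other three are noise
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(100)

# Larger alpha = stronger penalty; Lasso (L1) can shrink useless
# coefficients all the way to zero, Ridge (L2) only shrinks them toward zero
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print(np.round(lasso.coef_, 1))  # irrelevant features driven to ~0
print(np.round(ridge.coef_, 1))
```

This is why Lasso doubles as a feature-selection tool, while Ridge is the safer default when you want to keep all features.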


Quick Memory Table

| Model | Problem Type | Example Use Case | Key Hyperparameters |
|---|---|---|---|
| Linear Regression | Regression | Predict rainfall amount | fit_intercept |
| Logistic Regression | Classification | Rain or no rain | C, penalty |
| Decision Tree | Both | Predict crop yield categories | max_depth, min_samples_split |
| Random Forest | Both | Predict electricity demand | n_estimators, max_features |
| SVM | Both | Predict drought risk | C, kernel |
| KNN | Both | Predict similar weather patterns | n_neighbors, weights |
| Naive Bayes | Classification | Predict rainfall from text reports | var_smoothing, alpha |
| Gradient Boosting | Both | Predict yearly rainfall pattern | n_estimators, learning_rate, max_depth |
| XGBoost | Both | Predict rainfall with top accuracy | n_estimators, learning_rate, max_depth, subsample, colsample_bytree |
| Extra Trees | Both | Predict soil moisture levels | n_estimators, max_depth |
| AdaBoost | Classification | Predict cyclone occurrence | n_estimators, learning_rate |
| Lasso & Ridge | Regression | Predict rainfall with penalty term | alpha |

