The Ultimate Quick Guide to Prediction Models


(From Linear Lines to Boosted Trees)

When it comes to machine learning, models are like tools in a toolbox — each has its own purpose. You wouldn’t use a hammer to turn a screw, right? The same goes for ML models. Let’s explore the most common ones, when to use them, and which knobs (hyperparameters) you should know about.


1. Linear Regression

Purpose: Predict continuous numbers (e.g., rainfall in mm, house prices).
When to Use: Data has a roughly straight-line relationship.
Key Hyperparameters:

  • fit_intercept – Whether to calculate the intercept term.

  • normalize – Removed in scikit-learn 1.2; scale features beforehand (e.g., with StandardScaler) instead.
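A minimal sketch with scikit-learn, using made-up toy data where y is roughly 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y is approximately 2*x + 1 with a little noise
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

# fit_intercept=True (the default) estimates the intercept term
model = LinearRegression(fit_intercept=True)
model.fit(X, y)

print(round(model.coef_[0], 1))    # slope, about 2.0
print(round(model.intercept_, 1))  # intercept, about 1.0
```

Setting fit_intercept=False forces the line through the origin, which only makes sense if your data is already centered.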


2. Logistic Regression

Purpose: Predict categories (e.g., rain or no rain).
When to Use: Binary or multi-class classification with linear decision boundaries.
Key Hyperparameters:

  • C – Inverse regularization strength (smaller C → stronger regularization).

  • penalty – Type of regularization (l1, l2).
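A quick sketch in scikit-learn, with toy one-feature data where class 1 corresponds to larger values:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data: class 1 when the feature is large
X = np.array([[0.1], [0.4], [0.9], [1.4], [1.8], [2.3]])
y = np.array([0, 0, 0, 1, 1, 1])

# Smaller C means stronger L2 regularization
model = LogisticRegression(C=1.0, penalty="l2")
model.fit(X, y)

print(model.predict([[0.2], [2.0]]))  # → [0 1]
```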


3. Decision Tree

Purpose: Both classification and regression.
When to Use: Data is not linear and may have complex rules.
Key Hyperparameters:

  • max_depth – Maximum levels of splits.

  • min_samples_split – Minimum samples to split a node.
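To see "complex rules" in action, here is a sketch on XOR-style toy data, which no straight line can separate but a small tree handles easily:

```python
from sklearn.tree import DecisionTreeClassifier

# XOR-style data: label is 1 only when exactly one input is 1
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# max_depth caps how many levels of splits the tree may make;
# min_samples_split is the minimum samples needed to split a node
tree = DecisionTreeClassifier(max_depth=3, min_samples_split=2, random_state=0)
tree.fit(X, y)

print(tree.predict([[0, 1], [1, 1]]))  # → [1 0]
```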


4. Random Forest

Purpose: Ensemble of many decision trees.
When to Use: You want accuracy and robustness with minimal tuning.
Key Hyperparameters:

  • n_estimators – Number of trees.

  • max_features – Number of features to consider at each split.
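A sketch with scikit-learn on synthetic data (make_classification is just a stand-in for a real dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# n_estimators trees; each split considers a random subset of features
# (max_features="sqrt" → about sqrt(8) ≈ 3 features per split)
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)

print(forest.score(X, y))  # training accuracy; typically very high on a toy set
```

More trees rarely hurt accuracy; they just cost more time and memory, which is why Random Forest needs so little tuning.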


5. Support Vector Machine (SVM)

Purpose: Classification (and regression as SVR).
When to Use: Data is not linearly separable and needs flexible decision boundaries.
Key Hyperparameters:

  • C – Inverse regularization strength; smaller C allows a wider margin at the cost of more misclassifications.

  • kernel – Shape of the boundary (linear, rbf, poly).
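A sketch showing why the kernel matters, using made-up ring-shaped data (an inner circle of class 0 inside an outer circle of class 1) that no straight line can separate:

```python
import numpy as np
from sklearn.svm import SVC

# Two concentric rings: radius 1 → class 0, radius 3 → class 1
rng = np.random.RandomState(0)
angles = rng.uniform(0, 2 * np.pi, 100)
radii = np.concatenate([np.full(50, 1.0), np.full(50, 3.0)])
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
y = np.concatenate([np.zeros(50), np.ones(50)])

# The RBF kernel bends the decision boundary around the inner ring
clf = SVC(C=1.0, kernel="rbf")
clf.fit(X, y)

print(clf.predict([[0.5, 0.5], [3.0, 0.0]]))  # inner point → 0, outer point → 1
```

With kernel="linear" the same model would fail here, since no single line separates the rings.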


6. K-Nearest Neighbors (KNN)

Purpose: Classification or regression.
When to Use: Small to medium datasets where similarity matters.
Key Hyperparameters:

  • n_neighbors – Number of nearest neighbors.

  • weights – How neighbors are weighted (uniform or distance).
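A minimal sketch with two made-up clusters, showing how KNN classifies by proximity:

```python
from sklearn.neighbors import KNeighborsClassifier

# Two toy clusters: one near (1, 1), one near (8, 8)
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

# weights="distance" lets closer neighbors vote more strongly than far ones
knn = KNeighborsClassifier(n_neighbors=3, weights="distance")
knn.fit(X, y)

print(knn.predict([[2, 2], [8, 7]]))  # → [0 1]
```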


7. Naive Bayes

Purpose: Classification.
When to Use: Text classification, spam detection.
Key Hyperparameters: (Varies by type: GaussianNB, MultinomialNB)

  • var_smoothing (Gaussian) – Adds a small value to feature variances for numerical stability.

  • alpha (Multinomial) – Smoothing parameter.
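A spam-detection sketch with a handful of made-up messages, using MultinomialNB on word counts:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus
texts = ["win money now", "cheap pills offer", "meeting at noon", "see you at lunch"]
labels = ["spam", "spam", "ham", "ham"]

# alpha is the Laplace smoothing term that avoids zero word probabilities
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(texts, labels)

print(model.predict(["cheap money offer"]))  # → ['spam']
```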


8. Gradient Boosting (GBM)

Purpose: Both regression and classification.
When to Use: When you want strong predictive performance with moderate tuning.
Key Hyperparameters:

  • n_estimators – Number of boosting rounds.

  • learning_rate – Step size at each stage.

  • max_depth – Depth of each tree.
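A sketch tying the three knobs together on synthetic regression data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Shallow trees (max_depth) are added one at a time (n_estimators rounds);
# learning_rate shrinks each tree's contribution to the running prediction
gbm = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
gbm.fit(X, y)

print(round(gbm.score(X, y), 2))  # training R², close to 1.0 on this toy set
```

Lowering learning_rate usually means you need more n_estimators; the two are tuned together.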


9. XGBoost (Extreme Gradient Boosting)

Purpose: High-performance regression and classification.
When to Use: Large datasets, competitions, or when you need top accuracy.
Key Hyperparameters:

  • n_estimators – Number of trees.

  • learning_rate – Shrinks the contribution of each tree.

  • max_depth – Depth of each tree.

  • subsample – Fraction of rows sampled for each tree.

  • colsample_bytree – Fraction of features sampled for each tree.


10. Extra Trees (Extremely Randomized Trees)

Purpose: Classification and regression.
When to Use: Similar to Random Forest, but faster to train, with extra randomness in how splits are chosen.
Key Hyperparameters:

  • n_estimators – Number of trees.

  • max_depth – Maximum depth.
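A sketch on synthetic data; the API mirrors Random Forest, only the underlying split rule differs:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Extra Trees picks split thresholds at random instead of searching for
# the best one, which is what makes it faster than Random Forest
extra = ExtraTreesClassifier(n_estimators=100, max_depth=None, random_state=0)
extra.fit(X, y)

print(extra.score(X, y))  # training accuracy; typically very high here
```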


11. AdaBoost

Purpose: Boosting weak models (often decision trees).
When to Use: Binary classification with simple base models.
Key Hyperparameters:

  • n_estimators – Number of weak learners.

  • learning_rate – Contribution of each learner.
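A sketch with scikit-learn's AdaBoostClassifier, whose default base model is a depth-1 decision tree (a "stump"):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Each round reweights the samples the previous stumps got wrong,
# so later learners focus on the hard cases
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
ada.fit(X, y)

print(round(ada.score(X, y), 2))  # training accuracy
```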


12. Lasso & Ridge Regression

Purpose: Regression with regularization.
When to Use: To avoid overfitting in linear regression.
Key Hyperparameters:

  • alpha – Regularization strength.
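A sketch contrasting the two penalties on made-up data where only two of five features actually matter:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
# Only the first two features influence y; the other three are noise
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(100)

# Larger alpha = stronger penalty; Lasso (L1) can shrink useless
# coefficients all the way to zero, Ridge (L2) only shrinks them toward zero
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print(np.round(lasso.coef_, 1))  # irrelevant features driven to ~0
print(np.round(ridge.coef_, 1))
```

This is why Lasso doubles as a feature-selection tool, while Ridge is the safer default when you want to keep all features.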


Quick Memory Table

| Model | Problem Type | Example Use Case | Key Hyperparameters |
|---|---|---|---|
| Linear Regression | Regression | Predict rainfall amount | fit_intercept |
| Logistic Regression | Classification | Rain or no rain | C, penalty |
| Decision Tree | Both | Predict crop yield categories | max_depth, min_samples_split |
| Random Forest | Both | Predict electricity demand | n_estimators, max_features |
| SVM | Both | Predict drought risk | C, kernel |
| KNN | Both | Predict similar weather patterns | n_neighbors, weights |
| Naive Bayes | Classification | Predict rainfall from text reports | var_smoothing, alpha |
| Gradient Boosting | Both | Predict yearly rainfall pattern | n_estimators, learning_rate, max_depth |
| XGBoost | Both | Predict rainfall with top accuracy | n_estimators, learning_rate, max_depth, subsample, colsample_bytree |
| Extra Trees | Both | Predict soil moisture levels | n_estimators, max_depth |
| AdaBoost | Classification | Predict cyclone occurrence | n_estimators, learning_rate |
| Lasso & Ridge | Regression | Predict rainfall with penalty term | alpha |

