The Ultimate Quick Guide to Prediction Models


(From Linear Lines to Boosted Trees)

When it comes to machine learning, models are like tools in a toolbox — each has its own purpose. You wouldn’t use a hammer to turn a screw, right? The same goes for ML models. Let’s explore the most common ones, when to use them, and which knobs (hyperparameters) you should know about.


1. Linear Regression

Purpose: Predict continuous numbers (e.g., rainfall in mm, house prices).
When to Use: Data has a roughly straight-line relationship.
Key Hyperparameters:

  • fit_intercept – Whether to calculate the intercept term.

  • normalize – Whether to normalize input features. (Deprecated and removed in recent scikit-learn; scale features with StandardScaler instead.)
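
A minimal sketch in scikit-learn, using made-up noisy linear data just for illustration:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Synthetic data with a roughly straight-line relationship plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.5 * X.ravel() + 2.0 + rng.normal(0, 1.0, size=100)

model = LinearRegression(fit_intercept=True)  # intercept is on by default
model.fit(X, y)
print(model.coef_, model.intercept_)  # should recover roughly 3.5 and 2.0
```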


2. Logistic Regression

Purpose: Predict categories (e.g., rain or no rain).
When to Use: Binary or multi-class classification with linear decision boundaries.
Key Hyperparameters:

  • C – Inverse regularization strength (smaller → stronger regularization).

  • penalty – Type of regularization (l1 or l2; l1 requires a compatible solver such as liblinear or saga).
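
A quick sketch on a synthetic binary problem (the dataset and parameter values are illustrative only):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Smaller C means stronger regularization; l2 is the default penalty
clf = LogisticRegression(C=0.5, penalty="l2")
clf.fit(X, y)
print(clf.predict_proba(X[:3]))  # class probabilities for the first 3 rows
```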


3. Decision Tree

Purpose: Both classification and regression.
When to Use: Data is not linear and may have complex rules.
Key Hyperparameters:

  • max_depth – Maximum levels of splits.

  • min_samples_split – Minimum samples to split a node.
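
For example, a shallow tree on the classic iris dataset (the parameter values are just a starting point, not tuned):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Capping depth and split size keeps the tree from memorizing the data
tree = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0)
tree.fit(X, y)
print(tree.score(X, y))
```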


4. Random Forest

Purpose: Classification and regression, using an ensemble of many decision trees.
When to Use: You want accuracy and robustness with minimal tuning.
Key Hyperparameters:

  • n_estimators – Number of trees.

  • max_features – Number of features to consider at each split.
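
A minimal sketch on synthetic regression data (all values here are illustrative):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=300, n_features=8, noise=10, random_state=0)

# More trees gives more stable predictions; "sqrt" features per split
# decorrelates the individual trees
forest = RandomForestRegressor(n_estimators=200, max_features="sqrt",
                               random_state=0)
forest.fit(X, y)
```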


5. Support Vector Machine (SVM)

Purpose: Classification (and regression as SVR).
When to Use: Data is not linearly separable and needs flexible decision boundaries.
Key Hyperparameters:

  • C – Regularization strength (smaller C → smoother, more regularized boundary).

  • kernel – Shape of the boundary (linear, rbf, poly).
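
A sketch on the two-moons toy dataset, where a straight line can't separate the classes. Note that SVMs are sensitive to feature scale, so scaling is part of the pipeline:

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Standardize first; the rbf kernel bends the boundary around the two moons
svm = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
svm.fit(X, y)
print(svm.score(X, y))
```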


6. K-Nearest Neighbors (KNN)

Purpose: Classification or regression.
When to Use: Small to medium datasets where similarity matters.
Key Hyperparameters:

  • n_neighbors – Number of nearest neighbors.

  • weights – How neighbors are weighted (uniform or distance).
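
A minimal sketch, again on iris (n_neighbors=5 is a common default, not a tuned value):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# weights="distance" lets closer neighbors count more than far ones
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
knn.fit(X, y)
print(knn.predict(X[:2]))
```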


7. Naive Bayes

Purpose: Classification.
When to Use: Text classification, spam detection.
Key Hyperparameters: (Varies by type: GaussianNB, MultinomialNB)

  • var_smoothing (Gaussian) – Adds a small value to feature variances for numerical stability.

  • alpha (Multinomial) – Smoothing parameter that keeps unseen words from zeroing out a class probability.
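
A text-classification sketch with a tiny invented corpus (the sentences and labels are made up purely to show the moving parts):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["heavy rain expected today", "clear skies all week",
         "rain and thunderstorms likely", "sunny and dry conditions"]
labels = [1, 0, 1, 0]  # 1 = rain, 0 = no rain

# alpha smooths word counts so unseen words don't zero out a class
nb = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
nb.fit(texts, labels)
print(nb.predict(["rain likely today"]))
```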


8. Gradient Boosting (GBM)

Purpose: Both regression and classification.
When to Use: When you want strong predictive performance with moderate tuning.
Key Hyperparameters:

  • n_estimators – Number of boosting rounds.

  • learning_rate – Step size at each stage.

  • max_depth – Depth of each tree.
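
A sketch with scikit-learn's implementation on synthetic data (the hyperparameter values are illustrative starting points):

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=300, n_features=6, noise=5, random_state=0)

# A lower learning_rate usually needs more boosting rounds;
# shallow trees (max_depth around 2-4) are typical for boosting
gbm = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                max_depth=3, random_state=0)
gbm.fit(X, y)
```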


9. XGBoost (Extreme Gradient Boosting)

Purpose: High-performance regression and classification.
When to Use: Large datasets, competitions, or when you need top accuracy.
Key Hyperparameters:

  • n_estimators – Number of trees.

  • learning_rate – Shrinks the contribution of each tree.

  • max_depth – Depth of each tree.

  • subsample – Fraction of rows sampled for each tree.

  • colsample_bytree – Fraction of features sampled for each tree.
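
A sketch using the xgboost package's scikit-learn-style wrapper (xgboost is a separate install; values are illustrative, not tuned):

```python
from xgboost import XGBRegressor  # pip install xgboost
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, noise=5, random_state=0)

# subsample and colsample_bytree below 1.0 add randomness that fights overfitting
model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=4,
                     subsample=0.8, colsample_bytree=0.8, random_state=0)
model.fit(X, y)
```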


10. Extra Trees (Extremely Randomized Trees)

Purpose: Classification and regression.
When to Use: Like Random Forest, but trains faster and injects more randomness into the splits.
Key Hyperparameters:

  • n_estimators – Number of trees.

  • max_depth – Maximum depth.
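
A minimal sketch on a synthetic classification problem:

```python
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Split thresholds are drawn at random, which is what speeds up training
et = ExtraTreesClassifier(n_estimators=200, max_depth=None, random_state=0)
et.fit(X, y)
```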


11. AdaBoost

Purpose: Boosting weak models (often decision trees).
When to Use: Binary classification with simple base models.
Key Hyperparameters:

  • n_estimators – Number of weak learners.

  • learning_rate – Contribution of each learner.
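
A sketch boosting decision stumps on synthetic data (note the base-learner argument was renamed across scikit-learn versions):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, random_state=0)

# Depth-1 "stumps" are the classic weak learner for AdaBoost
# (scikit-learn >= 1.2 calls this argument estimator; older versions use base_estimator)
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, learning_rate=0.5, random_state=0)
ada.fit(X, y)
```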


12. Lasso & Ridge Regression

Purpose: Regression with regularization.
When to Use: To reduce overfitting in linear regression. Lasso (L1) can shrink coefficients all the way to zero, which doubles as feature selection; Ridge (L2) shrinks them smoothly toward zero.
Key Hyperparameters:

  • alpha – Regularization strength.
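
A side-by-side sketch on synthetic data (the alpha values are illustrative, not tuned):

```python
from sklearn.linear_model import Lasso, Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=20, noise=10, random_state=0)

# Larger alpha means a stronger penalty on coefficient size
lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print((lasso.coef_ == 0).sum(), "coefficients zeroed out by Lasso")
```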


Quick Memory Table

| Model | Problem Type | Example Use Case | Key Hyperparameters |
|---|---|---|---|
| Linear Regression | Regression | Predict rainfall amount | fit_intercept |
| Logistic Regression | Classification | Rain or no rain | C, penalty |
| Decision Tree | Both | Predict crop yield categories | max_depth, min_samples_split |
| Random Forest | Both | Predict electricity demand | n_estimators, max_features |
| SVM | Both | Predict drought risk | C, kernel |
| KNN | Both | Predict similar weather patterns | n_neighbors, weights |
| Naive Bayes | Classification | Predict rainfall from text reports | var_smoothing, alpha |
| Gradient Boosting | Both | Predict yearly rainfall pattern | n_estimators, learning_rate, max_depth |
| XGBoost | Both | Predict rainfall with top accuracy | n_estimators, learning_rate, max_depth, subsample, colsample_bytree |
| Extra Trees | Both | Predict soil moisture levels | n_estimators, max_depth |
| AdaBoost | Classification | Predict cyclone occurrence | n_estimators, learning_rate |
| Lasso & Ridge | Regression | Predict rainfall with penalty term | alpha |

