The Ultimate Quick Guide to Prediction Models
(From Linear Lines to Boosted Trees)
When it comes to machine learning, models are like tools in a toolbox — each has its own purpose. You wouldn’t use a hammer to turn a screw, right? The same goes for ML models. Let’s explore the most common ones, when to use them, and which knobs (hyperparameters) you should know about.
1. Linear Regression
Purpose: Predict continuous numbers (e.g., rainfall in mm, house prices).
When to Use: Data has a roughly straight-line relationship.
Key Hyperparameters:
- `fit_intercept` – Whether to calculate the intercept term.
- `normalize` – Whether to normalize input features (note: this flag was removed in recent scikit-learn; scale features with `StandardScaler` in a pipeline instead).
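A minimal scikit-learn sketch of these knobs in action (the toy data below is generated purely for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Toy regression data, purely illustrative
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

# fit_intercept controls the bias term; scale features separately
# (e.g., with StandardScaler) since the old `normalize` flag is gone.
model = LinearRegression(fit_intercept=True)
model.fit(X, y)
print(model.coef_, model.intercept_)
```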
2. Logistic Regression
Purpose: Predict categories (e.g., rain or no rain).
When to Use: Binary or multi-class classification with linear decision boundaries.
Key Hyperparameters:
- `C` – Regularization strength (smaller → stronger regularization).
- `penalty` – Type of regularization (`l1`, `l2`).
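A quick sketch with made-up binary data (rain vs. no rain is just the framing; the numbers are synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy binary data ("rain" vs. "no rain"), purely illustrative
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Smaller C = stronger regularization; l1 would need a compatible solver
clf = LogisticRegression(C=0.5, penalty="l2", solver="lbfgs")
clf.fit(X, y)
print(clf.predict_proba(X[:3]))
```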
3. Decision Tree
Purpose: Both classification and regression.
When to Use: Data is not linear and may have complex rules.
Key Hyperparameters:
- `max_depth` – Maximum levels of splits.
- `min_samples_split` – Minimum samples required to split a node.
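Here is how those two knobs look in scikit-learn (synthetic data, illustrative settings):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Cap depth and split size so the tree doesn't memorize noise
tree = DecisionTreeClassifier(max_depth=4, min_samples_split=10, random_state=0)
tree.fit(X, y)
print(tree.get_depth(), tree.score(X, y))
```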
4. Random Forest
Purpose: Classification and regression via an ensemble of many decision trees.
When to Use: You want accuracy and robustness with minimal tuning.
Key Hyperparameters:
- `n_estimators` – Number of trees.
- `max_features` – Number of features to consider at each split.
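A minimal sketch (again with generated data, not a real workload):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# More trees = more stable; max_features="sqrt" decorrelates the trees
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
forest.fit(X, y)
print(forest.score(X, y))
```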
5. Support Vector Machine (SVM)
Purpose: Classification (and regression as SVR).
When to Use: Data is not linearly separable and needs flexible decision boundaries.
Key Hyperparameters:
- `C` – Regularization strength.
- `kernel` – Shape of the decision boundary (`linear`, `rbf`, `poly`).
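One detail worth showing in code: SVMs are sensitive to feature scale, so a pipeline with a scaler is the usual pattern (toy data, illustrative values):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# Standardize first; the rbf kernel then bends the decision boundary
svm = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
svm.fit(X, y)
print(svm.score(X, y))
```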
6. K-Nearest Neighbors (KNN)
Purpose: Classification or regression.
When to Use: Small to medium datasets where similarity matters.
Key Hyperparameters:
- `n_neighbors` – Number of nearest neighbors.
- `weights` – How neighbors are weighted (`uniform` or `distance`).
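A short sketch showing both knobs (synthetic data for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# weights="distance" lets closer neighbors vote more heavily
knn = KNeighborsClassifier(n_neighbors=7, weights="distance")
knn.fit(X, y)
print(knn.predict(X[:3]))
```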
7. Naive Bayes
Purpose: Classification.
When to Use: Text classification, spam detection.
Key Hyperparameters (vary by variant, e.g., GaussianNB, MultinomialNB):
- `var_smoothing` (GaussianNB) – Variance added to all features for numerical stability.
- `alpha` (MultinomialNB) – Laplace smoothing parameter that avoids zero probabilities.
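Since the text-classification use case is the classic one, here is a tiny MultinomialNB sketch; the four-document corpus is entirely made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus: 1 = rain, 0 = no rain
docs = ["heavy rain expected", "clear skies today",
        "rain and storms", "sunny and dry"]
labels = [1, 0, 1, 0]

X = CountVectorizer().fit_transform(docs)

# alpha is the Laplace smoothing that keeps unseen words from
# zeroing out a class probability
nb = MultinomialNB(alpha=1.0)
nb.fit(X, labels)
print(nb.predict(X))
```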
8. Gradient Boosting (GBM)
Purpose: Both regression and classification.
When to Use: When you want strong predictive performance with moderate tuning.
Key Hyperparameters:
- `n_estimators` – Number of boosting rounds.
- `learning_rate` – Step size at each stage.
- `max_depth` – Depth of each tree.
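The usual trade-off is that a lower `learning_rate` needs more boosting rounds; a minimal sketch with generated data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=6, noise=5.0, random_state=0)

# Shallow trees per stage; small learning_rate compensated by more rounds
gbm = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                max_depth=3, random_state=0)
gbm.fit(X, y)
print(gbm.score(X, y))
```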
9. XGBoost (Extreme Gradient Boosting)
Purpose: High-performance regression and classification.
When to Use: Large datasets, competitions, or when you need top accuracy.
Key Hyperparameters:
- `n_estimators` – Number of trees.
- `learning_rate` – Shrinks the contribution of each tree.
- `max_depth` – Depth of each tree.
- `subsample` – Fraction of rows used per tree.
- `colsample_bytree` – Fraction of features used per tree.
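XGBoost follows the scikit-learn API, so the same fit/score pattern applies; a hedged sketch (toy data, illustrative values, requires `pip install xgboost`):

```python
from sklearn.datasets import make_classification
from xgboost import XGBClassifier  # pip install xgboost

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# subsample/colsample_bytree inject randomness that helps generalization
xgb = XGBClassifier(
    n_estimators=300,
    learning_rate=0.1,
    max_depth=4,
    subsample=0.8,
    colsample_bytree=0.8,
)
xgb.fit(X, y)
print(xgb.score(X, y))
```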
10. Extra Trees (Extremely Randomized Trees)
Purpose: Classification and regression.
When to Use: Like Random Forest, but faster to train and with more randomness in the splits.
Key Hyperparameters:
- `n_estimators` – Number of trees.
- `max_depth` – Maximum depth.
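A minimal sketch; the speed gain comes from choosing split thresholds at random instead of searching for the best one:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=500, n_features=6, random_state=0)

# Random split thresholds make training faster than a Random Forest
et = ExtraTreesRegressor(n_estimators=200, max_depth=None, random_state=0)
et.fit(X, y)
print(et.score(X, y))
```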
11. AdaBoost
Purpose: Boosting weak models (often decision trees).
When to Use: Binary classification with simple base models.
Key Hyperparameters:
- `n_estimators` – Number of weak learners.
- `learning_rate` – Contribution of each learner.
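A short sketch with synthetic data; by default scikit-learn boosts depth-1 decision trees ("stumps"):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Default base learner is a decision stump; learning_rate scales each one
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
ada.fit(X, y)
print(ada.score(X, y))
```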
12. Lasso & Ridge Regression
Purpose: Regression with regularization.
When to Use: To avoid overfitting in linear regression.
Key Hyperparameters:
- `alpha` – Regularization strength.
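Both live in scikit-learn and differ in the penalty they apply (L1 vs. L2); a side-by-side sketch on toy data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Larger alpha = stronger penalty; Lasso (L1) can zero out coefficients
# entirely, while Ridge (L2) only shrinks them
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print(lasso.coef_, ridge.coef_, sep="\n")
```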
Quick Memory Table
| Model | Problem Type | Example Use Case | Key Hyperparameters |
|---|---|---|---|
| Linear Regression | Regression | Predict rainfall amount | `fit_intercept`, `normalize` |
| Logistic Regression | Classification | Rain or no rain | `C`, `penalty` |
| Decision Tree | Both | Predict crop yield categories | `max_depth`, `min_samples_split` |
| Random Forest | Both | Predict electricity demand | `n_estimators`, `max_features` |
| SVM | Both | Predict drought risk | `C`, `kernel` |
| KNN | Both | Predict similar weather patterns | `n_neighbors`, `weights` |
| Naive Bayes | Classification | Predict rainfall from text reports | `var_smoothing`, `alpha` |
| Gradient Boosting | Both | Predict yearly rainfall pattern | `n_estimators`, `learning_rate`, `max_depth` |
| XGBoost | Both | Predict rainfall with top accuracy | `n_estimators`, `learning_rate`, `max_depth`, `subsample`, `colsample_bytree` |
| Extra Trees | Both | Predict soil moisture levels | `n_estimators`, `max_depth` |
| AdaBoost | Classification | Predict cyclone occurrence | `n_estimators`, `learning_rate` |
| Lasso & Ridge | Regression | Predict rainfall with penalty term | `alpha` |