The Ultimate Guide to Scikit-learn Models (with Key Hyperparameters)

Choosing the right model in scikit-learn can be tricky.

This guide covers the main model families, when to use each one, real-world examples, and the key hyperparameters to tune for better performance.

1. Classification Models — Predicting Categories

Model	When to Use	Example	Key Hyperparameters
LogisticRegression	Binary/multi-class classification with a linear decision boundary.	Will it rain tomorrow? (Yes/No)	`penalty` (`l1`, `l2`, `elasticnet`), `C` (inverse regularization), `solver`
KNeighborsClassifier	Classification by similarity to nearest neighbors.	Classify plants by leaf shape.	`n_neighbors`, `weights` (`uniform`, `distance`), `metric`
DecisionTreeClassifier	Simple, interpretable rules for non-linear data.	Diagnose diabetes.	`max_depth`, `min_samples_split`, `min_samples_leaf`, `criterion`
RandomForestClassifier	Multiple trees for higher accuracy & robustness.	Credit card fraud detection.	`n_estimators`, `max_depth`, `min_samples_split`, `max_features`
GradientBoostingClassifier	Sequential trees that fix previous errors.	Predict customer churn.	`n_estimators`, `learning_rate`, `max_depth`, `subsample`
HistGradientBoostingClassifier	Faster gradient boosting for large data.	Classify product reviews.	`max_iter`, `learning_rate`, `max_depth`, `l2_regularization`
GaussianNB	Naive Bayes for continuous features.	Classify flowers from petal measurements.	`var_smoothing`
BernoulliNB	Naive Bayes for binary features.	Detect keyword presence in text.	`alpha`, `binarize`
MultinomialNB	Naive Bayes for count data.	Text classification.	`alpha`, `fit_prior`
SVC	Works well for small-to-medium datasets with clear margins.	Tumor classification.	`kernel`, `C`, `gamma`
LinearSVC	Fast linear SVM for large-scale classification.	Sentiment analysis.	`C`, `penalty`, `loss`
MLPClassifier	Neural network for non-linear decision boundaries.	Handwriting recognition.	`hidden_layer_sizes`, `activation`, `solver`, `alpha`, `learning_rate`
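
A quick way to act on this table is to compare a few candidates with cross-validation before tuning anything. The sketch below assumes the bundled Iris dataset as a stand-in for your own data and uses mostly default settings. `StandardScaler` is placed in front of LogisticRegression and KNeighborsClassifier because both are sensitive to feature scale; trees and GaussianNB are used as-is.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: Iris stands in for your own feature matrix X and labels y
X, y = load_iris(return_X_y=True)

models = {
    # Scale-sensitive models get a StandardScaler in front of them
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNeighborsClassifier": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=3, random_state=0),
    "GaussianNB": GaussianNB(),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:24s} {scores[name]:.3f}")
```

Pick the strongest simple model as your baseline, then tune only its key hyperparameters.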

2. Regression Models — Predicting Continuous Values

Model	When to Use	Example	Key Hyperparameters
LinearRegression	Simple linear relationships.	Predict rainfall from humidity.	(No major hyperparameters)
Ridge	Linear regression with L2 regularization.	Predict house prices.	`alpha`, `solver`
Lasso	Linear regression with L1 regularization (feature selection).	Predict crop yield.	`alpha`, `selection`
ElasticNet	Mix of L1 and L2 penalties.	Predict rainfall with many correlated features.	`alpha`, `l1_ratio`
KNeighborsRegressor	Predicts the average target of the nearest neighbors.	Estimate soil pH.	`n_neighbors`, `weights`, `metric`
DecisionTreeRegressor	Non-linear regression with rules.	Predict wind speed.	`max_depth`, `min_samples_split`, `min_samples_leaf`
RandomForestRegressor	Multiple trees for stable regression.	Predict electricity usage.	`n_estimators`, `max_depth`, `min_samples_split`
GradientBoostingRegressor	Boosted trees for high accuracy.	Predict rental prices.	`n_estimators`, `learning_rate`, `max_depth`, `subsample`
HistGradientBoostingRegressor	Fast gradient boosting on large data.	Predict crop yield.	`max_iter`, `learning_rate`, `max_depth`
SVR	SVM for regression.	Predict river water levels.	`kernel`, `C`, `epsilon`, `gamma`
MLPRegressor	Neural network for regression.	Predict solar power output.	`hidden_layer_sizes`, `activation`, `solver`, `alpha`
HuberRegressor	Robust to outliers.	Predict rainfall in noisy data.	`epsilon`, `alpha`
TheilSenRegressor	Robust to outliers; best on small datasets (slow on large ones).	Estimate median house prices.	`max_subpopulation`, `n_subsamples`
RANSACRegressor	Ignores outliers when fitting.	Predict rainfall ignoring faulty readings.	`min_samples`, `residual_threshold`, `max_trials`
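
To see why the robust regressors exist, here is a minimal sketch on synthetic data. It assumes a true slope of 3 and corrupts the ten readings with the largest x values (a hypothetical faulty sensor). Ordinary least squares gets dragged toward the bad points, while `HuberRegressor` with its default `epsilon=1.35` should stay much closer to the true slope.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# Synthetic data (assumption): y = 3x + 2 plus small Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 0.5, size=100)

# Corrupt the 10 readings at the high end of x (hypothetical faulty sensor)
worst = np.argsort(X.ravel())[-10:]
y[worst] -= 40

ols = LinearRegression().fit(X, y)
huber = HuberRegressor(epsilon=1.35, max_iter=1000).fit(X, y)

print("OLS slope:  ", round(ols.coef_[0], 2))
print("Huber slope:", round(huber.coef_[0], 2))
```

A smaller `epsilon` makes Huber treat more points as outliers; RANSACRegressor and TheilSenRegressor are alternatives when the corruption is heavier.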

3. Clustering Models — Grouping Data Without Labels

Model	When to Use	Example	Key Hyperparameters
KMeans	Data with spherical clusters, known number of clusters.	Group weather stations by climate.	`n_clusters`, `init`, `max_iter`
MiniBatchKMeans	Faster KMeans for large datasets.	Cluster cities by rainfall.	`n_clusters`, `batch_size`, `max_iter`
AgglomerativeClustering	Hierarchical grouping.	Group countries by seasonal temperatures.	`n_clusters`, `linkage`
DBSCAN	Arbitrary-shaped clusters + noise detection.	Detect abnormal rainfall areas.	`eps`, `min_samples`
Birch	Large or streaming datasets (supports incremental `partial_fit`).	Cluster satellite images.	`n_clusters`, `threshold`, `branching_factor`
SpectralClustering	Graph-based clustering.	Group river basins by water flow.	`n_clusters`, `affinity`
GaussianMixture	Probabilistic (soft) clustering with elliptical clusters.	Group cloud types.	`n_components`, `covariance_type`
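
The "spherical clusters" caveat for KMeans is easy to check on a toy dataset. This sketch assumes scikit-learn's `make_moons` generator and scores both algorithms against the known labels with the adjusted Rand index (1.0 means a perfect match). The `eps=0.2` value is an assumption chosen for this toy data, not a general default. DBSCAN should recover the two crescents, while KMeans tends to split them with a straight boundary.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: clusters that are clearly not spherical
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)  # label -1 marks noise

print("KMeans ARI:", round(adjusted_rand_score(y_true, kmeans_labels), 2))
print("DBSCAN ARI:", round(adjusted_rand_score(y_true, dbscan_labels), 2))
```

On real data you rarely have `y_true`; the same comparison then relies on visual inspection or internal metrics such as the silhouette score.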

4. Anomaly Detection — Spotting Outliers

Model	When to Use	Example	Key Hyperparameters
IsolationForest	High-dimensional anomaly detection.	Detect extreme rainfall.	`n_estimators`, `max_samples`, `contamination`
OneClassSVM	Complex anomaly boundaries.	Find abnormal wind patterns.	`kernel`, `nu`, `gamma`
LocalOutlierFactor	Local density-based outliers.	Detect faulty sensors.	`n_neighbors`, `contamination`, `metric`
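
A minimal sketch of the `contamination` setting in practice, assuming synthetic 2-D sensor readings with five injected extreme points. `fit_predict` returns `1` for inliers and `-1` for outliers; with `contamination=0.05`, each detector labels roughly 5% of the points as anomalies, and the injected points should be among them.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic readings (assumption): 200 typical points plus 5 extreme ones
rng = np.random.default_rng(42)
normal = rng.normal(0, 1, size=(200, 2))
injected = np.array([[8, 8], [-8, 8], [8, -8], [-8, -8], [10, 0]])
X = np.vstack([normal, injected])

# IsolationForest: isolates points with random splits; outliers need fewer splits
iso = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
iso_pred = iso.fit_predict(X)   # 1 = inlier, -1 = outlier

# LocalOutlierFactor: compares each point's local density with its neighbors'
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
lof_pred = lof.fit_predict(X)   # labels only the data it was fitted on

print("IsolationForest flagged:", int((iso_pred == -1).sum()))
print("LocalOutlierFactor flagged:", int((lof_pred == -1).sum()))
```

If you need to score new, unseen readings with LOF, construct it with `novelty=True` and call `predict` after fitting on clean data.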

5. Time Series (Outside scikit-learn but Common)

scikit-learn doesn’t specialize in time series forecasting, so you’ll often reach for other libraries (for example, statsmodels for ARIMA/SARIMA, Prophet, or Keras/PyTorch for LSTMs):

Model	When to Use	Example	Key Hyperparameters
ARIMA / SARIMA	Trend & seasonality.	Monthly rainfall forecast.	`p`, `d`, `q`, `P`, `D`, `Q`, `m`
Prophet	Automated trend & seasonality handling.	Predict seasonal flooding.	`changepoint_prior_scale`, `seasonality_mode`
LSTM	Sequence learning.	Predict river levels.	`units`, `dropout`, `epochs`, `batch_size`
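
If you want a quick baseline inside scikit-learn before reaching for the tools above, one common approach (an assumption here, not one of the models in the table) is to turn the series into lag features and validate with `TimeSeriesSplit`, which always trains on earlier points and tests on later ones. The monthly series below is synthetic, and `Ridge` is just one reasonable choice of regressor.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic monthly series (assumption): level + yearly seasonality + noise
rng = np.random.default_rng(0)
t = np.arange(120)
y = 10 + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)

# Lag features: each row holds the previous 12 values, the target is the next one
n_lags = 12
X = np.column_stack([y[i : len(y) - n_lags + i] for i in range(n_lags)])
target = y[n_lags:]

maes = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    # Training rows always come before test rows: no peeking at the future
    model = Ridge(alpha=1.0).fit(X[train_idx], target[train_idx])
    maes.append(mean_absolute_error(target[test_idx], model.predict(X[test_idx])))

print("MAE per fold:", [round(m, 2) for m in maes])
```

A shuffled `KFold` would leak future values into training, which is why the time-ordered split matters here.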

✅ How to Choose a Model

Check your target
- Category → Classification
- Number → Regression
- No target → Clustering
- Rare events → Anomaly detection
- Time component → Time series
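
The "Check your target" step can be partly automated with scikit-learn's `type_of_target` utility. The `suggest_task` wrapper below is hypothetical, a minimal sketch only. It cannot detect rare events or time order, so the anomaly-detection and time-series branches still need your judgment. Note that integer-valued targets (such as counts) come back as `multiclass`, so a "classification" answer for them deserves a second look.

```python
from sklearn.utils.multiclass import type_of_target


def suggest_task(y=None):
    """Hypothetical helper: map a target array to a task family."""
    if y is None:
        return "clustering"  # no labels at all -> unsupervised
    kind = type_of_target(y)
    if kind in ("binary", "multiclass", "multilabel-indicator", "multiclass-multioutput"):
        return "classification"
    if kind in ("continuous", "continuous-multioutput"):
        return "regression"
    return "unknown"


print(suggest_task(["rain", "dry", "rain"]))  # string labels
print(suggest_task([0.5, 1.7, 2.2]))          # non-integer floats
print(suggest_task())                         # no target
```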
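
Start simple, then tune
Begin with a simple baseline (for example, LogisticRegression or LinearRegression), then tune the key hyperparameters listed above using cross-validation (for example, with GridSearchCV).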
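
The "start simple, then tune" advice can look like this in code. The sketch assumes the bundled breast cancer dataset and the `liblinear` solver (chosen because it supports both `l1` and `l2` penalties). It scores a baseline `LogisticRegression`, then searches over `penalty` and `C` with `GridSearchCV`. Because the baseline settings sit inside the grid and both steps use the same 5-fold split, the tuned cross-validation score can only match or beat the baseline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the breast cancer dataset stands in for your own data
X, y = load_breast_cancer(return_X_y=True)

# Step 1: a simple baseline (liblinear supports both l1 and l2 penalties)
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l2", C=1.0, solver="liblinear", random_state=0),
)
baseline = cross_val_score(pipe, X, y, cv=5).mean()

# Step 2: tune the key hyperparameters from the classification table
param_grid = {
    "logisticregression__penalty": ["l1", "l2"],
    "logisticregression__C": [0.01, 0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)

print(f"baseline accuracy: {baseline:.3f}")
print(f"tuned accuracy:    {search.best_score_:.3f} with {search.best_params_}")
```

For larger grids or slower models, `RandomizedSearchCV` samples a fixed number of settings instead of trying every combination.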

