Understanding SVM Predictions with Scikit-Learn’s Wine Dataset

Support Vector Machines (SVMs) are powerful supervised learning algorithms often used for classification tasks. In this blog, we'll walk through an example using scikit-learn's built-in Wine dataset and see how the SVC classifier behaves on a small slice of the data.


The Code

Here’s the Python code snippet we’re analyzing:

from sklearn.datasets import load_wine
from sklearn.svm import SVC

# Load dataset
X, y = load_wine(return_X_y=True)

# Train SVM model
clf = SVC(random_state=0).fit(X, y)

# True labels for samples 40 to 44
print(y[40:45])

# Predictions for those samples
print(clf.predict(X[40:45, :]))

# Accuracy score on this slice
print(clf.score(X[40:45, :], y[40:45]))

Step 1: The Dataset

  • load_wine() loads the Wine dataset (178 samples, 13 features, 3 classes: 0, 1, 2).

  • X → features (chemical analysis of wines).

  • y → target labels (wine classes).

When we print y[40:45], we get:

[0 0 0 0 0]

So, the true classes of samples 40–44 are all class 0.
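If you want to verify those numbers yourself, here is a quick sanity check on what load_wine returns:

import numpy as np

# Confirm the dataset's shape and class labels
print(X.shape)       # (178, 13): 178 samples, 13 features
print(np.unique(y))  # [0 1 2]: three classes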


Step 2: The Model

We train an SVM classifier (SVC) with default settings (kernel='rbf', C=1.0, gamma='scale').

This means it will try to separate the three wine classes using a non-linear decision boundary. Note that random_state=0 has no effect here: in SVC it only controls the shuffling used for probability estimates, which are disabled by default (probability=False).
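For readers who prefer the defaults spelled out, the fit above is equivalent to the following (all values are scikit-learn's documented defaults for SVC):

# Same model with the default hyperparameters written out explicitly
clf = SVC(C=1.0, kernel='rbf', gamma='scale', random_state=0).fit(X, y)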


Step 3: Predictions

We now predict on the same slice:

clf.predict(X[40:45, :])

Output:

[2 0 0 2 0]

So the SVM predicts:

  • Sample 40 → predicted as class 2 (wrong).

  • Sample 41 → predicted as class 0 (correct).

  • Sample 42 → predicted as class 0 (correct).

  • Sample 43 → predicted as class 2 (wrong).

  • Sample 44 → predicted as class 0 (correct).
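Rather than checking each sample by eye, you can compare the two arrays directly. A minimal sketch, reusing clf and the slice from above:

# Element-wise comparison of predictions against the true labels
pred = clf.predict(X[40:45, :])
print(pred == y[40:45])  # [False  True  True False  True]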


Step 4: Accuracy

We now check the accuracy on this slice:

clf.score(X[40:45, :], y[40:45])

Accuracy is computed as:

\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Predictions}}

Here:

  • Correct predictions = 3 (indices 41, 42, 44).

  • Total predictions = 5.

\text{Accuracy} = \frac{3}{5} = 0.6
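For classifiers, clf.score wraps accuracy_score from sklearn.metrics and performs exactly this arithmetic. A quick sketch reproducing the 0.6 by hand:

from sklearn.metrics import accuracy_score

# 3 correct predictions out of 5 -> 0.6
pred = clf.predict(X[40:45, :])
print(accuracy_score(y[40:45], pred))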

Final Output

The three print statements give:

[0 0 0 0 0]
[2 0 0 2 0]
0.6

Key Takeaways

  • SVC with default parameters may misclassify some samples, especially in small subsets.

  • The true labels [0 0 0 0 0] show that all five wines in this slice belong to class 0, yet the model predicts class 2 for two of them.

  • Accuracy on this slice = 60%.

  • This example highlights why tuning hyperparameters (C, gamma, kernel) is important for SVMs; a minimal tuning sketch follows below.
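Here is one way to set up that tuning with GridSearchCV on the full Wine dataset. The grid values below are illustrative assumptions, not tuned recommendations:

from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Illustrative search grid (assumed values, not a tuned choice)
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 0.01, 0.001],
    'kernel': ['rbf', 'linear'],
}

search = GridSearchCV(SVC(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # best combination found by 5-fold cross-validation
print(search.best_score_)   # mean cross-validated accuracy of that combination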


✨ In summary, SVM is a robust classifier, but a five-sample slice of the training data tells you very little about real performance. Always evaluate on held-out data, ideally with cross-validation over the full dataset, for more reliable insights.
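For example, a 5-fold cross-validation of the untuned default model takes only a couple of lines:

from sklearn.model_selection import cross_val_score

# Mean accuracy across 5 folds of the full dataset
scores = cross_val_score(SVC(random_state=0), X, y, cv=5)
print(scores.mean())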

