Understanding SVM Predictions with Scikit-Learn’s Wine Dataset

Support Vector Machines (SVMs) are powerful supervised learning algorithms often used for classification tasks. In this blog, we'll walk through an example using scikit-learn's built-in Wine dataset and see how the SVC classifier behaves on a small slice of the data.


The Code

Here’s the Python code snippet we’re analyzing:

from sklearn.datasets import load_wine
from sklearn.svm import SVC

# Load dataset
X, y = load_wine(return_X_y=True)

# Train SVM model
clf = SVC(random_state=0).fit(X, y)

# True labels for samples 40 to 44
print(y[40:45])

# Predictions for those samples
print(clf.predict(X[40:45, :]))

# Accuracy score on this slice
print(clf.score(X[40:45, :], y[40:45]))

Step 1: The Dataset

  • load_wine() loads the Wine dataset (178 samples, 13 features, 3 classes: 0, 1, 2).

  • X → features (chemical analysis of wines).

  • y → target labels (wine classes).

When we print y[40:45], we get:

[0 0 0 0 0]

So, the true classes of samples 40–44 are all class 0.
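If you want to verify those numbers yourself, here is a quick sanity check on what load_wine returns:

import numpy as np

# Confirm the dataset's shape and class labels
print(X.shape)       # (178, 13): 178 samples, 13 features
print(np.unique(y))  # [0 1 2]: three classes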


Step 2: The Model

We train an SVM classifier (SVC) with default settings (kernel='rbf', C=1.0, gamma='scale').

This means it will try to separate the three wine classes using a non-linear decision boundary. Note that random_state=0 has no effect here: in SVC it only controls the shuffling used for probability estimates, which are disabled by default (probability=False).
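For readers who prefer the defaults spelled out, the fit above is equivalent to the following (all values are scikit-learn's documented defaults for SVC):

# Same model with the default hyperparameters written out explicitly
clf = SVC(C=1.0, kernel='rbf', gamma='scale', random_state=0).fit(X, y)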


Step 3: Predictions

We now predict on the same slice:

clf.predict(X[40:45, :])

Output:

[2 0 0 2 0]

So the SVM predicts:

  • Sample 40 → predicted as class 2 (wrong).

  • Sample 41 → predicted as class 0 (correct).

  • Sample 42 → predicted as class 0 (correct).

  • Sample 43 → predicted as class 2 (wrong).

  • Sample 44 → predicted as class 0 (correct).
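Rather than checking each sample by eye, you can compare the two arrays directly. A minimal sketch, reusing clf and the slice from above:

# Element-wise comparison of predictions against the true labels
pred = clf.predict(X[40:45, :])
print(pred == y[40:45])  # [False  True  True False  True]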


Step 4: Accuracy

We now check the accuracy on this slice:

clf.score(X[40:45, :], y[40:45])

Accuracy is computed as:

\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Predictions}}

Here:

  • Correct predictions = 3 (indices 41, 42, 44).

  • Total predictions = 5.

\text{Accuracy} = \frac{3}{5} = 0.6
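For classifiers, clf.score wraps accuracy_score from sklearn.metrics and performs exactly this arithmetic. A quick sketch reproducing the 0.6 by hand:

from sklearn.metrics import accuracy_score

# 3 correct predictions out of 5 -> 0.6
pred = clf.predict(X[40:45, :])
print(accuracy_score(y[40:45], pred))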

Final Output

The three print statements give:

[0 0 0 0 0]
[2 0 0 2 0]
0.6

Key Takeaways

  • SVC with default parameters may misclassify some samples, especially in small subsets.

  • The true labels [0 0 0 0 0] show that all five wines in this slice belong to class 0, yet the model predicts class 2 for two of them.

  • Accuracy on this slice = 60%.

  • This example highlights why tuning hyperparameters (C, gamma, kernel) is important for SVMs; a minimal tuning sketch follows below.
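Here is one way to set up that tuning with GridSearchCV on the full Wine dataset. The grid values below are illustrative assumptions, not tuned recommendations:

from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Illustrative search grid (assumed values, not a tuned choice)
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 0.01, 0.001],
    'kernel': ['rbf', 'linear'],
}

search = GridSearchCV(SVC(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # best combination found by 5-fold cross-validation
print(search.best_score_)   # mean cross-validated accuracy of that combination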


✨ In summary, SVM is a robust classifier, but a five-sample slice of the training data tells you very little about real performance. Always evaluate on held-out data, ideally with cross-validation over the full dataset, for more reliable insights.
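For example, a 5-fold cross-validation of the untuned default model takes only a couple of lines:

from sklearn.model_selection import cross_val_score

# Mean accuracy across 5 folds of the full dataset
scores = cross_val_score(SVC(random_state=0), X, y, cv=5)
print(scores.mean())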

