Understanding KNN with a Simple Example

The K-Nearest Neighbors (KNN) algorithm is one of the simplest yet most effective classification methods in machine learning. It classifies a new data point based on the majority label of its K nearest neighbors. Let's walk through the code snippet below step by step.


The Code

from sklearn.neighbors import KNeighborsClassifier

# Four training samples, each with two features, and their class labels
X_train = [[1, 50], [2, 60], [3, 70], [4, 80]]
y_train = [0, 0, 1, 1]

# Classify using a majority vote among the 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict the class of a new, unseen point
X_test = [[2.5, 64]]
print(knn.predict(X_test))

Step 1: Training Data

We have 4 training samples:

Feature (x1, x2)    Label
(1, 50)             0
(2, 60)             0
(3, 70)             1
(4, 80)             1

So the dataset is well-balanced:

  • Class 0 → First two points

  • Class 1 → Last two points


Step 2: Test Point

We want to classify:

X_{test} = (2.5, 64)

Step 3: Compute Distances

We calculate the Euclidean distance from the test point (x_{1t}, x_{2t}) = (2.5, 64) to each training sample (a short verification snippet follows the list):

d = \sqrt{(x_1 - x_{1t})^2 + (x_2 - x_{2t})^2}
  • Distance to (1, 50):
    \sqrt{(1-2.5)^2 + (50-64)^2} = \sqrt{2.25 + 196} = \sqrt{198.25} \approx 14.08

  • Distance to (2, 60):
    \sqrt{(2-2.5)^2 + (60-64)^2} = \sqrt{0.25 + 16} = \sqrt{16.25} \approx 4.03

  • Distance to (3, 70):
    \sqrt{(3-2.5)^2 + (70-64)^2} = \sqrt{0.25 + 36} = \sqrt{36.25} \approx 6.02

  • Distance to (4, 80):
    \sqrt{(4-2.5)^2 + (80-64)^2} = \sqrt{2.25 + 256} = \sqrt{258.25} \approx 16.07

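To double-check these values, here is a minimal sketch (standard library only) that recomputes the four distances:

import math

X_train = [[1, 50], [2, 60], [3, 70], [4, 80]]
x_test = [2.5, 64]

# Euclidean distance from the test point to each training sample
for point in X_train:
    d = math.sqrt((point[0] - x_test[0]) ** 2 + (point[1] - x_test[1]) ** 2)
    print(point, round(d, 2))  # 14.08, 4.03, 6.02, 16.07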

Step 4: Nearest 3 Neighbors

The 3 nearest neighbors are:

  1. (2, 60) → Label 0 → Distance ≈ 4.03

  2. (3, 70) → Label 1 → Distance ≈ 6.02

  3. (1, 50) → Label 0 → Distance ≈ 14.08

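scikit-learn can report these neighbors directly. Continuing from the fitted knn model above, the kneighbors method returns the distances to, and indices of, the K closest training points:

# Distances and indices of the 3 nearest training points
distances, indices = knn.kneighbors(X_test)
print(distances)  # [[ 4.03...  6.02... 14.08...]]
print(indices)    # [[1 2 0]] -> (2, 60), (3, 70), (1, 50)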

Step 5: Majority Vote

Among the 3 neighbors:

  • Class 0 → 2 votes

  • Class 1 → 1 vote

✅ Majority class = 0

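The same vote can be read off with predict_proba, which (for the default uniform weights) reports each class's share of the K votes:

# Fraction of the 3 neighbors voting for class 0 and class 1
print(knn.predict_proba(X_test))  # [[0.66666667 0.33333333]]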

Final Output

The code will print:

[0]

Key Takeaways

  • KNN assigns labels based on the majority class of the nearest neighbors.

  • The choice of k (number of neighbors) has a big impact on results.

  • Here, with k=3, the model predicts class 0 for the test point.


✨ This example shows how KNN works step by step with distances and voting.


As an exercise, try changing n_neighbors to 1 or 4 and compare the decision; a small sketch of that experiment follows.
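Continuing from the variables defined earlier, this loop refits the model for each value of k. Note that with k=4 all four points vote and the result is a 2-2 tie, so the prediction depends on the library's tie-breaking rule:

# Compare predictions for different values of k
for k in [1, 3, 4]:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    print(k, model.predict(X_test))
# k=1: nearest point is (2, 60) -> class 0
# k=3: 2-1 majority vote        -> class 0
# k=4: 2-2 tie, outcome depends on tie-breaking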
