How KNeighborsClassifier works
Good question 👍 Let’s break it down.
- Training phase: Just stores the dataset. (No real “training” → very cheap computationally.)
- Prediction phase: For each new query point (a brute-force sketch of these steps follows below):
  - Compute distances to all training samples.
  - Sort or partially select the k nearest neighbors.
  - Aggregate votes (or probabilities).
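To make those three steps concrete, here is a minimal brute-force sketch in NumPy. It is a simplified illustration, not scikit-learn’s actual implementation; the `knn_predict` helper and the toy data are made up for the example:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Step 1: compute distances from the query point to ALL training samples
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Step 2: partially select the k nearest neighbors
    # (argpartition avoids the cost of a full sort)
    nearest = np.argpartition(dists, k)[:k]
    # Step 3: aggregate votes among the k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two small clusters around (0, 0) and (5, 5)
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))  # -> 1
```

Note that Step 1 touches every training sample, which is exactly why prediction, not training, is the expensive phase.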
Where’s the bottleneck?
- Training? ❌ Cheap (just memorizing data).
- Choosing the best k? ❌ That’s a hyperparameter-tuning task, not the core bottleneck.
- Calculating class probabilities? ❌ Very light compared to the distance computations.
- Storing and searching through all training samples at prediction time? ✅ YES. That’s the heavy part: computing distances from a test point to every training sample, especially in high dimensions or on large datasets (see the timing sketch after this list).
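A quick timing sketch makes the asymmetry visible. The dataset sizes here are arbitrary and the exact numbers will vary by machine; the point is only the relative gap between `fit` and `predict`:

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Arbitrary synthetic dataset: 50,000 samples, 100 features
rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 100))
y = rng.integers(0, 2, size=50_000)

clf = KNeighborsClassifier(n_neighbors=5, algorithm="brute")

t0 = time.perf_counter()
clf.fit(X, y)            # "training": essentially just stores the data
t1 = time.perf_counter()
clf.predict(X[:500])     # prediction: distances to all 50,000 samples per query
t2 = time.perf_counter()

print(f"fit:     {t1 - t0:.3f} s")   # near-instant
print(f"predict: {t2 - t1:.3f} s")   # dominates the runtime
```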
✅ Correct Answer
Storing and searching through all training samples at prediction time.
⚡ Pro Tip: To speed up KNN, people often use KD-Trees, Ball Trees, or Approximate Nearest Neighbors (ANN) instead of brute-force search.
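In scikit-learn, KD-Trees and Ball Trees are selected via the `algorithm` parameter (ANN methods live in separate libraries such as Faiss or Annoy, not in scikit-learn itself). A sketch:

```python
from sklearn.neighbors import KNeighborsClassifier

# Brute force: O(n) distance computations per query point
brute = KNeighborsClassifier(n_neighbors=5, algorithm="brute")

# KD-Tree / Ball Tree: space-partitioning indexes built once at fit time;
# queries can then skip most of the training set (best in lower dimensions)
kd_tree   = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
ball_tree = KNeighborsClassifier(n_neighbors=5, algorithm="ball_tree")

# The default is "auto", which picks a strategy based on the training data
auto = KNeighborsClassifier(n_neighbors=5)
```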
Would you like me to show a quick time complexity comparison of KNN training vs prediction?