KMeans clustering with init

- August 30, 2025

Question Recap:

We are using KMeans clustering with:

from sklearn.cluster import KMeans

km = KMeans(n_clusters=5, init='random', n_init=10, random_state=42)
km.fit(X)  # X: feature matrix of shape (n_samples, n_features)

Step 1: What does each parameter mean?

n_clusters=5 → We want 5 clusters → so 5 centroids must be initialized.
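init='random' → The 5 initial centroids are 5 observations (rows) picked at random from the dataset, with no regard to how far apart they are.
n_init=10 → The whole KMeans process is run 10 times, each with a different random initialization, and the best clustering (lowest inertia) is kept.
random_state=42 → Fixes the seed of the random number generator used for initialization, so the same data and settings give the same result every time.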
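To make the role of n_init concrete, here is a minimal sketch that imitates it by hand: ten separate single-initialization runs, keeping the one with the lowest inertia. The toy data from make_blobs and the seeds 0 to 9 are assumptions for illustration; scikit-learn derives its own internal seeds from random_state, so this loop will not reproduce the exact result of n_init=10.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data (assumed for illustration): 300 points around 5 centers.
X, _ = make_blobs(n_samples=300, centers=5, random_state=0)

# Imitate n_init=10: ten independent runs, each with one random initialization.
runs = []
for seed in range(10):  # illustrative seeds, not the ones scikit-learn uses internally
    run = KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X)
    runs.append(run)

# Every run initializes and fits exactly 5 centroids.
inertias = [run.inertia_ for run in runs]

# Keep the run with the lowest inertia (within-cluster sum of squared distances).
best = min(runs, key=lambda r: r.inertia_)

print("inertia per run:", [round(i, 1) for i in inertias])
print("best inertia:   ", round(best.inertia_, 1))
```

Runs that start from an unlucky set of centroids can converge to a noticeably higher inertia, which is why keeping the best of several initializations is useful.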

Step 2: Interpreting the options

✅ 5 centroids are randomly initialized 10 times
- Correct → Because we want 5 clusters, and with n_init=10, this process is repeated 10 times.
❌ 10 centroids are randomly initialized 5 times
- Wrong → We always initialize 5 centroids (because k=5), not 10.
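❌ 5 samples in the dataset are selected … at least 10 units away
- Wrong → init='random' picks samples without looking at the distances between them. No KMeans init option enforces a fixed minimum distance; even k-means++ only makes samples far from the already chosen centroids more likely to be picked.
❌ 10 samples in the dataset … at least 5 units away
- Wrong → Same reason: random initialization applies no distance rule. The count is also off, since k=5 means 5 centroids, not 10.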
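The contrast between purely random seeding and distance-aware seeding can be sketched in NumPy. This is a simplified illustration, not scikit-learn's implementation: the helper name kmeans_pp_sketch and the synthetic data are assumptions, and the sketch uses the basic k-means++ idea of sampling each new centroid with probability proportional to its squared distance from the nearest centroid chosen so far. Neither approach applies a hard "at least N units away" rule.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))  # synthetic data, assumed for illustration
k = 5

# init='random' style: pick k distinct rows uniformly, ignoring distances.
random_idx = rng.choice(len(X), size=k, replace=False)
random_centroids = X[random_idx]


def kmeans_pp_sketch(X, k, rng):
    """Hypothetical helper: basic k-means++ style seeding (D^2 sampling)."""
    # First centroid: one row chosen uniformly at random.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        chosen = np.array(centroids)
        # Squared distance from every point to its nearest chosen centroid.
        d2 = ((X[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Far-away points are more likely to be picked, but nothing is guaranteed.
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)


pp_centroids = kmeans_pp_sketch(X, k, rng)

print("random init centroids:\n", random_centroids)
print("k-means++ style centroids:\n", pp_centroids)
```

In both cases exactly k=5 starting centroids are produced, and each one is an actual row of the dataset.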

✅ Correct Statement:

5 centroids are randomly initialized 10 times

