KMeans clustering with init
Question Recap:
We are using KMeans clustering with:
from sklearn.cluster import KMeans
km = KMeans(n_clusters=5, init='random', n_init=10, random_state=42)
km.fit(X)  # X is the feature matrix, shape (n_samples, n_features)
Step 1: What does each parameter mean?
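- n_clusters=5 → We want 5 clusters, so 5 centroids are initialized.
- init='random' → The 5 initial centroids are samples (rows) picked at random from the dataset.
- n_init=10 → The whole KMeans procedure is run 10 times, each time from a different random initialization, and the run with the lowest inertia (within-cluster sum of squared distances) is kept.
- random_state=42 → Seeds the random number generator, so the same initializations, and therefore the same result, are produced every time the code is run.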
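To make the roles of init='random' and n_init=10 concrete, here is a minimal sketch that spells out the restart loop by hand. It is not scikit-learn's internal code and will not reproduce the exact numbers of the random_state=42 call. The make_blobs data is only a stand-in for X, and the names start_idx, start_centroids, and best are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for X (an assumption; the original data is not shown).
X, _ = make_blobs(n_samples=300, centers=5, random_state=0)

rng = np.random.default_rng(42)
n_clusters, n_init = 5, 10

inertias = []
best = None
for run in range(n_init):
    # init='random': pick 5 distinct rows of X as the starting centroids.
    start_idx = rng.choice(len(X), size=n_clusters, replace=False)
    start_centroids = X[start_idx]

    # One full k-means run from this particular starting point.
    km = KMeans(n_clusters=n_clusters, init=start_centroids, n_init=1).fit(X)
    inertias.append(km.inertia_)

    # n_init=10: keep only the run with the lowest inertia.
    if best is None or km.inertia_ < best.inertia_:
        best = km

print("inertia per run:", np.round(inertias, 1))
print("best inertia:", round(best.inertia_, 1))
```

Every run starts from exactly 5 centroids. What changes from run to run is which 5 rows are picked, and a poor pick can leave that run with a noticeably higher inertia than the best one.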
Step 2: Interpreting the options
- ✅ 5 centroids are randomly initialized 10 times
  - Correct → We want 5 clusters, and with n_init=10 this initialization is repeated 10 times.
- ❌ 10 centroids are randomly initialized 5 times
  - Wrong → We always initialize 5 centroids (because k=5), not 10.
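- ❌ 5 samples in the dataset are selected … at least 10 units away
  - Wrong → init='random' picks samples uniformly at random, with no minimum-distance rule. (k-means++ does favor spread-out seeds, but it does so by sampling in proportion to squared distance, not by enforcing a fixed distance.)
- ❌ 10 samples in the dataset … at least 5 units away
  - Wrong → Same reason: this describes a distance-based initialization, not a random one.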
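The difference between random seeding and distance-aware seeding can be sketched as follows. This is only an illustration under assumptions: the make_blobs data stands in for X, and min_pairwise_distance is a hypothetical helper written for this example. kmeans_plusplus is scikit-learn's standalone k-means++ seeding function.

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus
from sklearn.datasets import make_blobs

# Synthetic stand-in for X (an assumption).
X, _ = make_blobs(n_samples=300, centers=5, random_state=0)

# init='random' style: 5 distinct rows chosen uniformly at random.
rng = np.random.default_rng(42)
random_seeds = X[rng.choice(len(X), size=5, replace=False)]

# k-means++ style: rows chosen with probability weighted by squared
# distance to the seeds already picked. No fixed distance threshold.
pp_seeds, pp_idx = kmeans_plusplus(X, n_clusters=5, random_state=42)

def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two of the given points."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists[np.triu_indices(len(points), k=1)].min()

print("random    min seed distance:", min_pairwise_distance(random_seeds))
print("k-means++ min seed distance:", min_pairwise_distance(pp_seeds))
```

Both strategies return actual rows of the dataset. Neither guarantees that seeds are "at least N units" apart, which is why the two distance-based options do not describe this KMeans call.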
✅ Correct Statement:
5 centroids are randomly initialized 10 times
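The correct statement can also be checked against the estimator itself. The sketch below assumes a synthetic X from make_blobs, since the original data is not shown, and km_again is just a second fit used to illustrate reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for X (an assumption).
X, _ = make_blobs(n_samples=300, centers=5, random_state=0)

km = KMeans(n_clusters=5, init='random', n_init=10, random_state=42).fit(X)
km_again = KMeans(n_clusters=5, init='random', n_init=10, random_state=42).fit(X)

# One row per centroid: always 5, regardless of n_init.
print(km.cluster_centers_.shape)

# Number of restarts requested.
print(km.get_params()["n_init"])

# Same random_state, so both fits should agree.
print(np.isclose(km.inertia_, km_again.inertia_))
```

The fitted model holds 5 centroids, not 10. n_init only controls how many times those 5 centroids are re-initialized before the best run is kept.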