KMeans clustering with init



Question Recap:

We are using KMeans clustering with:

from sklearn.cluster import KMeans

# X is the feature matrix, shape (n_samples, n_features)
km = KMeans(n_clusters=5, init='random', n_init=10, random_state=42)
km.fit(X)

Step 1: What does each parameter mean?

  • n_clusters=5 → We want 5 clusters → so 5 centroids must be initialized.

  • init='random' → The initial centroids are n_clusters observations (rows) picked at random from the dataset.

  • n_init=10 → The whole KMeans process will be run 10 times with different random initializations, and the best clustering (lowest inertia) is kept.

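  • random_state=42 → Fixes the seed of the random number generator used for centroid initialization, so repeated runs on the same data give the same result.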
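
The parameters above can be checked on a small example. This is a minimal sketch, assuming a synthetic dataset from make_blobs (the original X is not shown, so the data here is purely illustrative). It fits the same model twice and inspects the number of centroids and whether the result is reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data; an assumption for illustration, not the original X.
X, _ = make_blobs(n_samples=300, centers=5, n_features=2, random_state=0)

km = KMeans(n_clusters=5, init='random', n_init=10, random_state=42)
km.fit(X)

# One centroid per cluster: 5 rows, one column per feature.
centers_shape = km.cluster_centers_.shape

# Inertia of the best of the 10 runs (sum of squared distances to the nearest centroid).
best_inertia = km.inertia_

# Refitting with the same random_state should give the same centroids.
km_again = KMeans(n_clusters=5, init='random', n_init=10, random_state=42).fit(X)
same_result = np.allclose(km.cluster_centers_, km_again.cluster_centers_)

print(centers_shape, same_result)
```

Only the best run is kept on the fitted object, so cluster_centers_ and inertia_ describe that single winning run, not all 10.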


Step 2: Interpreting the options

  1. 5 centroids are randomly initialized 10 times

    • Correct → Because we want 5 clusters, and with n_init=10, this process is repeated 10 times.

  2. 10 centroids are randomly initialized 5 times

    • Wrong → Each run initializes 5 centroids (n_clusters=5). The 10 comes from n_init, which is the number of runs, not the number of centroids.

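  3. 5 samples in the dataset are selected … at least 10 units away

    • Wrong → init='random' places no minimum-distance constraint on the selected samples, and n_init=10 is a number of runs, not a distance.

  4. 10 samples in the dataset … at least 5 units away

    • Wrong → Only 5 samples are selected per run, and no parameter here sets a distance between them.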


✅ Correct Statement:

5 centroids are randomly initialized 10 times
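
To make "5 centroids, 10 times" concrete, the loop below imitates the behaviour by hand. It is only a sketch under stated assumptions: the data is synthetic, and the seeding is done with NumPy's own generator, so the numbers will not match scikit-learn's internal run exactly. Each iteration picks 5 distinct rows of X as starting centroids, runs KMeans once from them, and records the inertia; the lowest one plays the role of the result kept by n_init=10.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data; an assumption for illustration only.
X, _ = make_blobs(n_samples=300, centers=5, n_features=2, random_state=0)

rng = np.random.default_rng(42)
n_clusters, n_init = 5, 10

runs = []
for _ in range(n_init):
    # Choose 5 distinct samples from the dataset as the initial centroids.
    idx = rng.choice(X.shape[0], size=n_clusters, replace=False)
    init_centroids = X[idx]

    # A single KMeans run starting from exactly these centroids.
    km = KMeans(n_clusters=n_clusters, init=init_centroids, n_init=1).fit(X)
    runs.append((km.inertia_, init_centroids.shape))

inertias = [inertia for inertia, _ in runs]
best_inertia = min(inertias)  # the run that would be kept

print(len(runs), best_inertia)
```

Every run starts from an array of shape (5, n_features), and there are 10 such runs. No run ever starts from 10 centroids.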


