🤖 K-Means Clustering Explained with a Customer Purchase Example

Clustering is one of the most popular unsupervised learning techniques in machine learning. It groups similar data points together without predefined labels. A widely used clustering algorithm is K-Means, which works by minimizing the sum of squared distances between data points and the center (centroid) of their assigned cluster.

In this post, we’ll break down a simple Python example that uses K-Means clustering to group customers based on their purchase behavior.


🛒 The Dataset

We are working with customer data in which each customer is described by two numeric attributes that capture spending and purchase behavior.

Here’s a small dataset:

data = np.array([
    [150, 6],
    [300, 12],
    [50, 2],
    [250, 8],
    [80, 3]
])

Each row represents a customer’s purchase profile.


⚙️ K-Means Implementation in Python

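from sklearn.cluster import KMeans
import numpy as np

# Customer purchase data: one row per customer, two attributes per row
data = np.array([[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]])

# Initialize KMeans with 3 clusters; n_init and random_state make the run reproducible
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(data)

# Cluster label assigned to each customer during fitting
labels = kmeans.labels_

# Coordinates of the cluster centroids
centroids = kmeans.cluster_centers_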
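
The quantity K-Means tries to minimize is the sum of squared distances from each point to its assigned centroid, which scikit-learn exposes as inertia_. As a sanity check, the sketch below recomputes that sum by hand from labels_ and cluster_centers_. The n_init and random_state values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]])

# n_init and random_state are assumptions added for reproducibility
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(data)

# Centroid assigned to each customer, one row per customer
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]

# Sum of squared distances from each customer to its own centroid
manual_inertia = float(((data - assigned_centers) ** 2).sum())

print(manual_inertia, kmeans.inertia_)
```

The two printed numbers should agree up to floating-point rounding. A lower inertia means the points sit closer to their centroids.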

🔍 Step-by-Step Breakdown

  1. Initialization

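    kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
    

    We ask K-Means to group customers into 3 clusters. The starting centroids are chosen randomly, so n_init=10 repeats the run from 10 different starts and keeps the best one, and random_state=42 fixes the seed so the result is reproducible.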

  2. Fitting the Model

    kmeans.fit(data)
    

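    The algorithm assigns each data point to its nearest centroid, recomputes each centroid as the mean of the points assigned to it, and repeats these two steps until the centroids stop moving (or an iteration limit is reached).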

  3. Cluster Labels

    labels = kmeans.labels_
    

    This gives us an array of integers (0, 1, or 2) indicating which cluster each customer belongs to.

    Example output (the cluster numbers are arbitrary and may differ between runs):

    labels = [1, 2, 0, 2, 0]
    
    • Customer [150, 6] → Cluster 1

    • Customer [300, 12] → Cluster 2

    • Customer [50, 2] → Cluster 0

    • …and so on.

  4. Cluster Centroids

    centroids = kmeans.cluster_centers_
    

    This gives the coordinates of the cluster centers, which represent the “average customer” in each segment.

    Example Output:

    [[ 65,  2.5 ],
     [150,  6.0 ],
     [275, 10.0 ]]
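
To make the fitting step concrete, here is a from-scratch sketch of the assign-and-update loop in plain NumPy. It is a simplified illustration, not scikit-learn's implementation. It assumes the first three customers as starting centroids (scikit-learn picks its starting points differently) and does not handle empty clusters.

```python
import numpy as np

data = np.array([[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]], dtype=float)

# Assumption: start from the first three customers instead of random picks
centroids = data[:3].copy()

for _ in range(100):
    # Assignment step: squared distance from every point to every centroid
    distances = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = distances.argmin(axis=1)

    # Update step: each centroid becomes the mean of its assigned points
    new_centroids = np.array([data[labels == k].mean(axis=0) for k in range(3)])

    # Stop once the centroids no longer move
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(labels)     # [0 1 2 1 2]
print(centroids)  # rows: [150, 6], [275, 10], [65, 2.5]
```

With this starting point the loop settles on the same three groups as the example output above. Only the cluster numbers differ, because numbering depends on where the centroids start.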
    

📊 Interpretation

The variable labels represents the cluster assignment of each customer:

  • Customers in the same cluster have similar spending and purchase patterns.

  • Businesses can use these insights for:

    • Personalized marketing

    • Loyalty programs

    • Targeted discounts
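
To see which customers share a segment, the rows of data can be grouped by label. The sketch below uses the example labels from the breakdown above, so it runs without fitting a model. A real run may number the clusters differently.

```python
import numpy as np

data = np.array([[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]])

# Example labels copied from the breakdown; actual numbering may differ per run
labels = np.array([1, 2, 0, 2, 0])

# Collect the customers that share each cluster label
segments = {int(k): data[labels == k].tolist() for k in np.unique(labels)}

for cluster, customers in segments.items():
    print(f"Cluster {cluster}: {customers}")
```

Each segment can then be handed to whichever campaign suits it.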

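For instance, in the example output above, the cluster centered at [275, 10] holds the customers with the largest values on both attributes, while the cluster centered at [65, 2.5] holds those with the smallest, so each group could receive a different campaign.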
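
Once the segments exist, a new customer can be placed into one of them with predict, which returns the index of the nearest centroid. The sketch below assumes a hypothetical new customer [280, 11] that is not in the original data. The n_init and random_state values are also illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]])

# n_init and random_state are assumptions added for reproducibility
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(data)

# Hypothetical new customer, not part of the original dataset
new_customer = np.array([[280, 11]])

# predict returns the index of the nearest centroid for each row
new_label = int(kmeans.predict(new_customer)[0])

# Existing customers that share the new customer's segment
peers = data[kmeans.labels_ == new_label]

print(new_label, peers.tolist())
```

The new customer should land in the same segment as [300, 12], the closest existing profile, so the same offers could be extended to them.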


🚀 Key Takeaways

  1. K-Means is an unsupervised algorithm that groups data into k clusters.

  2. labels indicate which cluster each data point belongs to.

  3. centroids represent the average position of each cluster.

  4. K-Means is useful in customer segmentation, market research, image compression, and more.


👉 This simple example shows how a business can segment customers based on purchasing behavior to make data-driven decisions.


