🤖 K-Means Clustering Explained with a Customer Purchase Example

- August 30, 2025

Clustering is one of the most popular unsupervised learning techniques in machine learning. It groups similar data points together without predefined labels. A widely used clustering algorithm is K-Means, which partitions the data into k clusters by minimizing the sum of squared distances between each data point and the center (centroid) of its assigned cluster.

In this blog, we’ll break down a simple Python example that uses K-Means clustering to group customers based on their purchase behavior.

🛒 The Dataset

We are working with customer data consisting of two attributes:

Total Amount Spent (in currency units)
Number of Items Purchased

Here’s a small dataset:

import numpy as np

data = np.array([
    [150, 6],
    [300, 12],
    [50, 2],
    [250, 8],
    [80, 3]
])

Each row represents a customer’s purchase profile.

⚙️ K-Means Implementation in Python

from sklearn.cluster import KMeans
import numpy as np

# Customer purchase data
data = np.array([[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]])

# Initialize KMeans with 3 clusters
kmeans = KMeans(n_clusters=3)
kmeans.fit(data)

# Cluster label assigned to each customer during fitting
labels = kmeans.labels_

# Get cluster centroids
centroids = kmeans.cluster_centers_

🔍 Step-by-Step Breakdown

Initialization
```
kmeans = KMeans(n_clusters=3)
```
We ask K-Means to group customers into 3 clusters.
Fitting the Model
```
kmeans.fit(data)
```
The algorithm alternates between two steps until the assignments stop changing: it assigns each data point to its nearest centroid, then moves each centroid to the mean position of the points assigned to it.
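
The fitting loop can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's actual implementation: the helper name lloyd_kmeans is hypothetical, and it seeds the centroids with the first k rows, whereas KMeans uses k-means++ initialization by default.

```python
import numpy as np

def lloyd_kmeans(points, k, max_iter=100):
    """Hypothetical helper: plain assign/update iterations, seeded with the first k points."""
    points = np.asarray(points, dtype=float)
    centroids = points[:k].copy()  # simplistic initialization, an assumption of this sketch
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster ends up empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return labels, centroids

data = np.array([[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]])
sketch_labels, sketch_centroids = lloyd_kmeans(data, k=3)
print(sketch_labels)
print(sketch_centroids)
```

On this dataset the sketch groups [50, 2] with [80, 3] and [300, 12] with [250, 8], leaving [150, 6] on its own.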
Cluster Labels
```
labels = kmeans.labels_
```
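This gives us an array of integers (0, 1, or 2) indicating which cluster each customer belongs to. The numbering is arbitrary: it can differ from run to run unless you fix random_state when creating KMeans, so only the grouping itself is meaningful.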

Example Output:
```
labels = [1, 2, 0, 2, 0]
```
- Customer [150, 6] → Cluster 1
- Customer [300, 12] → Cluster 2
- Customer [50, 2] → Cluster 0
- …and so on.
Cluster Centroids
```
centroids = kmeans.cluster_centers_
```
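This gives the coordinates of the cluster centers, which represent the “average customer” in each segment. Row i holds the centroid of cluster i, in the same column order as the data (amount spent, items purchased).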

Example Output:
```
[[ 65,  2.5 ],
 [150,  6.0 ],
 [275, 10.0 ]]
```
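
To see why a centroid is the “average customer”, the centroids can be recomputed by hand from the example labels above. The sketch below also computes the quantity K-Means tries to minimize, the within-cluster sum of squared distances (scikit-learn exposes it after fitting as kmeans.inertia_). The variable names are illustrative, and the labels are copied from the example output rather than produced by a fresh run.

```python
import numpy as np

data = np.array([[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]])
example_labels = np.array([1, 2, 0, 2, 0])  # copied from the example output

# Each centroid is the column-wise mean of the customers in that cluster
recomputed_centroids = np.array(
    [data[example_labels == j].mean(axis=0) for j in range(3)]
)
print(recomputed_centroids)

# Within-cluster sum of squared distances for this grouping
wcss = sum(
    ((data[example_labels == j] - recomputed_centroids[j]) ** 2).sum()
    for j in range(3)
)
print(wcss)  # 1708.5
```

The recomputed rows match the example centroids, and any other way of splitting these five customers into three groups gives a larger sum.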

📊 Interpretation

The labels array holds the cluster assignment of each customer. Customers in the same cluster have similar spending and purchase patterns.
Businesses can use these insights for:
- Personalized marketing
- Loyalty programs
- Targeted discounts

For instance, using the example output above:

Cluster 0 → Low spenders (budget customers)
Cluster 1 → Medium spenders
Cluster 2 → High spenders / premium customers
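
Because the cluster numbers themselves are arbitrary, the low/medium/high names are best assigned by looking at the centroids rather than hard-coding cluster ids. A small sketch, assuming the example centroids and labels from above and ranking clusters by average amount spent (the segment names and variable names are illustrative):

```python
import numpy as np

centroids = np.array([[65, 2.5], [150, 6.0], [275, 10.0]])  # example centroids
labels = np.array([1, 2, 0, 2, 0])                          # example labels

# Cluster ids ordered from lowest to highest average amount spent
order = np.argsort(centroids[:, 0])
segment_names = ["Low spenders", "Medium spenders", "High spenders"]
segment_of_cluster = {
    int(cluster_id): name for cluster_id, name in zip(order, segment_names)
}

# Translate each customer's cluster id into a readable segment name
customer_segments = [segment_of_cluster[int(label)] for label in labels]
print(customer_segments)
```

With this mapping the segment names stay correct even if a different run numbers the clusters differently.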

🚀 Key Takeaways

- K-Means is an unsupervised algorithm that groups data into k clusters.
- labels indicates which cluster each data point belongs to.
- centroids holds the average position of each cluster.
- K-Means is useful in customer segmentation, market research, image compression, and more.

👉 This simple example shows how a business can segment customers based on purchasing behavior to make data-driven decisions.
