🤖 K-Means Clustering Explained with a Customer Purchase Example
Clustering is one of the most popular unsupervised learning techniques in machine learning. It groups similar data points together without predefined labels. A widely used clustering algorithm is K-Means, which works by minimizing the squared distance between each data point and the center of its assigned cluster.
In this post, we’ll walk through a simple Python example that uses K-Means clustering to group customers based on their purchase behavior.
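To make "minimizing the distance" concrete: the quantity K-Means tries to make small is the sum of squared distances from each point to its assigned cluster center (scikit-learn exposes this value as inertia_ after fitting). The sketch below computes it by hand for the five customers, labels, and centroids used in this post. The helper name within_cluster_ss is made up for illustration.

```python
def within_cluster_ss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total


# Values taken from the example in this post
points = [[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]]
labels = [1, 2, 0, 2, 0]
centroids = [[65, 2.5], [150, 6.0], [275, 10.0]]

score = within_cluster_ss(points, labels, centroids)
print(score)  # 1708.5
```

A lower score means the points sit closer to their cluster centers. Any other way of splitting these five customers into three groups gives a higher value.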
🛒 The Dataset
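We are working with customer data consisting of two attributes:
- Total Amount Spent (in currency units), the first column
- A second numeric attribute, the second column (it appears to be a purchase count)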
Here’s a small dataset:
data = np.array([
[150, 6],
[300, 12],
[50, 2],
[250, 8],
[80, 3]
])
Each row represents a customer’s purchase profile.
⚙️ K-Means Implementation in Python
from sklearn.cluster import KMeans
import numpy as np
# Customer purchase data
data = np.array([[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]])
# Initialize KMeans with 3 clusters
kmeans = KMeans(n_clusters=3)
kmeans.fit(data)
# Cluster label assigned to each customer during fitting
labels = kmeans.labels_
# Get cluster centroids
centroids = kmeans.cluster_centers_
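By default, scikit-learn picks the starting centroids at random, so the cluster numbers (and occasionally the grouping itself) can change from one run to the next. The variant below is a minimal sketch that makes the run repeatable by passing hand-picked starting centroids through the init parameter. The starting points and the extra customer [260, 9] are assumptions chosen for illustration, not part of the original script.

```python
import numpy as np
from sklearn.cluster import KMeans

# Customer purchase data (same values as above)
data = np.array([[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]], dtype=float)

# Assumption: start from three of the customers (lowest, middle, highest spend)
# so the result does not depend on random initialization.
start = np.array([[50, 2], [150, 6], [300, 12]], dtype=float)

kmeans = KMeans(n_clusters=3, init=start, n_init=1)
kmeans.fit(data)

print(kmeans.labels_)           # [1 2 0 2 0]
print(kmeans.cluster_centers_)  # rows: [65, 2.5], [150, 6], [275, 10]

# Hypothetical new customer: predict() returns the index of the nearest centroid
new_customer = np.array([[260, 9]], dtype=float)
print(kmeans.predict(new_customer))  # [2]
```

If you would rather keep the random initialization, passing a fixed random_state to KMeans is another way to get the same result on every run.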
🔍 Step-by-Step Breakdown
- Initialization
  kmeans = KMeans(n_clusters=3)
  We ask K-Means to group customers into 3 clusters.
- Fitting the Model
  kmeans.fit(data)
  The algorithm assigns each data point to its nearest centroid, recomputes each centroid as the mean position of its cluster, and repeats until the centroids stop moving (or an iteration limit is reached).
- Cluster Labels
  labels = kmeans.labels_
  This gives us an array of integers (0, 1, or 2) indicating which cluster each customer belongs to. The numbering itself is arbitrary and can differ between runs; only the grouping matters.
  Example Output:
  labels = [1, 2, 0, 2, 0]
  - Customer [150, 6] → Cluster 1
  - Customer [300, 12] → Cluster 2
  - Customer [50, 2] → Cluster 0
  - …and so on.
- Cluster Centroids
  centroids = kmeans.cluster_centers_
  This gives the coordinates of the cluster centers, which represent the “average customer” in each segment.
  Example Output:
  [[ 65, 2.5 ], [150, 6.0 ], [275, 10.0 ]]
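To see what the fitting step does under the hood, here is a minimal pure-Python sketch of the assign-then-update loop. It is not scikit-learn's implementation: the function names are made up, the starting centroids are hand-picked for illustration, and an empty cluster simply keeps its previous centroid.

```python
def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    """Update step: each centroid becomes the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(old))  # assumption: leave an empty cluster in place
    return new_centroids


def simple_kmeans(points, start_centroids, max_iter=100):
    """Repeat assign/update until the centroids stop changing."""
    centroids = [[float(v) for v in c] for c in start_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return labels, centroids


points = [[150, 6], [300, 12], [50, 2], [250, 8], [80, 3]]
# Assumption: hand-picked starting centroids (three of the customers)
labels, centroids = simple_kmeans(points, [[50, 2], [150, 6], [300, 12]])
print(labels)     # [1, 2, 0, 2, 0]
print(centroids)  # [[65.0, 2.5], [150.0, 6.0], [275.0, 10.0]]
```

With these starting points the loop settles after a single update and reproduces the example labels and centroids shown above.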
📊 Interpretation
The variable labels represents the cluster assignment of each customer:
- Customers in the same cluster have similar spending and purchase patterns.
- Businesses can use these insights for:
  - Personalized marketing
  - Loyalty programs
  - Targeted discounts
- For instance:
  - Cluster 0 → Low spenders (budget customers)
  - Cluster 1 → Medium spenders
  - Cluster 2 → High spenders / premium customers
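Because the cluster numbers are arbitrary, "Cluster 0" will not always be the low spenders. One way to attach stable names is to rank the clusters by their average spend. The sketch below assumes a hypothetical run in which the ids came out in a different order than in the example above; the segment names are taken from the list above.

```python
# Hypothetical result of a run where the cluster ids are ordered differently
centroids = [[150.0, 6.0], [275.0, 10.0], [65.0, 2.5]]
labels = [0, 1, 2, 1, 2]

segment_names = ["Low spenders", "Medium spenders", "High spenders"]

# Rank cluster ids from lowest to highest average spend (column 0 of each centroid)
order = sorted(range(len(centroids)), key=lambda j: centroids[j][0])
name_of = {cluster_id: segment_names[rank] for rank, cluster_id in enumerate(order)}

customer_segments = [name_of[label] for label in labels]
print(customer_segments)
# ['Medium spenders', 'High spenders', 'Low spenders', 'High spenders', 'Low spenders']
```

The segment names now follow the spending level of each cluster, whatever numbers the algorithm happened to assign.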
🚀 Key Takeaways
- K-Means is an unsupervised algorithm that groups data into k clusters.
- labels indicate which cluster each data point belongs to.
- centroids represent the average position of each cluster.
- Useful in customer segmentation, market research, image compression, and more.
👉 This simple example shows how a business can segment customers based on purchasing behavior to make data-driven decisions.