🌳 Decision Tree Splitting Rules Explained (with min_samples_split & min_samples_leaf)

Decision Trees are among the most intuitive machine learning algorithms. But to tune their hyperparameters well, especially min_samples_split and min_samples_leaf, you need to understand exactly how each one constrains the tree's growth.

Let’s explore this with a concrete example from Scikit-Learn’s DecisionTreeClassifier.


📌 The Code

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Load dataset
X, y = load_breast_cancer(as_frame=True, return_X_y=True)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Decision Tree with constraints
clf = DecisionTreeClassifier(min_samples_split=7, min_samples_leaf=4, random_state=5)
clf.fit(X_train, y_train)

# Accuracy on the held-out test set
print(clf.score(X_test, y_test))

Here we set:

  • min_samples_split = 7 → A node must have at least 7 samples to even attempt splitting.

  • min_samples_leaf = 4 → After splitting, each child node must have at least 4 samples.
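
To see these constraints reflected in the fitted tree, we can inspect scikit-learn's internal tree arrays. A quick sanity-check sketch, assuming the clf fitted above:

tree = clf.tree_
is_leaf = tree.children_left == -1              # -1 marks a leaf node
leaf_sizes = tree.n_node_samples[is_leaf]

print("Smallest leaf size:", leaf_sizes.min())  # never below min_samples_leaf = 4
print("Number of leaves:", is_leaf.sum())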


⚙️ Rules for Splitting a Node

A split at node N will be performed only if both of the following checks pass:

  1. Node Size Check: samples_at_node ≥ min_samples_split
    → If fewer than 7 samples, no split.

  2. Child Size Check: After the split, both children must have ≥ 4 samples.
    → Otherwise, split is invalid.
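
Both rules are easy to express in code. Here is a minimal sketch; the helper name is_valid_split is our own for illustration, not a scikit-learn function:

def is_valid_split(n_node, n_left, n_right,
                   min_samples_split=7, min_samples_leaf=4):
    """Return True if splitting n_node samples into (n_left, n_right) is allowed."""
    # Rule 1: the node must be large enough to attempt a split.
    if n_node < min_samples_split:
        return False
    # Rule 2: both children must meet the minimum leaf size.
    return n_left >= min_samples_leaf and n_right >= min_samples_leaf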


📝 Scenarios to Test

Consider the following hypothetical scenarios, each proposing a split of node N. Let's check them one by one against the two rules:


✅ Scenario 1: Node N = 15 (split → 10 left, 5 right)

  • Node size = 15 ≥ 7 → ✅

  • Left = 10 ≥ 4, Right = 5 ≥ 4 → ✅
    👉 Valid Split


✅ Scenario 2: Node N = 8 (split → 4 left, 4 right)

  • Node size = 8 ≥ 7 → ✅

  • Left = 4 ≥ 4, Right = 4 ≥ 4 → ✅
    👉 Valid Split


❌ Scenario 3: Node N = 9 (split → 2 left, 7 right)

  • Node size = 9 ≥ 7 → ✅

  • Left = 2 < 4 ❌ (violates min_samples_leaf)
    👉 Invalid Split


✅ Scenario 4: Node N = 14 (split → 4 left, 10 right)

  • Node size = 14 ≥ 7 → ✅

  • Left = 4 ≥ 4, Right = 10 ≥ 4 → ✅
    👉 Valid Split


❌ Scenario 5: Node N = 6 (split → 3 left, 3 right)

  • Node size = 6 < 7 ❌ (violates min_samples_split)
    👉 Invalid Split


🎯 Final Correct Options

Only these scenarios lead to valid splits:

  • Node N = 15 → (10, 5)

  • Node N = 8 → (4, 4)

  • Node N = 14 → (4, 10)
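
We can verify all five scenarios in one pass with the is_valid_split helper sketched earlier:

scenarios = [(15, 10, 5), (8, 4, 4), (9, 2, 7), (14, 4, 10), (6, 3, 3)]

for i, (n, left, right) in enumerate(scenarios, start=1):
    verdict = "valid" if is_valid_split(n, left, right) else "invalid"
    print(f"Scenario {i}: N = {n} -> ({left}, {right}) is {verdict}")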


🚀 Key Takeaways

  • min_samples_split ensures that nodes are not split if they are too small.

  • min_samples_leaf ensures that leaf nodes are not too small, which helps prevent overfitting to tiny groups of samples.

  • Always check both conditions before concluding whether a split is valid.

By tuning these parameters, you can control the depth of the tree and improve the generalization ability and robustness of your model.
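
As a quick illustration of that control, compare the constrained tree with a fully grown one on the same data. This sketch reuses X_train and y_train from the code at the top of the post; exact leaf counts may vary across scikit-learn versions:

# Default tree: grows until every leaf is pure or has a single sample
unconstrained = DecisionTreeClassifier(random_state=5).fit(X_train, y_train)

# Constrained tree from earlier in the post
constrained = DecisionTreeClassifier(min_samples_split=7, min_samples_leaf=4,
                                     random_state=5).fit(X_train, y_train)

print("Unconstrained leaves:", unconstrained.get_n_leaves())
print("Constrained leaves:", constrained.get_n_leaves())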


