Decision Tree Classifier Example: Beginner-Friendly Explanation

Decision Trees are one of the simplest and most interpretable models in machine learning. They split data step by step based on feature values and decide which class a sample belongs to.

Let’s look at a real code example with scikit-learn and understand what happens.


The Code

from sklearn.tree import DecisionTreeClassifier

# Four training samples, two features each
X_train = [[1, 10], [2, 20], [3, 30], [4, 40]]
y_train = [0, 0, 0, 1]  # first three are class 0, the last is class 1

# Limit the tree to two levels of splits; fix random_state for reproducibility
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)

# Predict the class of a new, unseen point
X_test = [[2.5, 25]]
print(clf.predict(X_test))  # [0]

Step 1: Training Data

The training data (X_train) has two features per row:

[1, 10] → class 0
[2, 20] → class 0
[3, 30] → class 0
[4, 40] → class 1

So the first three samples belong to class 0, and the last one belongs to class 1.


Step 2: Building the Decision Tree

  • max_depth=2 means the tree can split at most two times from the root to a leaf.

  • The tree looks for the best feature and threshold to separate classes.

  • With this data, a single split is actually enough: only [4, 40] belongs to class 1, so one threshold (for example, on the first feature between 3 and 4) cleanly separates it from the three class-0 samples.
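To see what the tree actually learned, we can print its splits with scikit-learn's export_text utility. (The feature names "f1" and "f2" below are just labels we choose for display; the exact feature and threshold the tree picks can vary, since both features separate these classes equally well.)

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X_train = [[1, 10], [2, 20], [3, 30], [4, 40]]
y_train = [0, 0, 0, 1]

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)

# Print the learned splits as indented text
print(export_text(clf, feature_names=["f1", "f2"]))

# max_depth=2 allows up to two levels of splits, but one split already
# separates the classes perfectly, so the tree stops at depth 1.
print(clf.get_depth())  # 1
```

Note that max_depth is only an upper limit: the tree stops growing early if a split already produces pure leaves.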


Step 3: Making a Prediction

We test with:

X_test = [[2.5, 25]]

This point lies between [2,20] and [3,30], both of which are class 0.

So, the decision tree predicts:

[0]
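We can also ask the tree how confident it is. Since the leaf our test point lands in contains only class-0 training samples, predict_proba reports probability 1.0 for class 0 (a small self-contained sketch reusing the same toy data):

```python
from sklearn.tree import DecisionTreeClassifier

X_train = [[1, 10], [2, 20], [3, 30], [4, 40]]
y_train = [0, 0, 0, 1]

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)

# Class probabilities come from the class counts in the leaf the
# sample falls into; here the leaf is pure, so class 0 gets 1.0.
print(clf.predict_proba([[2.5, 25]]))  # [[1. 0.]]
```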

Step 4: The Output

Final result:

[0]

✅ The prediction [0] matches what we expected from the training data.


Key Takeaways for Freshers

  1. Decision trees split data step by step using feature thresholds.

  2. The parameter max_depth controls how “deep” the tree can grow (to prevent overfitting).

  3. The prediction is made by following the learned splits until a leaf node is reached.

  4. In this example, since the test point is closer to samples of class 0, the output is [0].
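Takeaway 3 can be seen directly in code: scikit-learn's decision_path method reports which nodes a sample visits on its way to a leaf (a minimal sketch with the same toy data):

```python
from sklearn.tree import DecisionTreeClassifier

X_train = [[1, 10], [2, 20], [3, 30], [4, 40]]
y_train = [0, 0, 0, 1]

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)

# decision_path returns a sparse indicator matrix: entry (i, j) is 1
# if sample i passes through node j on its way to a leaf.
path = clf.decision_path([[2.5, 25]])
print(path.toarray())

# apply() gives the id of the leaf node each sample ends up in
print(clf.apply([[2.5, 25]]))
```

With one split there are three nodes (the root plus two leaves), and the test point visits exactly two of them: the root and the class-0 leaf.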


This small example shows how decision trees are intuitive and easy to understand, making them a great starting point in machine learning!
