Polynomial Features in Scikit-Learn: A Beginner’s Guide

When working with machine learning, sometimes a straight line (simple linear regression) is not enough to describe the relationship between input and output. This is where polynomial features come in — they allow us to create extra features like squares, cubes, etc., of our original inputs.

Let’s go through a simple example step by step.


The Code

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Four samples, one feature each: a 4x1 column vector
X = np.array([[1], [2], [3], [4]])

# Expand each value x into [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

print(X_poly)

Step 1: The Input Data

Our original input X is:

[[1],
 [2],
 [3],
 [4]]

This is a column vector of shape (4, 1) containing the numbers 1 to 4: four samples, each with a single feature.


Step 2: Applying Polynomial Features

We create a PolynomialFeatures object with degree=2. This means:

  • For every input value x, it will generate:

    [1, x, x²]

The 1 is called the bias (intercept) term.
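If you want to confirm which column is which, scikit-learn (version 1.0 and later) can report the generated feature names. With a single input feature it labels it x0, so for our fitted transformer:

print(poly.get_feature_names_out())
# ['1' 'x0' 'x0^2']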


Step 3: The Transformation

When we run fit_transform(X), each row gets expanded:

  • For 1: [1, 1, 1]

  • For 2: [1, 2, 4]

  • For 3: [1, 3, 9]

  • For 4: [1, 4, 16]

So the final matrix is:

[[ 1.  1.  1.]
 [ 1.  2.  4.]
 [ 1.  3.  9.]
 [ 1.  4. 16.]]
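You can double-check this by building the same matrix by hand with NumPy (a sanity check only; in practice you would let PolynomialFeatures do the work):

# Column of ones, the original values, and their squares, side by side
manual = np.hstack([np.ones_like(X, dtype=float), X, X**2])
print(np.allclose(manual, X_poly))  # True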

Step 4: Why Do We Do This?

By adding polynomial terms, we allow models like linear regression to fit curves instead of just straight lines. For example:

  • Linear regression with just [x] fits a straight line.

  • Linear regression with [1, x, x²] can fit a parabola.

This makes polynomial features a simple but powerful trick for capturing non-linear relationships.
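To make this concrete, here is a minimal sketch that chains PolynomialFeatures and LinearRegression into a pipeline. The y values below are made up for illustration and lie exactly on the parabola y = x²:

from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[1], [2], [3], [4]])
y = np.array([1, 4, 9, 16])  # hypothetical targets lying on y = x^2

# Chain feature expansion and regression into one model
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

print(model.predict([[5]]))  # approximately [25.]

Because the pipeline applies the same transformation at fit and predict time, you never have to call fit_transform yourself.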


Key Takeaways for Beginners

  1. PolynomialFeatures(degree=n) expands your input features with all polynomial combinations up to degree n (including interaction terms like x₀x₁ when there is more than one feature).

  2. By default, the first column is all ones (the bias term); see the snippet after this list for how to turn it off.

  3. This helps linear models capture non-linear patterns.
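A note on point 2: if your downstream model fits its own intercept (LinearRegression does, unless you set fit_intercept=False), you may not want the column of ones. You can drop it with include_bias=False:

poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit_transform(X))
# [[ 1.  1.]
#  [ 2.  4.]
#  [ 3.  9.]
#  [ 4. 16.]]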


