Polynomial Features in Scikit-Learn: A Beginner’s Guide

When working with machine learning, sometimes a straight line (simple linear regression) is not enough to describe the relationship between input and output. This is where polynomial features come in — they allow us to create extra features like squares, cubes, etc., of our original inputs.

Let’s go through a simple example step by step.


The Code

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Four samples, one feature each: a 4x1 column vector
X = np.array([[1], [2], [3], [4]])

# Expand each value x into [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

print(X_poly)

Step 1: The Input Data

Our original input X is:

[[1],
 [2],
 [3],
 [4]]

This is a column vector of shape (4, 1) containing the numbers 1 to 4: four samples, each with a single feature.


Step 2: Applying Polynomial Features

We create a PolynomialFeatures object with degree=2. This means:

  • For every input value x, it will generate:

    [1, x, x²]

The 1 is called the bias (intercept) term.
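If you want to confirm which column is which, scikit-learn (version 1.0 and later) can report the generated feature names. With a single input feature it labels it x0, so for our fitted transformer:

print(poly.get_feature_names_out())
# ['1' 'x0' 'x0^2']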


Step 3: The Transformation

When we run fit_transform(X), each row gets expanded:

  • For 1: [1, 1, 1]

  • For 2: [1, 2, 4]

  • For 3: [1, 3, 9]

  • For 4: [1, 4, 16]

So the final matrix is:

[[ 1.  1.  1.]
 [ 1.  2.  4.]
 [ 1.  3.  9.]
 [ 1.  4. 16.]]
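You can double-check this by building the same matrix by hand with NumPy (a sanity check only; in practice you would let PolynomialFeatures do the work):

# Column of ones, the original values, and their squares, side by side
manual = np.hstack([np.ones_like(X, dtype=float), X, X**2])
print(np.allclose(manual, X_poly))  # True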

Step 4: Why Do We Do This?

By adding polynomial terms, we allow models like linear regression to fit curves instead of just straight lines. For example:

  • Linear regression with just [x] fits a straight line.

  • Linear regression with [1, x, x²] can fit a parabola.

This makes polynomial features a simple but powerful trick for capturing non-linear relationships.
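To make this concrete, here is a minimal sketch that chains PolynomialFeatures and LinearRegression into a pipeline. The y values below are made up for illustration and lie exactly on the parabola y = x²:

from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[1], [2], [3], [4]])
y = np.array([1, 4, 9, 16])  # hypothetical targets lying on y = x^2

# Chain feature expansion and regression into one model
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

print(model.predict([[5]]))  # approximately [25.]

Because the pipeline applies the same transformation at fit and predict time, you never have to call fit_transform yourself.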


Key Takeaways for Beginners

  1. PolynomialFeatures(degree=n) expands your input features with all polynomial combinations up to degree n (including interaction terms like x₀x₁ when there is more than one feature).

  2. By default, the first column is all ones (the bias term); see the snippet after this list for how to turn it off.

  3. This helps linear models capture non-linear patterns.
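A note on point 2: if your downstream model fits its own intercept (LinearRegression does, unless you set fit_intercept=False), you may not want the column of ones. You can drop it with include_bias=False:

poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit_transform(X))
# [[ 1.  1.]
#  [ 2.  4.]
#  [ 3.  9.]
#  [ 4. 16.]]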


