Polynomial Features in Scikit-Learn: A Beginner’s Guide
When working with machine learning, sometimes a straight line (simple linear regression) is not enough to describe the relationship between input and output. This is where polynomial features come in — they allow us to create extra features like squares, cubes, etc., of our original inputs.
Let’s go through a simple example step by step.
The Code
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
X = np.array([[1], [2], [3], [4]])
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print(X_poly)
Step 1: The Input Data
Our original input X is:
[[1],
[2],
[3],
[4]]
This is just a column vector (a 4×1 array) containing the numbers 1 to 4.
Step 2: Applying Polynomial Features
We create a PolynomialFeatures object with degree=2. This means that for every input value x, it will generate the terms [1, x, x²].
The 1 is called the bias (intercept) term.
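If you want to see exactly which columns were created, recent versions of scikit-learn can name them for you with get_feature_names_out() (older versions used get_feature_names). A quick sketch:

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[1], [2], [3], [4]])
poly = PolynomialFeatures(degree=2)
poly.fit(X)

# Prints the generated column names, e.g. ['1' 'x0' 'x0^2']
print(poly.get_feature_names_out())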
Step 3: The Transformation
When we run fit_transform(X), each row gets expanded:
- For 1: [1, 1, 1]
- For 2: [1, 2, 4]
- For 3: [1, 3, 9]
- For 4: [1, 4, 16]
So the final matrix is:
[[ 1. 1. 1.]
[ 1. 2. 4.]
[ 1. 3. 9.]
[ 1. 4. 16.]]
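If you don't want that leading column of 1s (for example, because your regression model will fit its own intercept), PolynomialFeatures has an include_bias parameter. A minimal sketch:

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[1], [2], [3], [4]])

# include_bias=False drops the constant column, leaving only x and x²
poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit_transform(X))
# [[ 1.  1.]
#  [ 2.  4.]
#  [ 3.  9.]
#  [ 4. 16.]]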
Step 4: Why Do We Do This?
By adding polynomial terms, we allow models like linear regression to fit curves instead of just straight lines. For example:
- Linear regression with just [x] fits a straight line.
- Linear regression with [1, x, x²] can fit a parabola.
This makes polynomial features a simple but powerful trick for capturing non-linear relationships.
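To make that concrete, here is a minimal sketch (with made-up data where y = x²) that chains PolynomialFeatures and LinearRegression in a pipeline so the linear model can fit a parabola:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1], [2], [3], [4]])
y = np.array([1, 4, 9, 16])  # y = x², a perfectly quadratic relationship

# LinearRegression alone could only fit a straight line to X;
# adding the polynomial features lets it fit the curve.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

print(model.predict([[5]]))  # should print a value very close to 25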
Key Takeaways for Beginners
- PolynomialFeatures(degree=n) expands your input features up to power n (see the example after this list for what happens with more than one feature).
- The first column is always 1 (the bias term).
- This helps linear models capture non-linear patterns.
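One detail worth knowing: with more than one input column, PolynomialFeatures also generates interaction terms (products of different features). Here is a small sketch with a single row containing two features, a = 2 and b = 3:

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# One sample with two features: [a, b] = [2, 3]
X = np.array([[2, 3]])

poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))
# Columns are [1, a, b, a², ab, b²], so this prints [[1. 2. 3. 4. 6. 9.]]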
✅ In our example, the correct output is:
[[ 1. 1. 1.]
[ 1. 2. 4.]
[ 1. 3. 9.]
[ 1. 4. 16.]]