What does StandardScaler do?

- August 30, 2025

Let's analyze this code step by step.

Code Recap

import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [10, 20, 30, 40, 50]
})

ss = StandardScaler()
scaled_data = ss.fit_transform(data)
print(ss.var_)

We are asked: What will ss.var_ print?

Step 1: What does `StandardScaler` do?

It standardizes features by removing the mean and scaling to unit variance.
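Internally, for each feature (column) it computes the variance below, where $\mu$ is the column mean and $n$ is the number of samples: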
$\text{var\_} = \frac{\sum (x_i - \mu)^2}{n}$
👉 Notice: StandardScaler uses population variance (divide by n), not sample variance (which divides by n-1).
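To make the difference between the two conventions concrete, here is a minimal sketch using only Python's standard `statistics` module (not scikit-learn itself). The variable names `pop_var` and `sample_var` are illustrative and do not come from the original code.

```python
import statistics

col1 = [1, 2, 3, 4, 5]

# Population variance: sum of squared deviations divided by n.
# This is the convention StandardScaler follows for var_.
pop_var = statistics.pvariance(col1)

# Sample variance: the same sum divided by n - 1.
sample_var = statistics.variance(col1)

print("population:", pop_var)
print("sample:", sample_var)
```

This matters in practice because `pandas.DataFrame.var()` defaults to `ddof=1` (sample variance), while `numpy.var()` defaults to `ddof=0` (population variance). Calling `data.var()` on the DataFrame would therefore not reproduce `ss.var_` unless `ddof=0` is passed.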

Step 2: Compute variance for `col1`

col1 = [1, 2, 3, 4, 5]

Mean = $(1+2+3+4+5)/5 = 15/5 = 3$

$\text{var} = \frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2}{5}$

$= \frac{4 + 1 + 0 + 1 + 4}{5} = \frac{10}{5} = 2$

So, var(col1) = 2.
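The hand calculation above can be mirrored in a few lines of plain Python. This is a sketch of the formula only; `population_variance` is a hypothetical helper written for this post, not a scikit-learn function.

```python
def population_variance(values):
    """Mean of squared deviations from the mean (divides by n, not n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / n


col1 = [1, 2, 3, 4, 5]
col1_var = population_variance(col1)
print(col1_var)  # 2.0
```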

Step 3: Compute variance for `col2`

col2 = [10, 20, 30, 40, 50]

Mean = $(10+20+30+40+50)/5 = 150/5 = 30$

$\text{var} = \frac{(10-30)^2 + (20-30)^2 + (30-30)^2 + (40-30)^2 + (50-30)^2}{5}$

$= \frac{400 + 100 + 0 + 100 + 400}{5} = \frac{1000}{5} = 200$

So, var(col2) = 200.
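There is also a shortcut worth noticing: `col2` is just `col1` multiplied by 10, and multiplying every value by a constant `c` multiplies the variance by `c²`. So 2 × 10² = 200 without redoing the arithmetic. The sketch below checks this with a hypothetical `population_variance` helper (defined here for illustration, not part of scikit-learn).

```python
def population_variance(values):
    """Mean of squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


col1 = [1, 2, 3, 4, 5]
factor = 10
col2 = [factor * x for x in col1]  # [10, 20, 30, 40, 50]

var1 = population_variance(col1)
var2 = population_variance(col2)

# Scaling the data by `factor` scales the variance by factor squared.
print(var2 == factor ** 2 * var1)  # True
```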

Step 4: Final Answer

ss.var_ = [2., 200.] (a NumPy array of floats, so the printed values appear as 2. and 200. rather than as integers)

✅ Correct Option: [2, 200]
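As a sanity check, the result can be confirmed against scikit-learn directly. The sketch below assumes NumPy and scikit-learn are installed, and it uses a plain NumPy array `X` in place of the DataFrame for brevity (an assumption of this example, with the same values as `data`). It also looks at the related fitted attributes `mean_` and `scale_`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same values as the DataFrame in the post: column 0 is col1, column 1 is col2.
X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]], dtype=float)

ss = StandardScaler()
scaled = ss.fit_transform(X)

# var_ should agree with NumPy's population variance (ddof=0) per column.
print(ss.var_)
print(np.allclose(ss.var_, X.var(axis=0)))

# scale_ is the per-column standard deviation, i.e. the square root of var_.
print(np.allclose(ss.scale_, np.sqrt(ss.var_)))

# After scaling, each column has mean 0 and population variance 1.
print(np.allclose(scaled.mean(axis=0), 0.0))
print(np.allclose(scaled.var(axis=0), 1.0))
```

Because `col2` is an exact multiple of `col1`, both columns end up with identical values after standardization.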

