📝 Data Science MCQ Explainer Blog – Handling Missing Data in Pandas
When preparing datasets for analysis, one of the most common challenges you will face is missing data. Pandas provides powerful functions like dropna(), fillna(), and isna() to handle such cases. Today, let’s solve a real MCQ-style question step by step and extract the key learning patterns you can apply to any similar question.
❓ The Question
You have loaded a dataset with 1000 samples and 20 features into a Pandas DataFrame. Some samples have missing values for a few features. If you want to remove rows with fewer than 15 non-null values, what method would you use?
Options:
- dropna(how='any')
- ✅ dropna(thresh=15)
- dropna(thresh=5)
- drop(columns=[15])
- dropna(how='all')
✅ The Correct Answer
dropna(thresh=15)
🧠 Step-by-Step Reasoning
1. Understand the Goal
- Dataset: 1000 rows × 20 columns.
- Requirement: keep only rows that have at least 15 non-missing values.
This means a row with 14 or fewer valid values gets dropped.
2. Decode Pandas Functions
- dropna(how='any') → Drops a row if any value is missing. ❌ Too strict: it would remove every row with even one missing value.
- dropna(how='all') → Drops a row only if all values are missing. ❌ Too lenient: a row with a single valid value would survive.
- dropna(thresh=n) → Keeps rows with at least n non-null values. ✅ Perfect match for our case.
- drop(columns=[15]) → Drops the column labeled 15 (the 16th column under default integer labels). It has nothing to do with missing values. ❌
- dropna(thresh=5) → Keeps rows with at least 5 non-null values. ❌ Too loose.
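To see how these options differ, here is a minimal sketch on a hypothetical 4 × 4 toy DataFrame (the data and variable names are made up for illustration and are not part of the original question). A threshold of 3 stands in for the question's 15 so the result is easy to check by eye.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 rows x 4 columns, default integer column labels 0..3
df = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, 4.0],               # 4 non-null values
        [1.0, np.nan, 3.0, 4.0],            # 3 non-null values
        [np.nan, np.nan, np.nan, 4.0],      # 1 non-null value
        [np.nan, np.nan, np.nan, np.nan],   # 0 non-null values
    ]
)

any_dropped = df.dropna(how="any")   # drops every row with at least one NaN
all_dropped = df.dropna(how="all")   # drops only the fully empty row
thresh_3 = df.dropna(thresh=3)       # keeps rows with at least 3 non-null values
col_dropped = df.drop(columns=[2])   # removes the column labeled 2, keeps all rows

print(list(any_dropped.index))   # [0]
print(list(all_dropped.index))   # [0, 1, 2]
print(list(thresh_3.index))      # [0, 1]
print(col_dropped.shape)         # (4, 3)
```

Note that drop(columns=[2]) never looks at missing values: all four rows remain, including the completely empty one.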
3. Why thresh=15 Works?
df.dropna(thresh=15) ensures:
- Any row with < 15 valid entries → removed.
- Rows with ≥ 15 valid entries → kept.
This exactly matches the requirement.
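One way to convince yourself of this is to compare dropna(thresh=15) against an explicit count of non-null values per row. The sketch below uses synthetic data with the same shape as the question (1000 × 20); the random values, the 30% missing rate, and the seed are assumptions chosen only for the demonstration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset in the question: 1000 rows x 20 columns
rng = np.random.default_rng(0)
values = rng.normal(size=(1000, 20))

# Assumption for the demo: knock out roughly 30% of the cells at random
missing = rng.random(size=(1000, 20)) < 0.3
values[missing] = np.nan
df = pd.DataFrame(values)

# The answer from the question
kept = df.dropna(thresh=15)

# The same filter written out by hand: count non-null values per row
non_null_counts = df.notna().sum(axis=1)
expected = df[non_null_counts >= 15]

# Both approaches select exactly the same rows
assert kept.equals(expected)
print(f"{len(df)} rows before, {len(kept)} rows after")
```

The manual version is handy when you need the counts themselves, for example to inspect how many rows fall just below the cutoff before dropping them.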
🔑 Key Learning Pattern
When you see Pandas missing value questions:
- Check the target condition (any missing, all missing, minimum valid count).
- Map it to the parameter:
  - how='any' → strict (drops if any value is missing).
  - how='all' → lenient (drops only if everything is missing).
  - thresh=n → keeps rows/cols with at least n valid values.
- Avoid confusion with drop(columns=[]), which is column removal, not missing-data handling.
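The "rows/cols" part of thresh is worth a quick illustration: passing axis=1 applies the same threshold to columns instead of rows. This is a small sketch on a made-up three-column DataFrame (the column names a, b, c are hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a different number of valid values per column
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],        # 4 non-null values
        "b": [1.0, np.nan, 3.0, np.nan],  # 2 non-null values
        "c": [np.nan, np.nan, np.nan, 4.0],  # 1 non-null value
    }
)

# axis=1 switches dropna from rows to columns:
# keep only columns with at least 2 non-null values
cols_kept = df.dropna(axis=1, thresh=2)

print(list(cols_kept.columns))  # ['a', 'b']
```

Unlike drop(columns=[...]), which removes columns by label regardless of their contents, this version decides which columns to remove based on how much data is missing.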
🧩 How This Helps in Similar Questions
- If they ask for removing rows with at least 1 missing value → use dropna(how='any').
- If they ask for removing rows where everything is missing → use dropna(how='all').
- If they ask for keeping rows with a minimum number of valid entries → use dropna(thresh=n).
- If they ask for removing entire columns → use drop(columns=[]).
🚀 Final Takeaway
👉 Most of dropna()'s behavior comes down to a few parameters. Questions may look tricky, but once you map the wording to those parameters, the answer becomes straightforward. Always break it down into:
- What condition triggers removal?
- What parameter matches that condition?
📌 In the next blog, we’ll explore fillna() strategies (mean, median, forward fill, backward fill) and how they appear in MCQs.