| Name | Week_2 | Week_3 |
|---|---|---|
| str | i64 | i64 |
| "Lucy" | 74 | 93 |
| "Pippa" | 89 | 89 |
| "Evan" | 90 | 90 |
| "Bree" | 52 | null |
| "Diana" | null | 75 |
Equalities with nulls in polars
When null values interfere with filtering
Recently at work, I discovered something unexpected when filtering values with Polars. Below is a dataframe showing students’ weekly quiz results.
Two students, Bree and Diana, missed one quiz in week 2 and week 3, respectively. Suppose you want to remove the students who received the same grade so you can focus on those who either increased or decreased their score. You might expect the code below to work.
(df
.filter(pl.col('Week_2').ne(pl.col('Week_3')))
)| Name | Week_2 | Week_3 |
|---|---|---|
| str | i64 | i64 |
| "Lucy" | 74 | 93 |
The problem is that it also removes the students who missed one quiz. Why does this happen?
In Polars, any comparison involving a null value produces null, not True or False. A filter keeps only the rows where the predicate evaluates to True. As a result, rows where the predicate evaluates to null are dropped.
To keep the students who missed one quiz, you need to write the code like this:
(df
.filter(pl.col('Week_2').ne(pl.col('Week_3')) | pl.col('Week_2').is_null() | pl.col('Week_3').is_null())
)| Name | Week_2 | Week_3 |
|---|---|---|
| str | i64 | i64 |
| "Lucy" | 74 | 93 |
| "Bree" | 52 | null |
| "Diana" | null | 75 |
Check out my Polars course to learn more.