Identify and remove duplicate rows in Polars
Duplicates are a common problem in most datasets, so it’s important to check for duplicate rows before analyzing your data. Fortunately, Polars provides an easy way to identify them. Below is a dataframe showing sales for each day of the week.
Day | Sales |
---|---|
str | i64 |
"Fri" | 319 |
"Mon" | 182 |
"Tue" | 247 |
"Wed" | 547 |
"Thur" | 481 |
"Fri" | 319 |
"Sat" | 208 |
"Sun" | 67 |
Show duplicates
To display duplicated rows in Polars, use the is_duplicated method. It returns a boolean mask that is true for every row whose values match another row across all columns. Passing that mask to filter keeps only the duplicated rows.
(df
 .filter(df.is_duplicated())
)
Day | Sales |
---|---|
str | i64 |
"Fri" | 319 |
"Fri" | 319 |
You can see that the sales record for Friday was entered twice, so it is duplicated.
Remove duplicates
To remove duplicates in Polars, use the unique method. It keeps a single copy of each distinct row, comparing values across all columns. Here’s how to do it:
(df
 .unique()
)
Day | Sales |
---|---|
str | i64 |
"Sun" | 67 |
"Thur" | 481 |
"Mon" | 182 |
"Wed" | 547 |
"Tue" | 247 |
"Sat" | 208 |
"Fri" | 319 |
Now you can see that Friday appears only once!