acct_name | category | quantity |
---|---|---|
str | str | i64 |
"Fritsch-Glover" | "Hat" | 1 |
"O'Conner Inc" | "Sweater" | 9 |
"Beatty and Sons" | "Sweater" | 12 |
"Gleason, Bogisich and Franecki" | "Sweater" | 5 |
"Morissette-Heathcote" | "Sweater" | 19 |
… | … | … |
"Brekke and Sons" | "Sweater" | 2 |
"Lang-Wunsch" | "Socks" | 19 |
"Bogisich and Sons" | "Socks" | 18 |
"Kutch, Cormier and Harber" | "Sweater" | 15 |
"Roberts, Volkman and Batz" | "Sweater" | 11 |
Group by calculations in Polars
Aggregations to data professionals are what a saw is to a carpenter. Almost every data analysis you perform will involve an aggregate calculation, or a group_by calculation, to use the technical term. The example dataframe shown here contains 1,000 rows.
Aggregating numerical values
Your boss isn’t interested in knowing how many sweaters Sally bought. She’s interested in how many sweaters the store sold in total. Here’s how to perform that calculation in Polars. We’ll calculate both the total and the average quantity for each category.
(df
 .group_by('category')
 .agg(pl.sum('quantity').alias('Total_Qty'),
      pl.mean('quantity').alias('Avg_Qty'))
 )
category | Total_Qty | Avg_Qty |
---|---|---|
str | i64 | f64 |
"Sweater" | 5551 | 10.553232 |
"Hat" | 1889 | 10.672316 |
"Socks" | 3125 | 10.521886 |
Aggregating text values
In Polars, group_by aggregations can also be performed on non-numerical columns. Let’s consider a practical example. Suppose we wanted to know which customers bought clothing from all three categories. Here’s how we would do that.
(df
 .group_by('acct_name')
 .agg('category')
 .filter(pl.col('category').list.n_unique() == 3) # three is the max number of categories
 )
acct_name | category |
---|---|
str | list[str] |
"Beier-Bosco" | ["Hat", "Sweater", "Socks"] |
"Koepp-McLaughlin" | ["Sweater", "Hat", "Socks"] |
"Fritsch-Glover" | ["Hat", "Sweater", "Socks"] |
"Ledner-Kling" | ["Hat", "Sweater", "Socks"] |
"Herman Ltd" | ["Sweater", "Sweater", … "Socks"] |
… | … |
"Mills Inc" | ["Sweater", "Sweater", … "Sweater"] |
"Halvorson PLC" | ["Hat", "Sweater", "Socks"] |
"Bashirian, Beier and Watsica" | ["Socks", "Hat", "Sweater"] |
"Upton, Runolfsson and O'Reilly" | ["Hat", "Sweater", "Socks"] |
"Kuvalis-Roberts" | ["Sweater", "Socks", … "Sweater"] |
Now we know that 11 customers bought clothing from all three categories: hats, socks, and sweaters.
I’m looking forward to teaching you in my Polars course.