Group numerical data into categorical buckets in Polars
qcut
100DaysOfPolars
If you’re a movie buff like me, you probably subscribe to multiple streaming services. Suppose you wanted to sort the subscription fees for these services into three categories: cheap, okay, and expensive. How could you do it?
Below is a dataframe showing various streaming services.
shape: (7, 2)
Subscription | Price |
---|---|
str | f64 |
"Netflix" | 15.49 |
"Hulu" | 7.99 |
"HBO Max" | 15.99 |
"Showtime" | 10.99 |
"Paramount" | 11.99 |
"Disney+" | 7.99 |
"AMC" | 8.99 |
It’s difficult to look at a table full of numbers and decide which category each streaming service belongs to based on price. To make this easier, you can use the Polars expression qcut, as shown below:
(df
 .with_columns(pl.col('Price').qcut([0.25, 0.75],
                                    labels=['Cheap','Okay','Expensive'])
               .alias('Category')
 )
)
shape: (7, 3)
Subscription | Price | Category |
---|---|---|
str | f64 | cat |
"Netflix" | 15.49 | "Expensive" |
"Hulu" | 7.99 | "Cheap" |
"HBO Max" | 15.99 | "Expensive" |
"Showtime" | 10.99 | "Okay" |
"Paramount" | 11.99 | "Okay" |
"Disney+" | 7.99 | "Cheap" |
"AMC" | 8.99 | "Okay" |
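The code above computes the 25th and 75th percentiles of Price and uses them as cut points: prices at or below the 25th percentile are labeled Cheap, prices above the 75th percentile are labeled Expensive, and everything in between is labeled Okay. Now you can simply look at each streaming service and its Category to see whether it’s within your budget.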
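To make the quantile idea concrete, here is a rough plain-Python sketch of the same bucketing, with no Polars involved. The helper names linear_quantile and bucket are hypothetical, and linear interpolation between sorted values is an assumption on my part rather than a statement about Polars internals. It does reproduce the Category column shown above for this data.

```python
import math

# Prices copied from the table in this post.
prices = {
    "Netflix": 15.49, "Hulu": 7.99, "HBO Max": 15.99, "Showtime": 10.99,
    "Paramount": 11.99, "Disney+": 7.99, "AMC": 8.99,
}

def linear_quantile(values, q):
    """Hypothetical helper: quantile via linear interpolation (an assumption)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional position in the sorted list
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

# Cut points at the 25th and 75th percentiles (roughly 8.49 and 13.74 here).
low_cut = linear_quantile(prices.values(), 0.25)
high_cut = linear_quantile(prices.values(), 0.75)

def bucket(price):
    """Hypothetical helper: right-closed bins, so a price equal to a cut point falls in the lower bucket."""
    if price <= low_cut:
        return "Cheap"
    if price <= high_cut:
        return "Okay"
    return "Expensive"

categories = {name: bucket(price) for name, price in prices.items()}

print(f"cut points: {low_cut:.2f}, {high_cut:.2f}")
for name, category in categories.items():
    print(name, category)
```

Seeing the two cut points explains why AMC at 8.99 lands in Okay rather than Cheap: it sits just above the lower cut point.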
Learn more in my Polars course!