Group numerical data into categorical buckets in Polars

qcut

100DaysOfPolars

Author: Joram Mutenge

Published: 2025-10-20

If you’re a movie buff like me, you probably subscribe to multiple streaming services. Suppose you wanted to sort your subscription fees into three categories: cheap, okay, and expensive. How could you do it?

Below is a dataframe showing various streaming services.

shape: (7, 2)

Subscription	Price
str	f64
"Netflix"	15.49
"Hulu"	7.99
"HBO Max"	15.99
"Showtime"	10.99
"Paramount"	11.99
"Disney+"	7.99
"AMC"	8.99

It’s difficult to look at a table full of numbers and decide which category each streaming service belongs to based on price. To make this process easier, you can use the Polars expression qcut as shown below:

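import polars as pl

# df is the streaming-services dataframe shown above
(df
 .with_columns(pl.col('Price')
               # Break points at the 25th and 75th percentiles give three buckets
               .qcut([0.25, 0.75], labels=['Cheap', 'Okay', 'Expensive'])
               .alias('Category')
               )
 )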

shape: (7, 3)

Subscription	Price	Category
str	f64	cat
"Netflix"	15.49	"Expensive"
"Hulu"	7.99	"Cheap"
"HBO Max"	15.99	"Expensive"
"Showtime"	10.99	"Okay"
"Paramount"	11.99	"Okay"
"Disney+"	7.99	"Cheap"
"AMC"	8.99	"Okay"

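The code above splits the values in Price at the 25th and 75th percentiles: roughly the lowest quarter of prices is labeled Cheap, the middle half Okay, and the highest quarter Expensive. Now you can simply look at the streaming service and the Category column to see whether it’s within your budget.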
