As a data professional, it’s inevitable that you’ll end up creating a pivot table. Stakeholders love pivot tables, and they won’t pass up a chance to request one from you. It’s not surprising that most people think of Microsoft Excel when they hear PivotTable. It turns out that Microsoft held a trademark on the term in the United States from 1994 to 2020. Today, however, the name PivotTable has become so generic that it no longer qualifies for trademark protection.
Hex swag I got at PyConUS 2026
Fortunately, pivot tables can also be created with Polars. This means you won’t have to export your data to Excel and fiddle with it just to create a pivot table.
The dataset
I’ll be using sales funnel data. In business, some sales cycles are very long; for instance, those involving enterprise software or capital equipment, and management often wants to understand them in more detail throughout the year. The following are some of the questions they are likely to ask:
How much revenue is in the pipeline?
What products are in the pipeline?
Who has which products at what stage?
How likely are we to close deals by year-end?
Larger companies typically have CRM tools or other software that the sales team uses to track the sales process. Still, many people simply export the data to Excel, where they summarize it by creating a pivot table.
The benefits of creating a pivot table with Polars are:
It’s quicker once it’s set up.
It’s self-documenting. You can look at the code and understand what it does.
It’s easier to generate multiple reports.
It offers the flexibility to define custom aggregation functions.
Let’s read in the data.
import polars as plpl.Config(set_tbl_rows=20)df = pl.read_excel('https://pbpython.com/extras/sales-funnel.xlsx')df
shape: (17, 8)
Account
Name
Rep
Manager
Product
Quantity
Price
Status
i64
str
str
str
str
i64
i64
str
714466
"Trantow-Barrows"
"Craig Booker"
"Debra Henley"
"CPU"
1
30000
"presented"
714466
"Trantow-Barrows"
"Craig Booker"
"Debra Henley"
"Software"
1
10000
"presented"
714466
"Trantow-Barrows"
"Craig Booker"
"Debra Henley"
"Maintenance"
2
5000
"pending"
737550
"Fritsch, Russel and Anderson"
"Craig Booker"
"Debra Henley"
"CPU"
1
35000
"declined"
146832
"Kiehn-Spinka"
"Daniel Hilton"
"Debra Henley"
"CPU"
2
65000
"won"
218895
"Kulas Inc"
"Daniel Hilton"
"Debra Henley"
"CPU"
2
40000
"pending"
218895
"Kulas Inc"
"Daniel Hilton"
"Debra Henley"
"Software"
1
10000
"presented"
412290
"Jerde-Hilpert"
"John Smith"
"Debra Henley"
"Maintenance"
2
5000
"pending"
740150
"Barton LLC"
"John Smith"
"Debra Henley"
"CPU"
1
35000
"declined"
141962
"Herman LLC"
"Cedric Moss"
"Fred Anderson"
"CPU"
2
65000
"won"
163416
"Purdy-Kunde"
"Cedric Moss"
"Fred Anderson"
"CPU"
1
30000
"presented"
239344
"Stokes LLC"
"Cedric Moss"
"Fred Anderson"
"Maintenance"
1
5000
"pending"
239344
"Stokes LLC"
"Cedric Moss"
"Fred Anderson"
"Software"
1
10000
"presented"
307599
"Kassulke, Ondricka and Metz"
"Wendy Yule"
"Fred Anderson"
"Maintenance"
3
7000
"won"
688981
"Keeling LLC"
"Wendy Yule"
"Fred Anderson"
"CPU"
5
100000
"won"
729833
"Koepp Ltd"
"Wendy Yule"
"Fred Anderson"
"CPU"
2
65000
"declined"
729833
"Koepp Ltd"
"Wendy Yule"
"Fred Anderson"
"Monitor"
2
5000
"presented"
Creating pivot tables
The pivot method is used to create a pivot table in Polars. However, not all data outputs that look like pivot tables are created with pivot. Sometimes, group_by does the trick.
Suppose we wanted to find the average (mean) Account, Price, and Quantity for each Name. We can do so using group_by.
In this case, the values in Name are unique keys that appear only once. If the original dataframe contained one name appearing two or more times, Polars would calculate the average of its values and display the company name only once in the final output. This is exactly what group_by does.
Since different aggregations are used on the same column Price, we must ensure that the final columns have different names, otherwise the code breaks. Polars dataframes can’t have multiple columns with the same name.
shape: (5, 4)
Manager
Rep
Avg_Price
Count_Price
str
str
f64
u32
"Fred Anderson"
"Wendy Yule"
44250.0
4
"Debra Henley"
"Daniel Hilton"
38333.333333
3
"Debra Henley"
"Craig Booker"
20000.0
4
"Fred Anderson"
"Cedric Moss"
27500.0
4
"Debra Henley"
"John Smith"
20000.0
2
Sometimes, we may want to see sales broken down by product. This is where pivot comes in. The values contained in the column specified by the on parameter become new columns in the final output.
A rule of thumb I follow is that whenever you want the values in a column to become new columns in the final output, use pivot; otherwise, use group_by. So, if we want Product to be part of the index and don’t need a column for the on parameter, then we’ll have to use group_by.
This output makes much more sense, at least for this dataset. Now, what if we want to include a grand total row at the bottom of the dataframe, much like you would see in an Excel pivot table? In pandas, this is easy to achieve. The pivot_table function has a margins=True parameter that adds a grand total row.
By contrast, Polars does not have a margins=True parameter in its pivot method. However, the versatility of Polars makes it possible to create a grand total row yourself.
Let’s look at the sales funnel at the manager level. At the moment, the Status column has the string data type. We’ll convert it to the enum data type and define the hierarchy in the order we want.
Notice how values in Status follow the order set in the enum, although this time in reverse since descending=True.
shape: (9, 3)
Manager
Status
Price
str
enum
i64
"Debra Henley"
"declined"
70000
"Debra Henley"
"presented"
50000
"Debra Henley"
"pending"
50000
"Debra Henley"
"won"
65000
"Fred Anderson"
"declined"
65000
"Fred Anderson"
"presented"
45000
"Fred Anderson"
"pending"
5000
"Fred Anderson"
"won"
172000
"All"
null
522000
So far, we’ve used group_by and pivot separately. Once again, unlike pandas, where you can pass different aggregation functions for a single column or multiple columns in one pivot table call, Polars allows only one aggregation function at a time. However, the same result can be achieved by combining the group_by and pivot methods.
Pivot tables are too useful and too powerful to remain the preserve of Microsoft Excel alone. They can be created with many data analysis tools, including Polars, pandas, and DuckDB.
Check out my Polars book to learn more about this powerful data analysis tool.