Getting summary statistics in Polars

describe

100DaysOfPolars
Author: Joram Mutenge

Published: 2025-07-28

Data exploration is the first step in any data analysis task. You can’t start analyzing your data without understanding what it’s about. If you have numerical columns, one of the first things to do is get summary statistics for them, such as the mean (average), median, and standard deviation. Below is a dataframe from a clothing store showing customer purchase details.

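shape: (1_000, 4)
┌──────────────────────────────────┬───────────┬──────────┬───────┐
│ company                          ┆ category  ┆ quantity ┆ price │
│ ---                              ┆ ---       ┆ ---      ┆ ---   │
│ str                              ┆ str       ┆ i64      ┆ f64   │
╞══════════════════════════════════╪═══════════╪══════════╪═══════╡
│ "Fritsch-Glover"                 ┆ "Hat"     ┆ 1        ┆ 98.98 │
│ "O'Conner Inc"                   ┆ "Sweater" ┆ 9        ┆ 34.8  │
│ "Beatty and Sons"                ┆ "Sweater" ┆ 12       ┆ 60.24 │
│ "Gleason, Bogisich and Franecki" ┆ "Sweater" ┆ 5        ┆ 15.25 │
│ "Morissette-Heathcote"           ┆ "Sweater" ┆ 19       ┆ 51.83 │
│ "Brekke and Sons"                ┆ "Sweater" ┆ 2        ┆ 46.48 │
│ "Lang-Wunsch"                    ┆ "Socks"   ┆ 19       ┆ 29.25 │
│ "Bogisich and Sons"              ┆ "Socks"   ┆ 18       ┆ 54.79 │
│ "Kutch, Cormier and Harber"      ┆ "Sweater" ┆ 15       ┆ 62.53 │
│ "Roberts, Volkman and Batz"      ┆ "Sweater" ┆ 11       ┆ 86.4  │
└──────────────────────────────────┴───────────┴──────────┴───────┘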


Get summary stats

To get summary statistics for all the columns in the above dataframe, you can use the Polars dataframe method describe. Summary statistics are most meaningful for numerical columns, but Polars also reports a subset of them (count, null_count, min, and max) for string columns such as company and category. For string columns, min and max are the first and last values in sort order, and the most useful statistic is usually null_count, which tells you the number of missing (empty) values in that column.

(df
 .describe()
 )
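shape: (9, 5)
┌──────────────┬───────────────┬───────────┬──────────┬───────────┐
│ statistic    ┆ company       ┆ category  ┆ quantity ┆ price     │
│ ---          ┆ ---           ┆ ---       ┆ ---      ┆ ---       │
│ str          ┆ str           ┆ str       ┆ f64      ┆ f64       │
╞══════════════╪═══════════════╪═══════════╪══════════╪═══════════╡
│ "count"      ┆ "1000"        ┆ "1000"    ┆ 1000.0   ┆ 1000.0    │
│ "null_count" ┆ "0"           ┆ "0"       ┆ 0.0      ┆ 0.0       │
│ "mean"       ┆ null          ┆ null      ┆ 10.565   ┆ 54.06643  │
│ "std"        ┆ null          ┆ null      ┆ 5.887311 ┆ 26.068011 │
│ "min"        ┆ "Abbott PLC"  ┆ "Hat"     ┆ 1.0      ┆ 10.01     │
│ "25%"        ┆ null          ┆ null      ┆ 5.0      ┆ 31.25     │
│ "50%"        ┆ null          ┆ null      ┆ 11.0     ┆ 53.27     │
│ "75%"        ┆ null          ┆ null      ┆ 16.0     ┆ 75.1      │
│ "max"        ┆ "Zulauf-Will" ┆ "Sweater" ┆ 20.0     ┆ 100.0     │
└──────────────┴───────────────┴───────────┴──────────┴───────────┘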


You can now quickly see that the highest quantity for a single purchase is 20, and the highest price is 100. More importantly, none of the columns have missing values, as indicated by a null_count value of 0 for all columns.
