Creating dummy variables in a polars dataframe

to_dummies

100DaysOfPolars
Author: Joram Mutenge

Published: 2025-09-12

Computers process numerical data faster than text data, and most machine learning algorithms expect numerical input. That’s why it’s a good idea to convert text data into numerical data before training machine learning models. You want to squeeze out as much performance as possible from your machine. For instance, if you have a column Is_Present with values like “Yes” and “No,” you should use 1 and 0 respectively.

Below is a dataframe showing computer brands and the chips used in their computer models. It has 896 rows, of which 10 are displayed.

shape: (896, 2)
brand      chip
str        str
"Lenovo"   "AMD"
"Lenovo"   "AMD"
"Avita"    "AMD"
"Avita"    "AMD"
"Avita"    "AMD"
"ASUS"     "AMD"
"ASUS"     "AMD"
"ASUS"     "AMD"
"SAMSUNG"  "Qualcomm"
"Lenovo"   "AMD"


Create dummy variables from chip types

To make data processing faster, we need to convert chip values into numerical data. One approach, known as one-hot encoding, is to create a separate column for each chip type, with a value of 1 if the computer in that row uses that chip and 0 if it does not. Fortunately, this is very easy to do in Polars with the to_dummies method.

df.to_dummies(columns='chip')

shape: (896, 6)
brand      chip_AMD  chip_Intel  chip_M1  chip_MediaTek  chip_Qualcomm
str        u8        u8          u8       u8             u8
"Lenovo"   1         0           0        0              0
"Lenovo"   1         0           0        0              0
"Avita"    1         0           0        0              0
"Avita"    1         0           0        0              0
"Avita"    1         0           0        0              0
"ASUS"     1         0           0        0              0
"ASUS"     1         0           0        0              0
"ASUS"     1         0           0        0              0
"SAMSUNG"  0         0           0        0              1
"Lenovo"   1         0           0        0              0


Now all the chip information has been converted into zeroes and ones. For example, the first row is a Lenovo computer with an AMD chip, so chip_AMD is 1 and the columns for Intel and the other chips are 0.
