Just like in SQL, it’s possible to join two dataframes in Polars. In this article, I’ll show you how to use the inner join in Polars to create a dataframe with more columns. Below is a dataframe, movies_df, showing finance movies.
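shape: (4, 3)
┌─────────────────┬──────┬────────┐
│ movie           ┆ year ┆ rating │
│ ---             ┆ ---  ┆ ---    │
│ str             ┆ i64  ┆ f64    │
╞═════════════════╪══════╪════════╡
│ "The Big Short" ┆ 2015 ┆ 7.8    │
│ "Wall Street"   ┆ 1987 ┆ 7.3    │
│ "Boiler Room"   ┆ 2000 ┆ 7.0    │
│ "Arbitrage"     ┆ 2012 ┆ 6.6    │
└─────────────────┴──────┴────────┘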
And here’s another dataframe, minutes_df, showing the duration of the movies.
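shape: (4, 2)
┌─────────────────┬─────────┐
│ movie           ┆ minutes │
│ ---             ┆ ---     │
│ str             ┆ i64     │
╞═════════════════╪═════════╡
│ "The Big Short" ┆ 130     │
│ "Wall Street"   ┆ 126     │
│ "Boiler Room"   ┆ 120     │
│ "Arbitrage"     ┆ 107     │
└─────────────────┴─────────┘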
Join dataframes
You can use the inner join to get all the rows that match in both dataframes based on a column you choose as the key. In this case, the key column is movie.
The result is a single dataframe that contains all four columns: movie, year, rating, and minutes.
Other joins you’ll use
Polars supports several other joins, such as:
left join - Keeps all rows from the left dataframe, even those without a match, and adds the columns from the right dataframe wherever the key matches. Rows without a match get null in those columns. Recent Polars releases also accept how="right"; in versions without it, you can get the same result by swapping the dataframes and using a left join.
anti join - Similar to a set difference in mathematics. It returns only the rows of the left dataframe that have no match in the right dataframe, and it adds no columns from the right dataframe.