Speed up analysis with lazyframes in Polars

scan_csv

100DaysOfPolars
Author

Joram Mutenge

Published

2025-09-22

Datasets are getting bigger, which means loading an entire dataset into memory can put a strain on your computer. Fortunately, Polars lets you scan data into a lazyframe, which describes the work to be done without reading the data yet, and then load only the result into memory as a dataframe.

Say we have this CSV file:

csv_file = 'https://raw.githubusercontent.com/jorammutenge/learn-rust/refs/heads/main/sample_sales.csv'

Create a lazyframe by scanning a CSV file

To read a CSV file as a lazyframe, use the Polars function scan_csv, like this:

import polars as pl

(pl.scan_csv(csv_file)
 .select('Account Name','ext price')
 .group_by('Account Name')
 .agg(pl.mean('ext price'))
 .collect()
 )
shape: (718, 2)
Account Name                          ext price
str                                   f64
"Bergstrom, Medhurst and Zieme"       198.56
"Runolfsdottir, Rolfson and Pac…      988.19
"Weimann, Swift and Conroy"           940.32
"Watsica PLC"                         46.88
"Mills Inc"                           570.538
"Lemke, Kovacek and McClure"          521.18
"Jakubowski, Stark and Glover"        126.3
"Sawayn-Harris"                       450.48
"Nicolas, Buckridge and Rowe"         98.42
"Hoppe PLC"                           275.76

In the lazyframe, I selected the two columns I wanted and computed the average price for each customer. The resulting lazyframe was then loaded into memory as a dataframe.

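Doing computations on a lazyframe is often faster because Polars does no work until you collect the results. It first builds a query plan, and the optimizer can then skip unnecessary work, such as reading only the columns the query actually uses instead of the whole file.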
