Speed up analysis with lazyframes in Polars

scan_csv

100DaysOfPolars
Author: Joram Mutenge
Published: 2025-09-22

Datasets are getting bigger, and loading an entire dataset into memory can put a strain on your computer. Fortunately, Polars lets you scan data into a lazyframe, which records the operations you want to perform instead of loading the data right away. The data is only loaded into memory, as a dataframe, when you collect the results.

Say we have this CSV file:

csv_file = 'https://raw.githubusercontent.com/jorammutenge/learn-rust/refs/heads/main/sample_sales.csv'


Create a lazyframe by scanning a CSV file

To read a CSV file as a lazyframe, use the Polars function scan_csv, like this:

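import polars as pl

(pl.scan_csv(csv_file)                 # build a lazy query; the data is not loaded yet
 .select('Account Name', 'ext price')  # keep only the two columns we need
 .group_by('Account Name')             # one group per customer account
 .agg(pl.mean('ext price'))            # average ext price per account
 .collect()                            # run the query and return a dataframe
 )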
shape: (718, 2)
Account Name                      ext price
str                               f64
"Fritsch-Glover"                  407.453333
"Larson-Huels"                    31.0
"Murray, Herzog and Treutel"      1399.66
"Sporer, Hickle and Steuber"      149.536667
"Lockman, Fisher and Considine"   25.18
"Wuckert-Gulgowski"               341.52
"Stoltenberg, Berge and Roberts"  626.32
"Treutel, Muller and O'Kon"       1513.34
"Strosin, Nader and Zulauf"       624.72
"Marvin, Schroeder and Herman"    1897.53


Within the lazy query, I selected the two columns I needed and computed the average ext price for each customer account. Calling collect() then executed the query and loaded the result into memory as a dataframe.

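Working with a lazyframe is often faster because Polars sees the whole query before running it and can optimize it. For example, it reads only the columns the query actually uses (projection pushdown) and avoids creating intermediate dataframes between steps. Nothing is loaded into memory until you call collect().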
