Speed up analysis with lazyframes in Polars

scan_csv

100DaysOfPolars
Author: Joram Mutenge

Published: 2025-09-22

Datasets keep getting bigger, and loading an entire dataset into memory can put a strain on your computer. Fortunately, Polars lets you scan the data into a lazyframe, describe the work you want done, and only then load the result into memory as a dataframe.

Say we have this CSV file:

csv_file = 'https://raw.githubusercontent.com/jorammutenge/learn-rust/refs/heads/main/sample_sales.csv'


Create a lazyframe by scanning a CSV file

To read a CSV file as a lazyframe, use the Polars function scan_csv, like this:

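import polars as pl

# Build the query lazily; nothing runs until collect() is called.
(pl.scan_csv(csv_file)
 .select('Account Name', 'ext price')  # keep only the columns we need
 .group_by('Account Name')
 .agg(pl.mean('ext price'))            # average price per account
 .collect()                            # run the query, return a dataframe
 )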
shape: (718, 2)
Account Name                      ext price
str                               f64
"Kuphal, Flatley and Casper"      929.46
"Feil LLC"                        1383.9
"Thiel-Volkman"                   279.52
"McDermott, Gerlach and Bechtel…  187.68
"VonRueden, Wiza and Balistreri"  338.52
"O'Conner Inc"                    313.2
"Herzog-Homenick"                 1051.2
"Wilderman Group"                 1155.64
"O'Kon, Braun and Corkery"        290.2
"Corwin, Nienow and Reichert"     393.24


In the lazyframe, I selected the two columns I wanted and computed the average price for each account. Calling collect then ran the query and loaded only the result into memory as a dataframe.

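Working with a lazyframe is often faster because Polars does not run anything until you call collect. Instead, it builds a query plan and optimizes it first, so in this example only the two selected columns need to be read from the CSV file rather than the whole dataset.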
