Speed up analysis with lazyframes in Polars

scan_csv

100DaysOfPolars
Author: Joram Mutenge

Published: 2025-09-22

Datasets are getting bigger, so loading an entire dataset into memory can strain your computer. Fortunately, Polars lets you scan the data into a lazyframe, which describes the work to be done without reading the data, and load only the result into memory as a dataframe when you ask for it.

Say we have this CSV file:

csv_file = 'https://raw.githubusercontent.com/jorammutenge/learn-rust/refs/heads/main/sample_sales.csv'


Create a lazyframe by scanning a CSV file

To read a CSV file as a lazyframe, use the Polars function scan_csv, like this:

import polars as pl

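# Build the query lazily; no data is read until collect() is called.
(pl.scan_csv(csv_file)
 .select('Account Name', 'ext price')   # keep only the columns needed
 .group_by('Account Name')
 .agg(pl.mean('ext price'))             # average price per customer
 .collect()                             # run the query and return a dataframe
)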
shape: (718, 2)
Account Name                   ext price
str                            f64
"Brekke PLC"                   1398.505
"Halvorson and Sons"           594.81
"O'Conner Inc"                 313.2
"Miller PLC"                   293.88
"Conroy-Schaden"               498.9
"Batz Inc"                     1069.98
"Swift-Okuneva"                320.295
"Medhurst and Sons"            500.8
"Treutel, Muller and O'Kon"    1513.34
"Nicolas-Emard"                68.39


In the lazyframe, I selected the two columns I wanted and computed the average price for each customer. Calling collect() then ran the query and loaded the result into memory as a dataframe.

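Doing computations on a lazyframe is often faster because Polars sees the whole query before running it. It can optimize the plan, for example by reading only the columns the query needs, and nothing is loaded into memory until you collect the results.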
