Collapsing row values into a list in polars

to_list
Python is big on lists because they are one of the fundamental data structures of the language. It’s no surprise that data analysis tasks in Python almost always involve…
2025-08-16

Joram Mutenge

 

Get top values without sorting in polars

top_k
Sorting values is an expensive operation, especially when you have a large dataset. However, sometimes you have no choice but to sort the data, particularly when you need to…
2025-08-15

Joram Mutenge

 

Collect text data into a list in polars

implode
Long format data (data with many rows) is good for performing analyses but bad for presentation. If you want people to easily understand your data, you should strive to…
2025-08-14

Joram Mutenge

 

Using quantile to filter data in polars

quantile
If you’ve ever taken the SAT, ACT, or GMAT, you might have wondered what percentile you scored in. A percentile is a way of describing a score’s position in a dataset…
2025-08-13

Joram Mutenge

 

Get unique values in a column in polars

unique
If one value in a column is overrepresented, you might think it’s the only value present since Polars displays only the first five and last five rows of the dataframe. To…
2025-08-12

Joram Mutenge

 

Using contains to filter data in polars

contains
The dirty secret is that data work is mainly about working with text data rather than numerical data. That’s why knowing how to filter text data is a valuable skill to have.…
2025-08-11

Joram Mutenge

 

Turn a polars series to a dataframe

to_frame
In Polars, a series is a one-column table, while a dataframe is a multi-column table. However, it’s possible to convert a series into a dataframe. Below is a series of cities.
2025-08-09

Joram Mutenge

 

When you want to count values without null in polars

pl.count
Most of the time, when you’re counting the number of values in a column, you’re not interested in including empty (null) values. Therefore, simply counting the number of…
2025-08-08

Joram Mutenge

 

Using polars selectors to filter dataframes

polars.selectors
Selecting specific columns to filter out unnecessary columns is common in data analysis. Polars makes this type of filtering straightforward through the use of selectors.…
2025-08-07

Joram Mutenge

 

Turn a matrix of data to a polars dataframe

from_records
In data science and analysis, storing data in an array or nested arrays (a matrix) is a common practice. Unfortunately, performing operations directly on arrays can be…
2025-08-06

Joram Mutenge

 

Get middle rows from a polars dataframe

slice
We all know how to get the first or last five rows of a dataset. But did you know you can also retrieve the middle rows? Below is a dataframe showing cereal brands:
2025-08-05

Joram Mutenge

 

Display thousand values with a comma separator in polars

pl.Config.set_thousands_separator
It’s easier to read large numerical values when they’re separated by a comma every three digits. For instance, your brain can more easily comprehend 5,000,000 at a glance…
2025-08-04

Joram Mutenge

 

Take a glimpse into your data with polars

glimpse
I’ve said it before, and I’ll say it again: it’s important to know the data you’re working with before you start analyzing it. Below is a dataframe showing cereal brands.
2025-08-03

Joram Mutenge

 

Checking the number of columns in your dataframe in polars

width
Sometimes your dataset may have too many columns to display on the screen at once. However, you might still want to know how many columns it contains. The Polars dataframe…
2025-08-02

Joram Mutenge

 

How to filter with empty to get true or false values in polars

is_empty
You’re conscious about your calorie intake, and you want to know whether there are any cereals in your dataset with more than 200 calories. How would you go about answering…
2025-08-01

Joram Mutenge

 

Identify and remove duplicate rows in polars

is_duplicated
Duplicates are a common problem in most datasets. That’s why it’s important to check for duplicate rows before analyzing your data. Fortunately, Polars provides an easy way…
2025-07-30

Joram Mutenge

 

Checking dataset memory usage in polars

estimated_size
It’s important to know how much memory the dataset you’re processing is consuming on your machine. Why? Because lower memory consumption leads to faster processing.…
2025-07-29

Joram Mutenge

 

Getting summary statistics in polars

describe
Data exploration is the first step in any data analysis task. You can’t start analyzing your data without understanding what it’s about. If you have numerical columns, one…
2025-07-28

Joram Mutenge

 

Thinking in tables

The mark of a 10x data analyst
In my last year of graduate school, I asked my professor what the most important skill for a great data analyst was. His response shocked me because I’d never thought about…
2025-07-26

Joram Mutenge

 

Combine dataframes with inner join in polars

inner join
Just like in SQL, it’s possible to join two dataframes in Polars. In this article, I’ll show you how to use the inner join in Polars to create a dataframe with more columns.…
2025-07-22

Joram Mutenge

 

Filter between a range of values in polars

is_between
It’s not always the case that you want to filter your data based on a single specific value. Sometimes, you may want to filter your data based on a range of values. Below is…
2025-07-21

Joram Mutenge

 

Retrieving a schema from a table in polars

collect_schema
In addition to showing column data types in the dataframe, Polars allows you to get a dictionary of data types where the key is the column name and the value is the data…
2025-07-18

Joram Mutenge

 

Counting how many times a value appears in polars

value_counts
When dealing with categorical data, you may want to know how many times each category appears in a column. Below is a dataframe containing three different categories of…
2025-07-17

Joram Mutenge

How to copy paste a table to a dataframe in polars

read_clipboard
As a data professional, you’ll work with many Excel files, and most of them will be poorly formatted. This makes it difficult to read the file into a dataframe. Fortunately…
2025-07-16

Joram Mutenge

 

Calculating cumulative sum in polars

pl.cum_sum
Say you have a target you want to hit for your weekly sales total, and you also want to know how close you are to hitting that target with each passing day. It turns out…
2025-07-15

Joram Mutenge

 

Filtering for multiple items in polars

is_in
Sometimes, you may want to filter data for more than one item. Polars makes this type of filtering straightforward. Below is a dataframe showing streaming services and their…
2025-07-14

Joram Mutenge

 

Group by calculations in polars

group_by
Aggregations to data professionals are what a saw is to a carpenter. Almost every data analysis you perform will involve an aggregate calculation, or group_by calculations…
2025-07-12

Joram Mutenge

 

Adding values across multiple columns in polars

sum_horizontal
Most dataframe libraries are excellent at performing columnar operations, such as adding or multiplying values within a single column. However, there are times when you may…
2025-07-11

Joram Mutenge

 

How to sample data with polars

sample
Analyzing a very large dataset can be strenuous on your computer, especially if you don’t have a fast machine. Instead of overworking your computer and waiting a long time…
2025-07-10

Joram Mutenge

 

Stacking dataframes vertically in polars

vstack
Combining two dataframes into a single dataframe is a common operation in data analysis. Dataframes can be combined vertically (one on top of the other) or horizontally…
2025-07-09

Joram Mutenge

Adding conditional formatting to excel workbooks with polars

write_excel
Polars allows you to save data in multiple file formats like CSV, Parquet, Avro—even Excel. What most people don’t know is that you can add special formatting to the Excel…
2025-07-07

Joram Mutenge

 

Creating a datetime column from multiple columns in polars

pl.date
When you have a column with date values, having its data type as datetime is beneficial because it allows you to perform time series analysis on your data. You can slice the…
2025-07-06

Joram Mutenge

 

Keyboard time is the key to learning how to code

Most people say they want to learn how to code, but if you take a look at how they spend their time acquiring that skill, it makes you question whether they will ever become…
2025-07-04

Joram Mutenge

 

How to convert from pandas to polars dataframe

pl.from_pandas
Polars is highly versatile. It can accept a Pandas dataframe and convert it into a Polars dataframe. This conversion is especially useful when working with HTML data, as…
2025-07-03

Joram Mutenge

 

Maintaining a column name without retyping it in polars

name.keep
DataFrames make it easy to create new columns and rename existing ones. That’s why I prefer doing data analysis with DataFrames rather than SQL tables. Say we have a…
2025-07-02

Joram Mutenge

 

Joining text data with polars

pl.concat_str
Most people have a love-hate relationship with text data, but if you work in the data field, you’re bound to encounter it.
2025-07-01

Joram Mutenge

 

Getting every other row of the dataframe with polars

gather_every
There comes a time when you want to extract every other row from your DataFrame. What I mean is: get the first row, skip the second, get the third, skip the fourth, and so on.
2025-06-30

Joram Mutenge

 

How to know the number of days in each month with polars

dt.month_end
If I asked, “How many days are in each month of the calendar?” most people would be able to answer for a few months, but not all.
2025-06-29

Joram Mutenge

Why aren’t more data people talking about ibis?

If you started working in the data field 20 years ago, you probably used a lot of SQL. It’s a robust, 50-year-old technology that excels at querying data, thanks to decades…
2025-06-25

Joram Mutenge

 

Converting unix timestamp to polars datetime

pl.from_epoch
Computers are good at recording timestamps, but they do it in Unix time. Sadly, humans aren’t great at interpreting Unix time.
2025-06-24

Joram Mutenge

 

Adding a currency symbol to polars dataframe values

pl.format
When your dataframe contains monetary values such as budgets, it’s helpful to include a currency symbol. This ensures your audience clearly understands whether the figures…
2025-06-23

Joram Mutenge

 

Turning a polars dataframe to long format with explode

explode
Tabular data is easier to process when columns contain singular values of the same data type for each row. However, sometimes data can be stored as a list in a single row.
2025-06-20

Joram Mutenge

 

Arranging columns in a specific order using index in polars

pl.nth
Stacking dataframes vertically only works when the column names are the same and arranged in the same order. For example, if you have df1 with columns Name and Age, and df2 w…
2025-06-19

Joram Mutenge

 

How to remove whitespace in column values using polars in python

strip_chars
In data science or data analysis, counting unique values is very common. However, having whitespace (empty space at the beginning or at the end) in your values can lead to…
2025-06-18

Joram Mutenge

 

Your beautiful code doesn’t matter anymore (and that’s fine)

Your code doesn’t run in a vacuum. Your tools, your projects, your career all depend on a fast-moving ecosystem of languages, libraries, platforms, datasets, and now, AI.
2025-06-12

Joram Mutenge

 

Most data analysts are stuck using mediocre tools thanks to the familiarity trap

Data analysts aren’t exactly known for their technical wizardry – at least not if “technical” means writing actual code that doesn’t make software engineers weep. In fact…
2025-06-01

Joram Mutenge

 

Idea person or Thoughtful person: Which one are you?

If you’re working in an organization, you can’t escape meetings. They are part and parcel of day-to-day life in any workplace. But sometimes, it can feel like all you ever…
2025-05-25

Joram Mutenge

 

The single most important lesson I learned from my retired boss

The day my boss retired, a small part of me died.
2025-05-18

Joram Mutenge

 

Let me be the new host of The Data Scientist Show

I messaged Daliana Liu, host of The Data Scientist Show, on LinkedIn asking to become the new host. If you know her, please share this post so she sees it.
2025-04-26

Joram Mutenge

I made my first pull request to the Marimo team

If you work extensively with data in Python, you’ll agree that Jupyter notebooks provide an excellent environment for data analysis. I’ve used Jupyter notebooks for a long…
2025-03-03

Joram Mutenge

How to improve a bad graph with plotly

All data visualizations should, first and foremost, inform. Any visualization that falls short of this is simply data art. Data visualizations that are uninformative may be…
2025-02-25

Joram Mutenge

Creating a desktop app using kivy in python

I love watching movies and TV shows, but there are so many out there that it’s often difficult to pick what to watch. For years, I’ve been updating my database of…
2025-02-22

Joram Mutenge

Using set theory to speed up your data analysis

Most data analysis tasks involve joining tables to get more data or filter out specific data. But what happens when the data you’re working with isn’t in a format that…
2025-02-09

Joram Mutenge

 

How overlooking a small detail on a job interview can disqualify you for a position

A month ago, I interviewed a candidate for a junior data analyst position. Given how difficult it is to land an interview in the data field due to stiff competition, I…
2025-02-01

Joram Mutenge

Ten polars functions that pros use and amateurs don’t

Polars is becoming an increasingly popular data analysis library, and my prediction is that more new data scientists and analysts will be starting with Polars rather than…
2025-01-13

Joram Mutenge

Transforming timeseries data with group by and group by dynamic in polars

Polars has become my go-to library for data analysis. Each client project brings new insights into the powerful functionality Polars offers. Recently, I worked on a project…
2025-01-06

Joram Mutenge

 

What I learned about group by dynamic in polars while working on a client’s project

In the last client project I worked on, I learned something about the group_by_dynamic function in Polars. While what I learned was surprising, the fact that I learned it…
2024-12-30

Joram Mutenge

How to create charts from The Economist magazine using plotly

We at Conterval have always been fans of the charts from The Economist magazine. No publication does a better job of creating static visualizations you can use in print. We…
2024-12-15

Joram Mutenge

What tool should you use as a data analyst?

Data analysis is a hot field nowadays. Companies are opening up new data analyst positions, and many people want to become data analysts.
2024-12-01

Joram Mutenge

How we helped a bakery generate forecast by bread type using polars

A few weeks ago, Conterval did a consulting gig for a medium-sized bakery. This bakery makes white and brown bread, which it sells to a major retail store here in the USA.…
2024-11-18

Joram Mutenge


    © 2025 Conterval · Contact