Blog
Collapsing row values into a list in polars
to_list
Python is big on lists because they are one of the fundamental data structures of the language. It’s no surprise that data analysis tasks in Python almost always involve…
2025-08-16
Joram Mutenge
Get top values without sorting in polars
top_k
Sorting values is an expensive operation, especially when you have a large dataset. However, sometimes you have no choice but to sort the data, particularly when you need to…
2025-08-15
Joram Mutenge
Collect text data into a list in polars
implode
Long format data (data with many rows) is good for performing analyses but bad for presentation. If you want people to easily understand your data, you should strive to…
2025-08-14
Joram Mutenge
Using quantile to filter data in polars
quantile
If you’ve ever taken the SAT, ACT, or GMAT, you might have wondered what percentile you scored in. A percentile is a way of describing a score’s position in a dataset…
2025-08-13
Joram Mutenge
Get unique values in a column in polars
unique
If one value in a column is overrepresented, you might think it’s the only value present since Polars displays only the first five and last five rows of the dataframe. To…
2025-08-12
Joram Mutenge
Using contains to filter data in polars
contains
The dirty secret is that data work is mainly about working with text data rather than numerical data. That’s why knowing how to filter text data is a valuable skill to have.…
2025-08-11
Joram Mutenge
Turn a polars series to a dataframe
to_frame
In Polars, a series is a one-column table, while a dataframe is a multi-column table. However, it’s possible to convert a series into a dataframe. Below is a series of cities.
2025-08-09
Joram Mutenge
When you want to count values without null in polars
pl.count
Most of the time, when you’re counting the number of values in a column, you’re not interested in including empty (null) values. Therefore, simply counting the number of…
2025-08-08
Joram Mutenge
Using polars selectors to filter dataframes
polars.selectors
Selecting specific columns to filter out unnecessary columns is common in data analysis. Polars makes this type of filtering straightforward through the use of selectors.…
2025-08-07
Joram Mutenge
Turn a matrix of data to a polars dataframe
from_records
In data science and analysis, storing data in an array or nested arrays (a matrix) is a common practice. Unfortunately, performing operations directly on arrays can be…
2025-08-06
Joram Mutenge
Get middle rows from a polars dataframe
slice
We all know how to get the first or last five rows of a dataset. But did you know you can also retrieve the middle rows? Below is a dataframe showing cereal brands:
2025-08-05
Joram Mutenge
Display thousand values with a comma separator in polars
pl.Config.set_thousands_separator
It’s easier to read large numerical values when they’re separated by a comma every three digits. For instance, your brain can more easily comprehend 5,000,000 at a glance…
2025-08-04
Joram Mutenge
Take a glimpse into your data with polars
glimpse
I’ve said it before, and I’ll say it again: it’s important to know the data you’re working with before you start analyzing it. Below is a dataframe showing cereal brands.
2025-08-03
Joram Mutenge
Checking the number of columns in your dataframe in polars
width
Sometimes your dataset may have too many columns to display on the screen at once. However, you might still want to know how many columns it contains. The Polars dataframe…
2025-08-02
Joram Mutenge
How to filter with empty to get true or false values in polars
is_empty
You’re conscious about your calorie intake, and you want to know whether there are any cereals in your dataset with more than 200 calories. How would you go about answering…
2025-08-01
Joram Mutenge
Identify and remove duplicate rows in polars
is_duplicated
Duplicates are a common problem in most datasets. That’s why it’s important to check for duplicate rows before analyzing your data. Fortunately, Polars provides an easy way…
2025-07-30
Joram Mutenge
Checking dataset memory usage in polars
estimated_size
It’s important to know how much memory the dataset you’re processing is consuming on your machine. Why? Because lower memory consumption leads to faster processing.…
2025-07-29
Joram Mutenge
Getting summary statistics in polars
describe
Data exploration is the first step in any data analysis task. You can’t start analyzing your data without understanding what it’s about. If you have numerical columns, one…
2025-07-28
Joram Mutenge
Thinking in tables
The mark of a 10x data analyst
In my last year of graduate school, I asked my professor what the most important skill for a great data analyst was. His response shocked me because I’d never thought about…
2025-07-26
Joram Mutenge
Combine dataframes with inner join in polars
inner join
Just like in SQL, it’s possible to join two dataframes in Polars. In this article, I’ll show you how to use the inner join in Polars to create a dataframe with more columns.…
2025-07-22
Joram Mutenge
Filter between a range of values in polars
is_between
It’s not always the case that you want to filter your data based on a single specific value. Sometimes, you may want to filter your data based on a range of values. Below is…
2025-07-21
Joram Mutenge
Retrieving a schema from a table in polars
collect_schema
In addition to showing column data types in the dataframe, Polars allows you to get a dictionary of data types where the key is the column name and the value is the data…
2025-07-18
Joram Mutenge
Counting how many times a value appears in polars
value_counts
When dealing with categorical data, you may want to know how many times each category appears in a column. Below is a dataframe containing three different categories of…
2025-07-17
Joram Mutenge
How to copy paste a table to a dataframe in polars
read_clipboard
As a data professional, you’ll work with many Excel files, and most of them will be poorly formatted. This makes it difficult to read the file into a dataframe. Fortunately…
2025-07-16
Joram Mutenge
Calculating cumulative sum in polars
pl.cum_sum
Say you have a target you want to hit for your weekly sales total, and you also want to know how close you are to hitting that target with each passing day. It turns out…
2025-07-15
Joram Mutenge
Filtering for multiple items in polars
is_in
Sometimes, you may want to filter data for more than one item. Polars makes this type of filtering straightforward. Below is a dataframe showing streaming services and their…
2025-07-14
Joram Mutenge
Group by calculations in polars
group_by
Aggregations to data professionals are what a saw is to a carpenter. Almost every data analysis you perform will involve an aggregate calculation, or group_by calculations…
2025-07-12
Joram Mutenge
Adding values across multiple columns in polars
sum_horizontal
Most dataframe libraries are excellent at performing columnar operations, such as adding or multiplying values within a single column. However, there are times when you may…
2025-07-11
Joram Mutenge
How to sample data with polars
sample
Analyzing a very large dataset can be strenuous on your computer, especially if you don’t have a fast machine. Instead of overworking your computer and waiting a long time…
2025-07-10
Joram Mutenge
Stacking dataframes vertically in polars
vstack
Combining two dataframes into a single dataframe is a common operation in data analysis. Dataframes can be combined vertically (one on top of the other) or horizontally…
2025-07-09
Joram Mutenge
Adding conditional formatting to excel workbooks with polars
write_excel
Polars allows you to save data in multiple file formats like CSV, Parquet, Avro—even Excel. What most people don’t know is that you can add special formatting to the Excel…
2025-07-07
Joram Mutenge
Creating a datetime column from multiple columns in polars
pl.date
When you have a column with date values, having its data type as datetime is beneficial because it allows you to perform time series analysis on your data. You can slice the…
2025-07-06
Joram Mutenge
Keyboard time is the key to learning how to code
Most people say they want to learn how to code, but if you take a look at how they spend their time acquiring that skill, it makes you question whether they will ever become…
2025-07-04
Joram Mutenge
How to convert from pandas to polars dataframe
pl.from_pandas
Polars is highly versatile. It can accept a Pandas dataframe and convert it into a Polars dataframe. This conversion is especially useful when working with HTML data, as…
2025-07-03
Joram Mutenge
Maintaining a column name without retyping it in polars
name.keep
DataFrames make it easy to create new columns and rename existing ones. That’s why I prefer doing data analysis with DataFrames rather than SQL tables. Say we have a…
2025-07-02
Joram Mutenge
Joining text data with polars
pl.concat_str
Most people have a love-hate relationship with text data, but if you work in the data field, you’re bound to encounter it.
2025-07-01
Joram Mutenge
Getting every other row of the dataframe with polars
gather_every
There comes a time when you want to extract every other row from your DataFrame. What I mean is: get the first row, skip the second, get the third, skip the fourth, and so on.
2025-06-30
Joram Mutenge
How to know the number of days in each month with polars
dt.month_end
If I asked you, “How many days are in each month of the calendar?” most people would be able to answer for a few months, but not all.
2025-06-29
Joram Mutenge
Why aren’t more data people talking about ibis?
If you started working in the data field 20 years ago, you probably used a lot of SQL. It’s a robust, 50-year-old technology that excels at querying data, thanks to decades…
2025-06-25
Joram Mutenge
Converting unix timestamp to polars datetime
pl.from_epoch
Computers are good at recording timestamps, but they do it in Unix time. Sadly, humans aren’t great at interpreting Unix time.
2025-06-24
Joram Mutenge
Adding a currency symbol to polars dataframe values
pl.format
When your dataframe contains monetary values, such as budgets, it’s helpful to include a currency symbol. This ensures your audience clearly understands whether the figures…
2025-06-23
Joram Mutenge
Turning a polars dataframe to long format with explode
explode
Tabular data is easier to process when columns contain singular values of the same data type for each row. However, sometimes data can be stored as a list in a single row.
2025-06-20
Joram Mutenge
Arranging columns in a specific order using index in polars
pl.nth
Stacking dataframes vertically only works when the column names are the same and arranged in the same order. For example, if you have df1 with columns Name and Age, and df2 w…
2025-06-19
Joram Mutenge
How to remove whitespace in column values using polars in python
strip_chars
In data science or data analysis, counting unique values is very common. However, having whitespace (empty space at the beginning or at the end) in your values can lead to…
2025-06-18
Joram Mutenge
Your beautiful code doesn’t matter anymore (and that’s fine)
Your code doesn’t run in a vacuum. Your tools, your projects, your career all depend on a fast-moving ecosystem of languages, libraries, platforms, datasets, and now, AI.
2025-06-12
Joram Mutenge
Most data analysts are stuck using mediocre tools thanks to the familiarity trap
Data analysts aren’t exactly known for their technical wizardry – at least not if “technical” means writing actual code that doesn’t make software engineers weep. In fact…
2025-06-01
Joram Mutenge
Idea person or Thoughtful person: Which one are you?
If you’re working in an organization, you can’t escape meetings. They are part and parcel of day-to-day life in any workplace. But sometimes, it can feel like all you ever…
2025-05-25
Joram Mutenge
The single most important lesson I learned from my retired boss
The day my boss retired, a small part of me died.
2025-05-18
Joram Mutenge
Let me be the new host of The Data Scientist Show
I messaged Daliana Liu, host of The Data Scientist Show, on LinkedIn asking to become the new host. If you know her, please share this post so she sees it.
2025-04-26
Joram Mutenge
I made my first pull request to the Marimo team
If you work extensively with data in Python, you’ll agree that Jupyter notebooks provide an excellent environment for data analysis. I’ve used Jupyter notebooks for a long…
2025-03-03
Joram Mutenge
How to improve a bad graph with plotly
All data visualizations should, first and foremost, inform. Any visualization that falls short of this is simply data art. Data visualizations that are uninformative may be…
2025-02-25
Joram Mutenge
Creating a desktop app using kivy in python
I love watching movies and TV shows, but there are so many out there that it’s often difficult to pick what to watch. For years, I’ve been updating my database of…
2025-02-22
Joram Mutenge
Using set theory to speed up your data analysis
Most data analysis tasks involve joining tables to get more data or filter out specific data. But what happens when the data you’re working with isn’t in a format that…
2025-02-09
Joram Mutenge
How overlooking a small detail on a job interview can disqualify you for a position
A month ago, I interviewed a candidate for a junior data analyst position. Given how difficult it is to land an interview in the data field due to stiff competition, I…
2025-02-01
Joram Mutenge
Ten polars functions that pros use and amateurs don’t
Polars is increasingly becoming a popular data analysis library, and my prediction is that more new data scientists and analysts will be starting with Polars rather than…
2025-01-13
Joram Mutenge
Transforming timeseries data with group by and group by dynamic in polars
Polars has become my go-to library for data analysis. Each client project brings new insights into the powerful functionality Polars offers. Recently, I worked on a project…
2025-01-06
Joram Mutenge
What I learned about group by dynamic in polars while working on a client’s project
In the last client project I worked on, I learned something about the group_by_dynamic function in Polars. While what I learned was surprising, the fact that I learned it…
2024-12-30
Joram Mutenge
How to create charts from The Economist magazine using plotly
We at Conterval have always been fans of the charts from The Economist magazine. No publication does a better job of creating static visualizations you can use in print. We…
2024-12-15
Joram Mutenge
What tool should you use as a data analyst?
Data analysis is a hot field nowadays. Companies are opening up new data analyst positions, and many people want to become data analysts.
2024-12-01
Joram Mutenge
How we helped a bakery generate forecast by bread type using polars
A few weeks ago, Conterval did a consulting gig for a medium-sized bakery. This bakery makes white and brown bread, which it sells to a major retail store here in the USA.…
2024-11-18
Joram Mutenge