len_bytes
Joram Mutenge
2025-11-11
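The number of characters in a piece of text is not always the same as the number of bytes it takes up, especially when the text contains emojis. Polars stores strings as UTF-8, where a single character can occupy anywhere from 1 to 4 bytes. Below is a dataframe of YouTube comments.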
| YouTube_Comment |
|---|
| str |
| "🤔 I don't think this works IRL… |
| "๐๐" |
| "🔥" |
| "Wow!" |
To count the characters and bytes in each comment, use `str.len_chars` and `str.len_bytes` respectively, adding the results as new `Chars` and `Bytes` columns:
| YouTube_Comment | Chars | Bytes |
|---|---|---|
| str | u32 | u32 |
| "🤔 I don't think this works IRL… | 31 | 34 |
| "๐๐" | 2 | 8 |
| "🔥" | 1 | 4 |
| "Wow!" | 4 | 4 |
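A single emoji counts as one character but takes four bytes, so every emoji adds three bytes beyond its character count. That is why the one-emoji comment has 1 character but 4 bytes, the two-emoji comment has 2 characters but 8 bytes, and the first comment has 3 more bytes than characters. For plain ASCII text such as "Wow!", the two counts are equal.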
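To see where the extra bytes come from, here is a small plain-Python sketch (no Polars needed) that encodes single characters as UTF-8, the encoding Polars uses for its string columns, and counts the resulting bytes. The sample characters are chosen only for illustration.

```python
# Each string below holds exactly one character (one Unicode code point).
samples = {
    "ASCII letter": "A",
    "accented letter": "\u00e9",      # é
    "horizontal ellipsis": "\u2026",  # …
    "fire emoji": "\U0001F525",       # 🔥
}

# len() counts code points; encoding to UTF-8 reveals the byte count.
widths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

for name, ch in samples.items():
    print(f"{name}: {len(ch)} character, {widths[name]} byte(s)")
# ASCII letter: 1 character, 1 byte(s)
# accented letter: 1 character, 2 byte(s)
# horizontal ellipsis: 1 character, 3 byte(s)
# fire emoji: 1 character, 4 byte(s)
```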
Check out my Polars course to learn more.