Polars
High-performance DataFrame library written in Rust. Zero required dependencies. Releases ~bi-weekly with breaking releases every ~6 months following SemVer. Current version is 1.39.3 (Mar 2026). Primary footgun: Polars has no row index — pandas muscle-memory causes constant errors.
Warnings
- breaking No row index. Polars has no implicit integer index like pandas. df.iloc[0] does not exist; use df.row(0) or df.slice(0, 1). Operations that depend on row position need an explicit with_row_index() or slice().
- breaking DataFrames are immutable. Item assignment df['col'] = ... raises TypeError; use with_columns() to add or replace columns. This is the most common error for pandas users.
- breaking groupby renamed to group_by in 0.19. Extremely common in LLM-generated code — groupby is in the vast majority of tutorials prior to 2024.
- breaking apply() renamed to map_elements() for element-wise UDFs, and map_batches() for Series-level UDFs. apply was removed in 1.0.
- breaking pl.count() deprecated and removed — it counted all rows including nulls. Replaced by pl.len() for row counts; pl.col('x').count() still exists but counts only non-null values.
- breaking df.write_json() changed in 1.0: now only writes row-oriented JSON. The row_oriented and pretty parameters removed. Old JSON format (column-oriented) was used by read_json() — now use serialize()/deserialize() for that.
- breaking LazyFrame.schema, .columns, .dtypes are now properties that emit PerformanceWarning — accessing them triggers schema resolution, which can be expensive in complex lazy pipelines. Use .collect_schema() to resolve the schema explicitly and silence the warning.
- gotcha Lazy execution does not run until .collect() is called. Forgetting .collect() returns a LazyFrame, not a DataFrame — printing it shows the query plan, not data.
- gotcha map_elements() (formerly apply) is always slow — it breaks out of Rust into Python for each element. For any operation expressible with native Polars expressions, use the expression API instead.
Install
- pip install polars
- pip install polars[all]
- pip install polars[rt64]
- pip install polars[rtcompat]
Imports
- polars
  import polars as pl
  df = pl.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
  result = df.filter(pl.col('a') > 1).select(['a', 'b'])
- group_by
  df.group_by('col').agg(pl.col('val').sum())
Quickstart
import polars as pl
# Eager execution
df = pl.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'score': [85, 92, 78],
    'dept': ['eng', 'eng', 'mkt'],
})

result = (
    df
    .filter(pl.col('score') > 80)
    .group_by('dept')
    .agg(pl.col('score').mean().alias('avg_score'))
    .sort('avg_score', descending=True)
)

# Lazy execution (preferred for large data)
lazy_result = (
    pl.scan_csv('data.csv')
    .filter(pl.col('score') > 80)
    .group_by('dept')
    .agg(pl.col('score').mean())
    .collect()  # execute
)