Polars (LTS CPU version)
Polars is a fast DataFrame library for Python, implemented in Rust and designed for performance-critical data manipulation. The `polars-lts-cpu` package is a build of Polars compiled for older CPUs that lack newer instruction-set extensions such as AVX2, which the standard `polars` wheels assume. It installs the same `polars` Python module and exposes the same API, and its releases have used the same version numbers as the main `polars` package. The current version is 1.33.1.
Warnings
- gotcha The `polars-lts-cpu` package is distinct from the main `polars` package, but both install the same `polars` import name. `polars-lts-cpu` targets older CPUs without AVX2 support and may run slower than the standard build on modern hardware. Install only one of the two in a given environment; installing both can leave the `polars` module in an inconsistent state.
- breaking Polars distinguishes between eager (`DataFrame`) and lazy (`LazyFrame`) execution. Operations on a `DataFrame` are executed immediately, while `LazyFrame` operations are chained into a query plan and only computed when `.collect()` is called. Expecting lazy behavior from an eager `DataFrame`, or forgetting to call `.collect()` on a `LazyFrame`, is a common source of error.
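- gotcha Polars avoids unnecessary data copies: operations such as `select` and `with_columns` return a new DataFrame that, where possible, shares the underlying column buffers with the original. Those buffers are treated as immutable, so transforming a derived frame does not change the original. This can surprise users accustomed to Pandas, where whether an operation yields a view or a copy depends on the operation. Plain assignment (`df2 = df`) only binds a second name to the same object; use `.clone()` to obtain a separate DataFrame.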
Install
-
pip install polars-lts-cpu
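Because more than one distribution can provide the `polars` module, it can help to confirm which one is present in the environment. The sketch below uses only the standard library; the helper name `installed_polars_distributions` is hypothetical and the candidate list is an assumption that can be extended.

```python
from importlib import metadata


def installed_polars_distributions(candidates=("polars", "polars-lts-cpu")):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the current environment.
            continue
    return found


print(installed_polars_distributions())
```

If the result lists both names, uninstalling both and reinstalling the intended one is the safest way to get a clean `polars` module.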
Imports
- polars
import polars as pl
- DataFrame
pl.DataFrame
- LazyFrame
pl.LazyFrame
- Series
pl.Series
- col
pl.col
- lit
pl.lit
Quickstart
import polars as pl

# Create a DataFrame
df = pl.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
        "age": [25, 30, 35, 28, 22],
        "city": ["New York", "London", "Paris", "New York", "London"],
        "score": [90, 85, 92, 78, 95],
    }
)

# Keep people older than 25, group by city, then compute the
# average score and the number of people per city
result = (
    df.filter(pl.col("age") > 25)
    .group_by("city")
    .agg(
        pl.col("score").mean().alias("average_score"),
        pl.col("name").count().alias("num_people"),
    )
    .sort("average_score", descending=True)
)

print("Original DataFrame:\n", df)
print("\nProcessed Result:\n", result)