Agate Data Analysis Library
Agate is a Python data analysis library that is optimized for humans instead of machines. It is presented as an alternative to numpy and pandas, designed to solve real-world problems with readable code. It is currently at version 1.14.2, has a steady release cadence, and is actively maintained by the wireservice team.
Warnings
- breaking Agate has dropped official support for Python 2.x. Users must ensure they are running Python 3.5 or newer. PyPI listings indicate active testing and support for Python 3.10-3.14.
- gotcha Agate's core design principle dictates that `Table` objects are immutable. Operations like `select()`, `where()`, or `order_by()` do not modify the original table in-place; instead, they return *new* `Table` instances.
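- gotcha When loading data from sources like CSV, `agate` uses a `TypeTester` to infer column data types automatically. Inference is usually right but can misfire on ambiguous data: identifiers such as ZIP codes with leading zeros, for example, may be read as numbers. Pass explicit `column_types`, or a `TypeTester` with `force`, to override the guess.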
- gotcha There are multiple Python libraries with 'Agate' in their name. This entry refers to `wireservice/agate`, a data analysis library (`pip install agate`). Another common one is `obiba-agate`, which is a client for an 'Agate server' and has different use cases and dependencies.
Install
- pip install agate
- pip install agate[icu]
Imports
- agate
import agate
- Table
import agate
table = agate.Table(...)
- TypeTester
from agate import TypeTester
- Sum
from agate.aggregations import Sum
Quickstart
import agate
import io
# Create a dummy CSV in memory for a runnable example
csv_data = """name,age,city,salary
Alice,30,New York,70000
Bob,24,London,50000
Charlie,30,New York,75000
David,35,London,90000
Eve,24,Paris,60000
"""
# Use io.StringIO to simulate a file for from_csv
with io.StringIO(csv_data) as f:
    # agate automatically infers types with TypeTester by default
    table = agate.Table.from_csv(f)
print("Original Table:")
table.print_table()
# Filter rows where age is less than 30
filtered_table = table.where(lambda row: row['age'] < 30)
print("\nFiltered Table (age < 30):")
filtered_table.print_table()
# Group by city and calculate average salary
from agate.aggregations import Count, Mean
by_city = table.group_by('city')
averages = by_city.aggregate([
    ('average_salary', Mean('salary')),
    ('total_employees', Count())  # Count() with no arguments counts the rows in each group
])
print("\nAggregated by City (Average Salary & Total Employees):")
averages.print_table()
# Order by average salary, descending
ordered_averages = averages.order_by('average_salary', reverse=True)
print("\nOrdered by Average Salary (Descending):")
ordered_averages.print_table()