petl

1.7.17 · active · verified Sun Apr 12

petl is an open-source Python package for Extract, Transform, and Load (ETL) operations on tabular data. It reads from and writes to sources such as CSV files, databases, and in-memory structures, with a focus on memory efficiency and ease of use. As of version 1.7.17 it is actively maintained and concentrates on core ETL functionality, which suits data engineers and analysts who want efficient pipelines without the overhead of heavier frameworks.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates a basic ETL pipeline using petl: extracting data from a CSV file, filtering rows on a condition, adding a new field, and loading the result into another CSV file. petl tables are lazy: operations are defined but not executed until data is requested (e.g., when writing to a file or viewing with `look()`). Note that `fromcsv()` returns every value as a string, so numeric fields must be converted (for example with `convert()`) before they are compared as numbers.

import petl as etl

# Simulate an input CSV file
csv_data = """name,age,city
alice,30,new york
bob,24,london
charlie,35,paris
diana,28,london
"""
with open('input.csv', 'w') as f:
    f.write(csv_data)

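# Extract: Read data from a CSV file (every value is read as a string)
table1 = etl.fromcsv('input.csv')

# Transform: Convert 'age' to int so it can be compared numerically
table1 = etl.convert(table1, 'age', int)

# Transform: Filter rows where age is > 25 and city is 'london'
table2 = etl.select(table1, lambda row: row.age > 25 and row.city == 'london')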

# Add a new column 'status'
table3 = etl.addfield(table2, 'status', 'eligible')

# Load: Write the transformed data to a new CSV file
etl.tocsv(table3, 'output.csv')

# Verify the output
with open('output.csv', 'r') as f:
    print(f.read())

# Expected output (age > 25 and city == 'london'):
# name,age,city,status
# diana,28,london,eligible
