{"id":4682,"library":"petl","title":"petl","description":"petl is a versatile, open-source Python package designed for Extract, Transform, and Load (ETL) operations on tabular data. It provides a simple yet powerful way to handle data from various sources like CSV files, databases, or in-memory structures, focusing on memory efficiency and ease of use. As of version 1.7.17, it is actively maintained with regular releases and a focus on core ETL functionalities, making it suitable for data engineers and analysts seeking efficient pipelines without the overhead of heavier frameworks.","status":"active","version":"1.7.17","language":"en","source_language":"en","source_url":"https://github.com/petl-developers/petl","tags":["ETL","data transformation","tabular data","data processing","data cleaning","lazy evaluation"],"install":[{"cmd":"pip install petl","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Companion package for domain-specific and experimental extensions.","package":"petlx","optional":true,"install_cmd":"pip install petlx"},{"reason":"For interacting with relational databases.","package":"SQLAlchemy","optional":true,"install_cmd":"pip install 'petl[db]' psycopg2-binary"},{"reason":"For reading and writing .xlsx Excel files.","package":"openpyxl","optional":true,"install_cmd":"pip install 'petl[xlsx]'"},{"reason":"For reading .xls Excel files (legacy format).","package":"xlrd","optional":true,"install_cmd":"pip install 'petl[xls]'"},{"reason":"For writing .xls Excel files (legacy format).","package":"xlwt","optional":true,"install_cmd":"pip install 'petl[xls]'"},{"reason":"For reading and writing from remote and cloud filesystems.","package":"fsspec","optional":true,"install_cmd":"pip install 'petl[remote]'"},{"reason":"For interoperability with pandas DataFrames.","package":"pandas","optional":true,"install_cmd":"pip install 'petl[pandas]'"},{"reason":"For interoperability with NumPy 
arrays.","package":"numpy","optional":true,"install_cmd":"pip install 'petl[numpy]'"}],"imports":[{"note":"The convention is to import petl with the alias `etl` for brevity and clarity in ETL operations.","symbol":"petl","correct":"import petl as etl"}],"quickstart":{"code":"import petl as etl\n\n# Simulate an input CSV file\ncsv_data = \"\"\"name,age,city\nalice,30,new york\nbob,24,london\ncharlie,35,paris\ndiana,28,london\n\"\"\"\nwith open('input.csv', 'w') as f:\n    f.write(csv_data)\n\n# Extract: Read data from a CSV file\ntable1 = etl.fromcsv('input.csv')\n\n# Transform: values read from CSV are strings, so convert 'age' to int first\ntable2 = etl.convert(table1, 'age', int)\n\n# Filter rows where age is > 25 and city is 'london'\ntable3 = etl.select(table2, lambda row: row.age > 25 and row.city == 'london')\n\n# Add a new column 'status'\ntable4 = etl.addfield(table3, 'status', 'eligible')\n\n# Load: Write the transformed data to a new CSV file\netl.tocsv(table4, 'output.csv')\n\n# Verify the output\nwith open('output.csv', 'r') as f:\n    print(f.read())\n\n# Expected output:\n# name,age,city,status\n# diana,28,london,eligible\n","lang":"python","description":"This quickstart demonstrates a basic ETL pipeline using petl: extracting data from a CSV, converting a column to the correct type, filtering rows based on conditions, adding a new field, and loading the result into another CSV file. petl tables are lazy, meaning operations are defined but not executed until data is requested (e.g., when writing to a file or viewing with `look()`)."},"warnings":[
{"fix":"Ensure your Python environment is 3.6 or newer before upgrading to petl 2.0. Update any Python 2.7-specific syntax.","message":"Version 2.0 of petl will drop support for Python 2.7, with the minimum supported Python version becoming 3.6. Code running on older Python environments will require an upgrade.","severity":"breaking","affected_versions":"All versions prior to 2.0.x when migrating to Python 3.6+"},
{"fix":"Replace explicit `raise StopIteration` with a simple `return` statement in your custom generator functions. Alternatively, catch `StopIteration` if using `next()` calls within your generator.","message":"When creating custom generator functions within a petl pipeline, directly raising `StopIteration` to signal exhaustion will result in a `RuntimeError` in Python 3.7 and later, due to PEP 479. petl itself has addressed this internally in `v1.7.8` and `v1.7.14`.","severity":"gotcha","affected_versions":"Python 3.7+ (not petl-specific, but applies to user-defined generators)"},
{"fix":"Be mindful of operations that require materializing the entire table. For large datasets, consider chunking, using `lines=True` for JSON output, or exploring `petlx` extensions or alternative packages (e.g., Dask, PyTables) for memory-intensive tasks if petl's built-in options are insufficient.","message":"While petl emphasizes lazy evaluation, certain operations like `sort()`, `tojson()` (without `lines=True`), `look()`, or `see()` can load entire tables into memory, potentially leading to high memory consumption for very large datasets.","severity":"gotcha","affected_versions":"All versions"},
{"fix":"Upgrade to petl `v1.7.16` or newer to benefit from the fix for joining tables with uneven rows.","message":"Prior to `v1.7.16`, joining tables with uneven rows could produce incorrect results. Users performing join operations on such tables with older versions might experience data integrity issues.","severity":"gotcha","affected_versions":"<1.7.16"},
{"fix":"Always provide an explicit header argument to `etl.fromdicts()` for predictable column ordering, e.g., `etl.fromdicts(dicts, header=['field1', 'field2'])`. Alternatively, use `etl.sortheader()` on the resulting table.","message":"When using `etl.fromdicts()` without explicitly providing a header, the order of fields inferred from sampling the input dictionaries might not be stable or consistent across runs.","severity":"gotcha","affected_versions":"All versions"},
{"fix":"Explicitly specify an output file for `to*()` functions (e.g., `etl.tocsv(table, 'output.csv')`) to avoid unintended output to `stdout`.","message":"Since `v1.7.12`, `to*()` functions (e.g., `tocsv()`, `tojson()`) output to `stdout` by default if no specific output file/source is provided. Code written for older versions, or code expecting a different default behavior, might need adjustment.","severity":"gotcha","affected_versions":"<1.7.12"}
],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}