csvkit
csvkit is a suite of command-line tools for working with CSV files. It lets users convert, clean, query, and analyze tabular data directly from the shell. The current version is 2.2.0. Releases are moderately frequent, and major versions can introduce breaking changes.
Warnings
- breaking Version 2.0.0 of csvkit dropped support for Python 2, making it a Python 3-only library. Running on Python 2 will result in `ImportError` or other runtime errors.
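- gotcha CSV encoding is a frequent source of issues. `csvkit` defaults to UTF-8, but many CSVs use other encodings (e.g., Latin-1, cp1252). Specify the input encoding explicitly with `-e`/`--encoding` (for example, `csvlook -e latin-1 data.csv`).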
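- gotcha Tools that need the whole table at once, such as `csvsort`, `csvstat`, `csvjoin` and `csvsql`, load the entire dataset into memory and can consume significant memory on very large CSV files. Row-oriented tools such as `csvcut` and `csvgrep` process their input row by row.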
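- gotcha Header handling can go wrong on malformed or unusual CSVs. By default, csvkit treats the first row as the header, so in a file without one the first data row is interpreted as column names; pass `-H`/`--no-header-row` for such files. Dialect sniffing (delimiter and quoting detection) can also misfire on unusual files and can be disabled with `--snifflimit 0`.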
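The encoding and header gotchas above can be checked before invoking a tool. The sketch below is a hypothetical pre-flight helper, not part of csvkit: the function names `guess_encoding` and `suggest_csvkit_flags` are assumptions, it uses only the Python standard library, and it simply returns csvkit's real `-e` and `-H` flags. Note that `csv.Sniffer.has_header` is itself a heuristic, and `guess_encoding` reads the whole file, so it is only suitable for moderately sized files.

```python
import csv


def guess_encoding(path, candidates=("utf-8", "latin-1")):
    """Hypothetical helper: return the first candidate that decodes the whole file.

    latin-1 maps every byte to a character, so it always succeeds and
    acts as the fallback when UTF-8 decoding fails.
    """
    with open(path, "rb") as f:
        raw = f.read()  # reads the entire file into memory
    for encoding in candidates:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return candidates[-1]


def suggest_csvkit_flags(path, sample_chars=64 * 1024):
    """Hypothetical helper: suggest csvkit's -e/--encoding and -H/--no-header-row flags."""
    encoding = guess_encoding(path)
    with open(path, encoding=encoding, newline="") as f:
        sample = f.read(sample_chars)
    flags = ["-e", encoding]
    # has_header compares the first row against the types/lengths of later
    # rows; a first row that looks like data suggests there is no header.
    if not csv.Sniffer().has_header(sample):
        flags.append("-H")
    return flags
```

The returned list can be spliced into a command, e.g. `subprocess.run(["csvlook", *suggest_csvkit_flags(path), path])`. Treat the result as a suggestion to verify, not a guarantee.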
Install
pip install csvkit
Imports
- agate
import agate
Quickstart
import os
import subprocess


def run_tool(args):
    """Run a csvkit command and return its stdout, or a readable error message."""
    try:
        return subprocess.run(args, capture_output=True, text=True, check=True).stdout
    except FileNotFoundError:
        return f"{args[0]} not found; is csvkit installed and on PATH?"
    except subprocess.CalledProcessError as e:
        return f"Error running {args[0]}: {e.stderr}"


# Create a small sample CSV file (UTF-8, which csvkit expects by default)
csv_content = "name,age\nAlice,30\nBob,24\nCharlie,35\n"
file_path = "example.csv"
with open(file_path, "w", encoding="utf-8", newline="") as f:
    f.write(csv_content)

print("Original CSV content:")
print(csv_content)

try:
    # csvlook renders the CSV as a readable table
    print("\n--- Output from csvlook ---")
    print(run_tool(["csvlook", file_path]))

    # csvsql runs SQL against the file; the table is named after the file ("example")
    print("\n--- Output from csvsql (names of people over 25) ---")
    print(run_tool(["csvsql", "--query", "SELECT name FROM example WHERE age > 25", file_path]))
finally:
    # Remove the sample file even if a command failed
    os.remove(file_path)
    print(f"\nCleaned up {file_path}")