CleverCSV
CleverCSV is a Python package for robustly handling messy CSV files. It provides a drop-in replacement for the standard Python `csv` module and improves dialect detection, so that files with unusual delimiters, quoting, or escaping are parsed correctly. It also includes a command-line interface for tasks such as standardization and code generation. The library is under active development, with several minor releases in a typical year.
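To make "standardization" concrete, here is a minimal sketch of the idea using only the standard library. It is not CleverCSV's implementation: the helper name `standardize` is hypothetical, and the delimiter and quote character are assumed to be known in advance, whereas CleverCSV would detect them.

```python
import csv
import io


def standardize(text, delimiter=";", quotechar='"'):
    """Re-emit CSV text as comma-delimited rows with minimal quoting.

    Hypothetical helper for illustration; the input dialect is assumed known.
    """
    rows = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=quotechar,
    )
    out = io.StringIO(newline="")
    # The default writer dialect uses commas and quotes only when needed.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


messy = 'col1;col2;col3\nvalue1;"value,2";value3\n4;5;6\n'
standardized = standardize(messy)
print(standardized)
```

The field containing a comma stays quoted in the output, since the comma is now the delimiter.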
Warnings
- breaking: The minimum required Python version has been raised. Version `0.8.1` required Python `>=3.8`, and `0.8.4`, the latest release covered here, requires Python `>=3.9`.
- breaking: The internal `ConsistencyDetector` functionality was redesigned in `v0.8.0` from a direct function to a class. Direct calls to the old function signature will fail.
- gotcha: The `clevercsv explore` command-line tool and other advanced features such as `read_dataframe` (which uses Pandas) rely on optional dependencies. If you install `clevercsv` without specifying `[full]`, these features might be unavailable or raise import errors.
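The version and optional-dependency warnings above can be checked before any CleverCSV call. The following is a sketch under stated assumptions: the function name `check_environment` is hypothetical, the `(3, 9)` floor is taken from the warning above, and only `pandas` is checked because it is the one optional dependency this document names.

```python
import importlib.util
import sys

# Minimum Python version taken from the warning above (assumed current).
MIN_PYTHON = (3, 9)


def check_environment(optional_modules=("pandas",)):
    """Report whether the interpreter and packages look usable.

    Uses find_spec, which locates a top-level module without importing it.
    """
    report = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "clevercsv": importlib.util.find_spec("clevercsv") is not None,
    }
    for name in optional_modules:
        report[name] = importlib.util.find_spec(name) is not None
    return report


report = check_environment()
for key, ok in report.items():
    print(f"{key}: {'yes' if ok else 'no'}")
```

If `pandas` is reported as missing, installing with the `[full]` extra is the route described above.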
Install
- pip install clevercsv
- pip install clevercsv[full]
Imports
- clevercsv
    import clevercsv
- reader
    # standard library
    import csv
    reader = csv.reader(file, dialect)
    # CleverCSV drop-in equivalent
    import clevercsv
    reader = clevercsv.reader(file, dialect)
- read_table
    import clevercsv
    table = clevercsv.read_table('my_file.csv')
- read_dataframe
    import clevercsv
    df = clevercsv.read_dataframe('my_file.csv')
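Because `clevercsv.reader` mirrors `csv.reader`, the same calling code can run against either module. The sketch below prefers CleverCSV and falls back to the standard library when it is not installed. The alias `csv_backend` and the sample data are illustrative, and the delimiter is passed explicitly, so no dialect detection happens here.

```python
import io

try:
    import clevercsv as csv_backend  # preferred when installed
except ImportError:
    import csv as csv_backend  # standard library fallback

# In-memory sample: semicolon-delimited, with a quoted field that
# itself contains the delimiter.
sample = io.StringIO('name;note\nalice;"likes a;b"\n', newline="")

reader = csv_backend.reader(sample, delimiter=";", quotechar='"')
rows = list(reader)
print(rows)
```

The fallback only covers the shared `reader` interface. Helpers such as `read_table` and `read_dataframe` exist only in CleverCSV.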
Quickstart
import clevercsv
import os

# Create a dummy messy CSV file for demonstration
csv_content = 'col1;col2;col3\nvalue1;"value,2";value3\n4;5;6\n'
file_path = 'messy_data.csv'
with open(file_path, 'w', newline='', encoding='utf-8') as f:
    f.write(csv_content)

try:
    # Use read_table to automatically detect the dialect and load the data
    rows = clevercsv.read_table(file_path)
    print(f"Loaded {len(rows)} rows with detected dialect:")
    for row in rows:
        print(row)

    # Demonstrate drop-in replacement for standard csv module usage
    with open(file_path, 'r', newline='') as csvfile:
        # Sniff the dialect using CleverCSV's improved sniffer
        dialect = clevercsv.Sniffer().sniff(csvfile.read(1024))
        csvfile.seek(0)
        reader = clevercsv.reader(csvfile, dialect)
        sniffer_rows = list(reader)
    print(f"\nLoaded {len(sniffer_rows)} rows using CleverCSV.Sniffer:")
    for row in sniffer_rows:
        print(row)
finally:
    # Clean up the dummy file
    if os.path.exists(file_path):
        os.remove(file_path)