CSV Diff

1.2 · active · verified Sun Apr 12

csv-diff is a Python CLI tool and library for comparing the semantic contents of two CSV, TSV, or JSON files. It matches rows on a specified key column and reports which rows were added, removed, or changed, ignoring cosmetic differences such as row and column ordering. The project is actively maintained; the current version is 1.2.
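To make the idea of key-based comparison concrete, here is a simplified, stdlib-only sketch. It is not csv-diff's implementation. `load_keyed` and `keyed_diff` are hypothetical names, the parser assumes comma-delimited input with a unique key column, and the result dict only mirrors the key names (`added`, `removed`, `changed`, `columns_added`, `columns_removed`) that the quickstart prints.

```python
import csv
import io


def load_keyed(text, key):
    """Hypothetical helper: parse CSV text into {key_value: row_dict}.

    Indexing by key discards the original row order.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row for row in reader}


def keyed_diff(previous, current):
    """Hypothetical helper: compare two {key: row} mappings."""
    # Column sets come from the first row; CSV rows all share one header
    prev_cols = set(next(iter(previous.values()), {}))
    curr_cols = set(next(iter(current.values()), {}))
    common_cols = prev_cols & curr_cols

    changed = []
    for k in sorted(previous.keys() & current.keys()):
        # Compare shared columns by name, so column order is irrelevant
        changes = {
            col: [previous[k][col], current[k][col]]
            for col in sorted(common_cols)
            if previous[k][col] != current[k][col]
        }
        if changes:
            changed.append({"key": k, "changes": changes})

    return {
        "added": [current[k] for k in sorted(current.keys() - previous.keys())],
        "removed": [previous[k] for k in sorted(previous.keys() - current.keys())],
        "changed": changed,
        "columns_added": sorted(curr_cols - prev_cols),
        "columns_removed": sorted(prev_cols - curr_cols),
    }


old = load_keyed("id,name,age\n1,Alice,30\n2,Bob,24\n", key="id")
new = load_keyed("age,id,name\n31,1,Alice\n24,2,Bob\n", key="id")
print(keyed_diff(old, new)["changed"])
# [{'key': '1', 'changes': {'age': ['30', '31']}}]
```

Although `new` lists its columns in a different order, only the genuine value change for key `1` is reported.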

Install

pip install csv-diff

Imports

from csv_diff import load_csv, compare

Quickstart

This quickstart shows how to use `csv-diff` programmatically to compare two in-memory CSV datasets. It loads each dataset with `load_csv`, using 'id' as the unique key, and then calls `compare` to produce a dictionary listing the added, removed, and changed rows, plus any added or removed columns.

import io
from csv_diff import load_csv, compare

# Simulate two CSV files as in-memory strings
csv1_data = """id,name,age
1,Alice,30
2,Bob,24
3,Charlie,35"""

csv2_data = """id,name,age
1,Alice,31
3,Charlie,35
4,David,28"""

# Load the CSV data, specifying the key column
csv1 = load_csv(io.StringIO(csv1_data), key="id")
csv2 = load_csv(io.StringIO(csv2_data), key="id")

# Compare the two CSVs
diff = compare(csv1, csv2)

# Print the detected differences
print(f"Added rows: {diff.get('added')}")
print(f"Removed rows: {diff.get('removed')}")
print(f"Changed rows: {diff.get('changed')}")
print(f"Columns added: {diff.get('columns_added')}")
print(f"Columns removed: {diff.get('columns_removed')}")
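Each entry in `changed` pairs a row's key with a mapping from column name to its `[old, new]` values, while `added` and `removed` hold whole row dicts. The sketch below turns such a result into readable lines. `summarize` is a hypothetical helper, not part of csv-diff, and `sample` is a hand-written stand-in that assumes this result shape for the quickstart data (all values are strings because they come from CSV).

```python
def summarize(diff, key="id"):
    """Hypothetical helper: render a csv-diff style result as text lines."""
    lines = []
    for col in diff["columns_added"]:
        lines.append(f"column added: {col}")
    for col in diff["columns_removed"]:
        lines.append(f"column removed: {col}")
    for row in diff["added"]:
        lines.append(f"row added: {key}={row[key]}")
    for row in diff["removed"]:
        lines.append(f"row removed: {key}={row[key]}")
    for change in diff["changed"]:
        # Each column maps to an [old, new] pair
        for col, (old, new) in change["changes"].items():
            lines.append(f"row changed: {key}={change['key']} {col}: {old!r} -> {new!r}")
    return lines


# Assumed result for the quickstart data above (hand-written, not captured output)
sample = {
    "added": [{"id": "4", "name": "David", "age": "28"}],
    "removed": [{"id": "2", "name": "Bob", "age": "24"}],
    "changed": [{"key": "1", "changes": {"age": ["30", "31"]}}],
    "columns_added": [],
    "columns_removed": [],
}

for line in summarize(sample):
    print(line)
# row added: id=4
# row removed: id=2
# row changed: id=1 age: '30' -> '31'
```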
