{"id":6578,"library":"csvkit","title":"csvkit","description":"csvkit is a suite of powerful command-line tools for working with CSV files. It enables users to convert, clean, analyze, and process tabular data efficiently from the command line. The current version is 2.2.0. Its release cadence is moderate, with major versions often representing significant architectural changes.","status":"active","version":"2.2.0","language":"en","source_language":"en","source_url":"https://github.com/wireservice/csvkit","tags":["cli","csv","data processing","tabular data","command-line-tools"],"install":[{"cmd":"pip install csvkit","lang":"bash","label":"Install csvkit"}],"dependencies":[],"imports":[{"note":"csvkit is primarily a command-line utility suite. For programmatic access to tabular data processing in Python, the underlying `agate` library is the commonly imported and used component, which `csvkit` builds upon for its CLI tools. Direct programmatic use of csvkit's internal modules is less common and not part of its public API.","symbol":"agate","correct":"import agate"}],"quickstart":{"code":"import subprocess\nimport os\n\n# Create a dummy CSV file\ncsv_content = \"name,age\\nAlice,30\\nBob,24\\nCharlie,35\"\nfile_path = \"example.csv\"\nwith open(file_path, \"w\") as f:\n    f.write(csv_content)\n\nprint(\"Original CSV content:\")\nprint(csv_content)\n\n# Use csvlook to pretty-print (run as a subprocess)\nprint(\"\\n--- Output from csvlook ---\")\ntry:\n    result = subprocess.run(['csvlook', file_path], capture_output=True, text=True, check=True)\n    print(result.stdout)\nexcept subprocess.CalledProcessError as e:\n    print(f\"Error running csvlook: {e.stderr}\")\n\n# Use csvsql to query data (run as a subprocess)\nprint(\"\\n--- Output from csvsql (names of people over 25) ---\")\ntry:\n    result = subprocess.run(\n        ['csvsql', '--query', 'SELECT name FROM example WHERE age > 25', file_path],\n        capture_output=True, text=True, check=True\n    )\n    
print(result.stdout)\nexcept subprocess.CalledProcessError as e:\n    print(f\"Error running csvsql: {e.stderr}\")\n\n# Clean up the dummy file\nos.remove(file_path)\nprint(f\"\\nCleaned up {file_path}\")","lang":"python","description":"This quickstart demonstrates how to use csvkit's command-line tools (`csvlook` and `csvsql`) programmatically from Python via the `subprocess` module. It creates a sample CSV, pretty-prints it, and then queries it."},"warnings":[{"fix":"Ensure your environment uses Python 3.6+ (csvkit 2.2.0 requires Python 3.6+).","message":"Version 2.0.0 of csvkit dropped support for Python 2, making it a Python 3-only library. Running on Python 2 will result in `ImportError` or other runtime errors.","severity":"breaking","affected_versions":"2.0.0 and later"},{"fix":"Specify the correct encoding using the `-e` or `--encoding` flag, e.g., `csvlook -e latin1 input.csv`.","message":"CSV encoding is a frequent source of issues. `csvkit` defaults to UTF-8, but many CSVs use other encodings (e.g., Latin-1, cp1252). Reading such a file without specifying its encoding typically fails with a decoding error or produces garbled characters.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For multi-gigabyte files, split the input into smaller chunks or use a streaming tool. Dedicated big-data tools may be a better fit for very large datasets.","message":"Some `csvkit` tools (for example `csvsort`, `csvstat`, and `csvsql`) load the entire dataset into memory, so they can consume significant memory on very large CSV files.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Pass the `-H` (`--no-header-row`) flag if your CSV lacks a header, and inspect the output to confirm that the columns are named as expected. For malformed files, pre-process the CSV so that it starts with a single clean header row.","message":"`csvkit` does not detect headers automatically: it treats the first row of every input as the header row. In a file without an explicit header, the first data row is therefore consumed as column names, and malformed or unusual CSVs can likewise end up with the wrong row used as the header.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}