CSVW Python Library
The `csvw` Python library (version 3.7.0) provides an API for reading and writing relational, tabular data according to the W3C CSV on the Web (CSVW) specification. It can parse CSVW-described data, convert it to JSON, and validate metadata. The project is actively developed, with regular releases.
Warnings
- breaking Several Python libraries have 'csvw' in their name, notably `csvw` (this library) and `csvwlib`. Their APIs and functionality differ, so code written for `csvw` will fail or behave unexpectedly if `csvwlib` is installed and imported instead.
- gotcha The `csvw` library does not implement the *full* CSVW specification. In particular, when reading CSV files with headers, it matches columns by comparing the header text with each column description's `name` or `titles` attribute rather than strictly by position, as a reader of the spec might expect. This is more flexible but deviates from a strict interpretation.
- gotcha Due to reliance on Python's standard `csv` module, certain behaviors related to `escapechar` and `commentPrefix` can be inconsistent or unexpected. For instance, if `commentPrefix` is specified in a `Dialect` instance, rows starting with it will be skipped even if the value was quoted. Also, cell content with `escapechar` may not round-trip as expected when `doubleQuote==False` and minimal quoting is used.
- gotcha The `anyURI` datatype in `csvw.datatypes` normalizes URLs according to RFC 3986 during serialization to a string. This normalization means that round-tripping (serializing and then deserializing) a URI is not guaranteed to yield an identical string if the original URI contained non-normalized forms.
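Two of the gotchas above can be reproduced in miniature with the standard library alone. The sketch below does not use `csvw` and is not its implementation: `rows_without_comments` and `normalize_uri` are hypothetical helpers written for illustration. The first assumes the comment check runs on already-parsed cells, which is one way a quoted value ends up skipped. The second applies only the case normalization step of RFC 3986, which is enough to show why a URI may not round-trip to an identical string.

```python
import csv
import io
from urllib.parse import urlsplit, urlunsplit


def rows_without_comments(text, comment_prefix="#"):
    """Parse CSV text, then drop rows whose first cell starts with the prefix.

    Hypothetical stand-in: because the check runs on parsed cells, a quoted
    value such as "#1" is treated exactly like an unquoted comment line.
    """
    for row in csv.reader(io.StringIO(text)):
        if row and row[0].startswith(comment_prefix):
            continue
        yield row


def normalize_uri(uri):
    """Lower-case the scheme and host (case normalization, RFC 3986 section 6.2.2.1).

    Hypothetical stand-in covering only one normalization step.
    """
    parts = urlsplit(uri)
    # Keep any userinfo untouched; only the host (and port) part is lower-cased.
    userinfo, sep, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + sep + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


# The second row holds a quoted cell that merely begins with "#".
sample = 'id,name\n"#1",quoted value\n2,plain value\n'
rows = list(rows_without_comments(sample))

original = "HTTP://Example.COM/Path"
normalized = normalize_uri(original)

print(rows)
print(original, "->", normalized)
```

The row whose first cell is the quoted `#1` is missing from `rows`, and `normalized` differs from `original` in the case of its scheme and host. If exact URI strings matter downstream, one option is to store such values as plain strings and not as `anyURI`.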
Install
pip install csvw
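Because of the naming overlap with `csvwlib`, it can help to confirm which distribution is actually present in the environment. The following is a minimal sketch using only `importlib.metadata` from the standard library. `installed_version` is a hypothetical helper, and the sketch assumes both names are also the distribution names used by pip.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# "csvw" is the library described here; "csvwlib" is the unrelated package
# named in the warnings. Both are looked up by distribution name (assumed).
for name in ("csvw", "csvwlib"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If `csvw` is reported as not installed while `csvwlib` is present, the import in the next section will fail or resolve to the wrong API.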
Imports
- CSVW
from csvw import CSVW
Quickstart
import json
from csvw import CSVW

# Example using a remote tab-separated data file from the csvw test fixtures.
# Note: In a real application, you might use a local file path instead.
# The URL below must be reachable from your machine.
try:
    data = CSVW('https://raw.githubusercontent.com/cldf/csvw/master/tests/fixtures/test.tsv')
    # Convert the CSVW data to a JSON-serializable structure
    json_output = data.to_json()
    print(json.dumps(json_output, indent=2))
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure the URL is correct and accessible.")