PyCSVSchema

0.0.6 verified Mon Apr 27 auth: no python

PyCSVSchema is a Python implementation of the CSV Schema specification (version 0.3). It validates CSV files against a schema written in YAML or JSON. The current version is 0.0.6; the project is in early development, with weekly commits.

pip install pycsvschema
error TypeError: string indices must be integers
cause A raw schema string was passed to CSVSchema instead of a parsed dict.
fix
Parse the schema with yaml.safe_load(schema_string) or json.loads(schema_json) first.
error AttributeError: module 'pycsvschema' has no attribute 'CSVValidator'
cause CSVValidator was renamed to CSVSchema in v0.0.6.
fix
Use CSVSchema instead of CSVValidator.
error FileNotFoundError: [Errno 2] No such file or directory: 'data.csv'
cause A file path string was passed to validate(). This usage is deprecated, and relative paths often fail because they are resolved against the current working directory.
fix
Open the file with open('data.csv') and pass the file object to validate().
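One way to sidestep the working-directory problem described above is to resolve the path explicitly before opening it. The following is a minimal sketch using only the standard library; `open_csv` and its `base_dir` parameter are hypothetical names introduced for illustration and are not part of PyCSVSchema.

```python
import csv
import tempfile
from pathlib import Path


def open_csv(path, base_dir=None):
    """Open a CSV file for reading, resolving relative paths against base_dir.

    open_csv and base_dir are hypothetical names used only in this sketch.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    resolved = (base / path).resolve()
    if not resolved.is_file():
        # Report the fully resolved path so a wrong working directory is obvious.
        raise FileNotFoundError(f"CSV file not found: {resolved}")
    # newline="" is what the csv module documentation recommends for CSV files.
    return resolved.open("r", newline="", encoding="utf-8")


# Demonstration in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "data.csv").write_text("id,name\n1,Ada\n", encoding="utf-8")
    with open_csv("data.csv", base_dir=tmp) as f:
        rows = list(csv.reader(f))
```

The file object returned by such a helper could then be passed to validate() in place of a path string. In a script, `Path(__file__).parent` is a common choice for `base_dir`.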
gotcha PyCSVSchema expects the schema in Python dict form (parsed from YAML/JSON), not a raw string. Passing a schema string directly will raise a TypeError.
fix Parse the schema using yaml.safe_load() or json.loads() before passing to CSVSchema.
breaking In v0.0.6, the CSVValidator class was renamed to CSVSchema. Importing CSVValidator from pycsvschema no longer works.
fix Use CSVSchema instead of CSVValidator. Change imports and constructor calls.
deprecated Calling validate() with a file path string is deprecated; always pass a file object obtained from open().
fix Use with open('file.csv') as f: validator.validate(f) instead of validator.validate('file.csv').
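The string-versus-dict gotcha can be reproduced without the library. The sketch below uses the standard-library json module and an illustrative schema. It shows that indexing a raw string by key raises the TypeError quoted above, while the parsed dict can be indexed normally. How CSVSchema reads the dict internally is an assumption here; only the Python behavior is demonstrated.

```python
import json

# Illustrative schema text, mirroring the fields used elsewhere in this entry.
schema_json = '{"fields": [{"name": "id", "type": "integer"}, {"name": "name", "type": "string"}]}'

# A raw string is not a mapping, so looking up a key fails with TypeError.
message = None
try:
    schema_json["fields"]
except TypeError as exc:
    message = str(exc)

# Parsing first yields a dict that supports key lookups.
schema = json.loads(schema_json)
field_names = [field["name"] for field in schema["fields"]]
```

For a YAML schema, yaml.safe_load() plays the same role as json.loads() does here.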

Load a schema from YAML, create a CSVSchema instance, and validate a CSV file.

import yaml
from pycsvschema import CSVSchema

schema_yaml = """
fields:
  - name: id
    type: integer
    constraints:
      required: true
  - name: name
    type: string
"""
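# Parse the YAML text into a dict; CSVSchema expects a dict, not a raw string.
schema = yaml.safe_load(schema_yaml)
validator = CSVSchema(schema)

# Pass an open file object to validate(), not a path string.
with open('data.csv', 'r') as f:
    errors = validator.validate(f)

# Print each reported validation error.
for error in errors:
    print(error)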
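To try the example above, a data.csv file with matching columns is needed. The following sketch writes one with the standard-library csv module. It assumes the CSV has a header row whose column names match the schema's field names (id and name); `write_sample_csv` is a hypothetical helper, and the rows are made-up sample data.

```python
import csv
import tempfile
from pathlib import Path

# Column names taken from the schema in the example above.
FIELDNAMES = ["id", "name"]


def write_sample_csv(path, rows):
    """Write rows (a list of dicts) to path with a header row. Hypothetical helper."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Made-up sample rows: an integer id and a string name per row.
sample_rows = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]

# A temporary directory keeps the sketch from touching the working directory.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "data.csv"
    write_sample_csv(sample_path, sample_rows)
    written = sample_path.read_text(encoding="utf-8")
```

To use the file with the example above, write it to 'data.csv' in the directory the script runs from instead of a temporary directory.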