Data Contract CLI
The datacontract CLI is an open-source command-line tool (current version 0.11.8) for working with Data Contracts. It natively supports the Open Data Contract Standard (ODCS) to lint data contracts, connect to data sources, execute schema and quality tests, detect breaking changes, and export to different formats. Written in Python, it can be used as a standalone CLI tool, in CI/CD pipelines, or directly as a Python library. The project is actively maintained with frequent releases.
Warnings
- breaking The project migrated from Go to Python, which is a breaking change for users of the Go CLI. The Go version has been forked and is no longer actively developed by the main project; anyone who used it programmatically needs to switch to the Python library.
- breaking The internal data model transitioned from 'Data Contract Specification' to 'Open Data Contract Standard (ODCS) v3.1.0' as the default. This is a major change, and not all features of the old specification are supported in ODCS. The Data Contract Specification is now deprecated.
- gotcha Credentials for connecting to data sources (e.g., S3, BigQuery, PostgreSQL) are typically provided via environment variables and should not be hardcoded in your `datacontract.yaml` or version control. Each server type has specific environment variable naming conventions (e.g., `DATACONTRACT_S3_ACCESS_KEY_ID`).
- gotcha The library requires Python versions >=3.10 and <3.13. Using unsupported Python versions may lead to installation failures or runtime issues.
- gotcha Some dependencies, such as `DuckDB`, are restricted to specific version ranges. For example, version 0.11.3 fixed a dependency issue by restricting `DuckDB` to `<1.4.0`. Installing a version outside the supported range can cause failures.
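The Python version and credential gotchas above can be checked before any contract test runs. The following is a minimal sketch, not part of the datacontract library: the helper names `python_version_supported` and `missing_env_vars` are hypothetical, the version bounds are taken from the warning above, and only the two S3 variable names that appear in this document are checked. Other server types use their own variable names.

```python
import os
import sys

# Supported interpreter range as stated in the warnings above: >=3.10 and <3.13.
MIN_PYTHON = (3, 10)
MAX_PYTHON_EXCLUSIVE = (3, 13)

# S3 credential variables named in this document (assumption: your server
# type may require different variables).
S3_CREDENTIAL_VARS = (
    "DATACONTRACT_S3_ACCESS_KEY_ID",
    "DATACONTRACT_S3_SECRET_ACCESS_KEY",
)


def python_version_supported(version_info=None) -> bool:
    """Return True if the (major, minor) version lies in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def missing_env_vars(names, environ=None) -> list:
    """Return the names that are unset or empty in the given environment mapping."""
    if environ is None:
        environ = os.environ
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    if not python_version_supported():
        print(f"Unsupported Python {sys.version_info.major}.{sys.version_info.minor}")
    for name in missing_env_vars(S3_CREDENTIAL_VARS):
        print(f"Missing environment variable: {name}")
```

Reading credentials from the environment at run time keeps them out of `datacontract.yaml` and out of version control, which is the point of the credentials warning.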
Install
- pip install datacontract-cli
- pip install 'datacontract-cli[all]'
Imports
- DataContract
from datacontract.data_contract import DataContract
Quickstart
import os
from datacontract.data_contract import DataContract
# Simulate a datacontract.yaml file content.
# Note: this example uses the Data Contract Specification format, which the
# warnings above describe as deprecated in favor of ODCS.
datacontract_yaml_content = '''
dataContractSpecification: 1.2.0
id: urn:datacontract:example:test-contract
info:
  title: Example Test Contract
  version: 1.0.0
  owner: Data Team
servers:
  local_file:
    type: local
    path: ./data/{model}.csv
    format: csv
    delimiter: ','
models:
  my_data:
    description: A simple dataset for testing.
    fields:
      id:
        type: string
        primaryKey: true
      name:
        type: string
      value:
        type: integer
    quality:
      - type: sql
        description: 'All values should be positive.'
        query: |
          SELECT *
          FROM my_data
          WHERE value <= 0
'''
# Create a dummy data file where the contract's server path
# (./data/{model}.csv) resolves for the model 'my_data'
os.makedirs('data', exist_ok=True)
with open('data/my_data.csv', 'w') as f:
    f.write('id,name,value\n')
    f.write('1,Alice,10\n')
    f.write('2,Bob,20\n')
# Write the data contract to a temporary file
with open('datacontract.yaml', 'w') as f:
    f.write(datacontract_yaml_content)
# Environment variables for credentials are often required for real data sources.
# For local testing, they might not be strictly needed depending on the 'server' configuration.
# os.environ['DATACONTRACT_S3_ACCESS_KEY_ID'] = os.environ.get('DATACONTRACT_S3_ACCESS_KEY_ID', '')
# os.environ['DATACONTRACT_S3_SECRET_ACCESS_KEY'] = os.environ.get('DATACONTRACT_S3_SECRET_ACCESS_KEY', '')
try:
    data_contract = DataContract(data_contract_file="datacontract.yaml")
    run_results = data_contract.test()
    if run_results.has_passed():
        print("Data contract tests passed successfully.")
    else:
        print("Data contract tests failed.")
        print(run_results.to_json())
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Clean up dummy files
    os.remove('datacontract.yaml')
    os.remove('data/my_data.csv')
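For the CI/CD use mentioned in the introduction, the same calls can be turned into a pass/fail gate. This is a minimal sketch under stated assumptions: it relies only on `DataContract(data_contract_file=...)`, `test()`, and `has_passed()` as used in the Quickstart, and the names `exit_code_for`, `run_contract_tests`, and `main` are hypothetical helpers, not part of the library.

```python
import sys


def exit_code_for(run) -> int:
    """Map a test run to a process exit code: 0 on success, 1 on failure."""
    return 0 if run.has_passed() else 1


def run_contract_tests(contract_path: str) -> int:
    """Run the contract's tests and return an exit code for the CI job."""
    # Imported lazily so this module can be loaded without datacontract-cli installed.
    from datacontract.data_contract import DataContract

    run = DataContract(data_contract_file=contract_path).test()
    return exit_code_for(run)


def main(argv) -> int:
    """Hypothetical entry point: the first argument is the contract path."""
    path = argv[1] if len(argv) > 1 else "datacontract.yaml"
    return run_contract_tests(path)


# In a CI script, the entry point would be invoked as:
#     sys.exit(main(sys.argv))
```

Returning a non-zero exit code is what lets most CI systems mark the job as failed when the contract tests do not pass. Separating `exit_code_for` from the library call keeps the mapping testable without a data source.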