Schema Annotations for Linked Avro Data (SALAD)
Schema Salad (SALAD) is a schema language for describing structured linked data documents in JSON or YAML. It provides rules for preprocessing, structural validation, and hyperlink checking, and supports rich data modeling features such as inheritance, template specialization, object identifiers, and references. It also supports documentation generation, code generation, and transformation to RDF, bridging document-oriented data modeling with the Semantic Web. The current version is 8.9.20260327095315. Releases are frequent, often several patch releases per month, which indicates active development.
Warnings
- breaking: Schema Salad requires Python 3.10 or newer. Installing or running on older Python versions will result in compatibility errors.
- gotcha: The default recursive validation (e.g., via `schema-salad-tool` or direct API calls) preserves line numbers and provides human-readable errors but can be significantly slower for large and deeply nested documents. For performance-critical applications, consider using code generation features, although this may currently result in less descriptive error messages and potential loss of some metadata.
- gotcha: Understanding the difference between `$import` and `$mixin` directives in SALAD schemas is crucial. `$import` loads an external document without inheriting the context of the importing document, using the imported document's URI as its base. `$mixin` loads a document that *does* inherit the context of the importing document. Misapplying these can lead to incorrect URI resolution, type lookup, and validation behavior.
- gotcha: When evolving SALAD schemas, changes to existing fields (e.g., renaming, removing, changing data types, or modifying nullability) can introduce breaking changes for consumers of your documents. Treat your schemas as APIs. It's generally safer to add new fields (additive-only pattern) and deprecate old ones over time, rather than modifying or removing existing fields directly.
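The additive-only guidance in the schema evolution warning can be sketched as a small compatibility check. This is a minimal sketch using only the Python standard library; it does not call the schema_salad API. The helper name `find_breaking_changes` and the simplified field representation (a mapping from field name to type) are assumptions made for illustration. It also assumes that a newly added field is safe only when its type is a union that includes "null", so that existing documents may omit it.

```python
def find_breaking_changes(old_fields: dict, new_fields: dict) -> list[str]:
    """Compare two {field name: type} mappings and list breaking changes.

    Hypothetical helper for illustration only. Types are SALAD-style values
    such as "string" or ["null", "string"] (a union that allows null).
    """
    problems = []
    for name, old_type in old_fields.items():
        if name not in new_fields:
            problems.append(f"removed field: {name}")
        elif new_fields[name] != old_type:
            problems.append(f"changed type of field: {name}")
    for name, new_type in new_fields.items():
        # Assumption: a new field is safe only if existing documents may omit it.
        is_optional = isinstance(new_type, list) and "null" in new_type
        if name not in old_fields and not is_optional:
            problems.append(f"added required field: {name}")
    return problems


old = {"id": "string", "message": "string"}
additive = {"id": "string", "message": "string", "note": ["null", "string"]}
breaking = {"id": "string", "message": "int"}

print(find_breaking_changes(old, additive))  # no problems expected
print(find_breaking_changes(old, breaking))  # reports the retyped field
```

A check like this could run in continuous integration against the previously published field list, so that a rename or type change is caught before consumers see it.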
Install
- pip install schema_salad
- pip install schema_salad[pycodegen]
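After installing, it can help to confirm that the interpreter meets the Python 3.10 requirement and that the package is visible to it. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the distribution name "schema-salad" is assumed to be the name under which pip registered the package.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "schema-salad") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Schema Salad requires Python 3.10 or newer.
python_ok = sys.version_info >= (3, 10)
salad_version = installed_version()

print(f"Python 3.10+: {python_ok}")
print(f"schema-salad installed: {salad_version or 'no'}")
```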
Imports
- schema_salad
import schema_salad
- load_and_validate
from schema_salad.schema import load_and_validate
- Fetcher
from schema_salad.fetcher import Fetcher
Quickstart
import json
import os
from pathlib import Path

from schema_salad.exceptions import SchemaSaladException
from schema_salad.schema import load_and_validate, load_schema

# Define a simple SALAD schema (YAML string).
# SALAD schemas use $base and $graph, not JSON Schema's $schema and $id.
schema_content = """
$base: "https://example.com/myschema#"
$graph:
- name: MyRecord
  type: record
  documentRoot: true
  fields:
    - name: id
      type: string
      jsonldPredicate: "@id"
    - name: message
      type: string
    - name: count
      type: int
"""

# Define a document to validate (YAML string)
document_content = """
id: my_first_doc
message: "Hello, SALAD!"
count: 42
"""

# Save schema and document to temporary files
schema_file = "temp_schema.yml"
document_file = "temp_document.yml"
with open(schema_file, "w") as f:
    f.write(schema_content)
with open(document_file, "w") as f:
    f.write(document_content)

try:
    # Load the schema; this also checks it against the SALAD metaschema.
    # load_schema returns (document_loader, avsc_names, schema_metadata, metaschema_loader).
    print(f"Validating schema: {schema_file}")
    document_loader, avsc_names, _, _ = load_schema(Path(schema_file).resolve().as_uri())
    if isinstance(avsc_names, Exception):
        # Schema parse errors are returned rather than raised.
        raise avsc_names
    print("Schema is valid.")

    # Load and validate the document against the schema
    print(f"Validating document: {document_file}")
    validated_doc, _ = load_and_validate(
        document_loader, avsc_names, Path(document_file).resolve().as_uri(), strict=True
    )
    print("Document is valid.")
    print("Validated document (as Python object):")
    print(json.dumps(validated_doc, indent=2))
except SchemaSaladException as e:
    print(f"Validation failed: {e}")
finally:
    # Clean up temporary files
    os.remove(schema_file)
    os.remove(document_file)