Pyvespa: Python API for Vespa.ai
Pyvespa provides a Python API to Vespa, the open-source serving engine for storing, computing, and ranking big data at user serving time. It lets users create, modify, deploy, and interact with running Vespa instances, which speeds up prototyping and makes it easier to get familiar with Vespa features. The current version is 1.1.2. Releases are frequent: the main Vespa engine publishes minor versions multiple times a week, and `pyvespa` releases follow to maintain compatibility and add features.
Warnings
- breaking The configuration approach for `services.xml`, `query-profiles`, and `deployment.xml` was significantly revamped in `pyvespa >= 0.50.0`. The old methods may not support all configurations and have been replaced by a Vespa Tag (VT) system that mirrors the XML structure using Python functions.
- gotcha When deploying with `VespaDocker` locally, ensure your Docker daemon is running and has sufficient memory allocated (minimum 6GB is often recommended). Port conflicts or stale Docker containers from previous runs can also prevent deployment.
- gotcha Deployment to Vespa Cloud might occasionally fail with 'Value of X-Content-Hash header does not match computed content hash'. This can be caused by internal issues or an outdated application package name.
- gotcha If feeding or querying a local Docker-based Vespa instance results in errors or no results, check the `vespa.log` for 'diskLimitReached' warnings. This indicates that the Docker container has run out of allocated disk space.
- gotcha Vespa returns 10 hits per query by default, and the `hits` parameter is capped at 400 (`maxHits`) unless that limit is raised, for example in a query profile. Requesting more than the cap without such configuration results in an error.
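One way to stay under the hit limit from the last warning is to page through results with the `hits` and `offset` query parameters. The sketch below only computes the paging parameters and does not talk to Vespa. The helper name `page_requests` is hypothetical, and the defaults of 400 for `max_hits` and 1000 for `max_offset` are assumptions based on Vespa's stock query limits; adjust them if your application overrides these values.

```python
def page_requests(total_wanted, page_size=400, max_hits=400, max_offset=1000):
    """Return a list of {'hits': ..., 'offset': ...} dicts covering total_wanted results.

    Hypothetical helper: max_hits and max_offset are assumed limits, not values
    read from a running Vespa instance.
    """
    if page_size < 1 or page_size > max_hits:
        raise ValueError(f"page_size must be between 1 and {max_hits}")
    pages = []
    offset = 0
    while offset < total_wanted:
        if offset > max_offset:
            # Paging this deep would exceed the assumed offset limit.
            raise ValueError(f"offset {offset} exceeds max_offset {max_offset}")
        hits = min(page_size, total_wanted - offset)
        pages.append({"hits": hits, "offset": offset})
        offset += hits
    return pages


if __name__ == "__main__":
    for page in page_requests(1000):
        print(page)
```

Each dict could then be passed along with the query, for example as extra keyword arguments to `app.query(...)`. Under the assumed limits, this approach cannot reach past the first 1,400 results.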
Install
pip install pyvespa
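Because the configuration approach changed in `pyvespa >= 0.50.0`, it can be useful to check which release is installed before following older examples. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the version parsing is deliberately simple: it ignores suffixes such as `rc1`.

```python
import re
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.50.0' into (0, 50, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def uses_vt_configuration(version):
    """True when the version is at or above 0.50.0, the release named in the warning."""
    return version_tuple(version) >= (0, 50, 0)


if __name__ == "__main__":
    found = installed_version("pyvespa")
    if found is None:
        print("pyvespa is not installed")
    else:
        print(f"pyvespa {found}; VT-style configuration: {uses_vt_configuration(found)}")
```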
Imports
- ApplicationPackage
from vespa.package import ApplicationPackage
- Schema
from vespa.package import Schema
- Document
from vespa.package import Document
- Field
from vespa.package import Field
- VespaDocker
from vespa.deployment import VespaDocker
- Vespa
from vespa.application import Vespa
- VespaCloud
from vespa.deployment import VespaCloud
Quickstart
import time
from vespa.package import ApplicationPackage, Schema, Document, Field, FieldSet
from vespa.deployment import VespaDocker

# 1. Define your application schema
# The application name may only contain lowercase letters and digits.
app_package = ApplicationPackage(
    name='myapp',
    schema=[
        Schema(
            name='my_document',
            document=Document(
                fields=[
                    Field(name='id', type='string', indexing=['attribute', 'summary']),
                    Field(name='title', type='string', indexing=['index', 'summary'], index='enable-bm25'),
                    Field(name='body', type='string', indexing=['index', 'summary'], index='enable-bm25')
                ]
            ),
            # userQuery() searches the "default" fieldset unless told otherwise
            fieldsets=[FieldSet(name='default', fields=['title', 'body'])]
        )
    ]
)

# 2. Deploy to local Docker instance
# Ensure Docker daemon is running and has at least 6GB memory allocated
vespa_docker = VespaDocker(port=8080, container_memory='6G')
try:
    app = vespa_docker.deploy(application_package=app_package)
    print("Vespa application deployed successfully to Docker.")

    # 3. Feed documents; each item needs an "id" and a "fields" dict
    docs_to_feed = [
        {"id": "1", "fields": {"id": "doc:1", "title": "The Quick Brown Fox", "body": "Jumps over the lazy dog."}},
        {"id": "2", "fields": {"id": "doc:2", "title": "Lazy Dog Sits", "body": "The quick brown fox watches."}}
    ]
    app.feed_iterable(docs_to_feed, schema='my_document')
    print("Documents fed.")

    # Wait a bit for indexing to complete
    time.sleep(5)

    # 4. Query data
    query_result = app.query(yql='select * from sources * where userQuery()', query='fox')
    print("Query Results:")
    for hit in query_result.hits:
        print(f" ID: {hit['id']}, Title: {hit['fields']['title']}, Body: {hit['fields']['body']}")
finally:
    # 5. Stop and remove the Vespa Docker container, if one was started
    if vespa_docker.container is not None:
        vespa_docker.container.stop()
        vespa_docker.container.remove()
    print("Vespa Docker instance stopped.")
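Since port conflicts are one of the deployment gotchas listed under Warnings, a quick pre-flight check before running the quickstart may save some debugging. The sketch below uses only the standard library and tries to bind the port locally. The helper name `port_is_free` is hypothetical, and checking 19071 alongside 8080 assumes the default config server port is also published by the container.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port, False otherwise."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
        return True


if __name__ == "__main__":
    # 8080 is the query/feed port used in the quickstart; 19071 is assumed
    # to be the config server port.
    for port in (8080, 19071):
        state = "free" if port_is_free(port) else "in use"
        print(f"port {port}: {state}")
```

If a port is reported as in use, a stale container from a previous run is a likely cause; stopping and removing it usually frees the port.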