Baseten Performance Client
The `baseten-performance-client` is a Python library designed for ultra-high-performance interaction with Baseten's inference endpoints, particularly embedding models. It provides a simple client interface for sending prediction requests. As of version `0.1.5`, it focuses primarily on optimizing HTTP requests to Baseten services. Its release cadence is tied to Baseten's internal development cycles, with updates typically driven by specific performance or feature needs.
Common errors
- `baseten_performance_client.errors.BasetenAuthenticationError: Invalid API key`
  - Cause: The Baseten API key provided is missing, incorrect, or expired.
  - Fix: Verify your `BASETEN_API_KEY` environment variable or the `api_key` argument. Generate a new API key from your Baseten account settings if needed.
- `baseten_performance_client.errors.BasetenModelNotFoundError: Model with ID '...' not found`
  - Cause: The `model_id` passed to `predict` does not correspond to an existing, deployed, and accessible model on your Baseten account.
  - Fix: Double-check the `model_id` for typos. Ensure the model is deployed and that you have permission to access it on the Baseten platform.
- `baseten_performance_client.errors.BasetenAPIError: 400 Bad Request: Input validation failed`
  - Cause: The `input` dictionary passed to `predict` does not match the schema expected by the Baseten model.
  - Fix: Review the error message for details on the validation failure, and compare your `input` structure (keys, data types, nested objects/arrays) against the model's documented API schema.
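Transient server-side failures (surfaced as `BasetenAPIError` with 5xx statuses) can often be retried. A minimal sketch of a retry-with-backoff wrapper; this helper is plain Python and not part of the client's API, and the callable/exception types are generic so you can adapt them:

```python
import time


def retry_with_backoff(fn, retryable=(Exception,), attempts=3, base_delay=0.5):
    """Call fn(), retrying on the given exception types with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of retries: re-raise the last error
            time.sleep(base_delay * (2 ** attempt))
```

Usage might look like `retry_with_backoff(lambda: client.predict(model_id=model_id, input=payload), retryable=(BasetenAPIError,))`. Authentication and model-not-found errors should not be retried, since they will fail identically every time.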
Warnings
- gotcha The client requires a valid Baseten API key for most models. Without it, requests will fail with authentication errors.
- gotcha The `input` dictionary passed to the `predict` method must strictly adhere to the target Baseten model's expected input schema. Mismatched or malformed inputs will result in HTTP 400 Bad Request errors.
- gotcha Despite the name 'performance client', the current implementation uses standard HTTP/1.1 and bridges synchronous calls through `asyncio.run()` internally. For very small requests, users expecting the ultra-low latency of gRPC or raw sockets may see higher per-request overhead from HTTP framing and Python's async-to-sync bridging.
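Because malformed payloads only surface as server-side 400 errors, it can help to check the payload shape client-side before sending. A minimal sketch; the required keys and types are illustrative and must be taken from your specific model's documented schema, not from the client library:

```python
def validate_input(payload, required):
    """Check that payload is a dict containing each required key with the right type.

    `required` maps key names to expected Python types, e.g. {"text": str}.
    """
    if not isinstance(payload, dict):
        raise TypeError(f"input must be a dict, got {type(payload).__name__}")
    for key, expected_type in required.items():
        if key not in payload:
            raise ValueError(f"missing required key: {key!r}")
        if not isinstance(payload[key], expected_type):
            raise TypeError(
                f"key {key!r} must be {expected_type.__name__}, "
                f"got {type(payload[key]).__name__}"
            )
    return payload
```

Calling `validate_input(payload, {"text": str})` before `predict` turns a round-trip 400 into an immediate local exception.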
Install
```shell
pip install baseten-performance-client
```
Imports
- `PerformanceClient`

```python
from baseten_performance_client import PerformanceClient
```
Quickstart
```python
import os

from baseten_performance_client import PerformanceClient

# Ensure your Baseten API key is set as an environment variable:
# os.environ['BASETEN_API_KEY'] = 'YOUR_BASETEN_API_KEY'
api_key = os.environ.get('BASETEN_API_KEY')
model_id = 'YOUR_MODEL_ID'

if not api_key:
    print("Warning: set the BASETEN_API_KEY environment variable before running.")
    print("Skipping prediction due to missing API key.")
else:
    try:
        client = PerformanceClient(api_key=api_key)
        response = client.predict(
            model_id=model_id,
            input={'text': 'The quick brown fox jumps over the lazy dog.'}
        )
        print("Prediction successful:")
        print(response)
    except Exception as e:
        print(f"An error occurred during prediction: {e}")
```