Hugging Face Text Generation Python Client
The `text-generation` library is the official Python client for interacting with the Hugging Face Text Generation Inference (TGI) backend, a highly optimized solution for deploying large language models. It provides synchronous and asynchronous APIs for text generation, including streaming capabilities. The current version is 0.7.0, and while the underlying TGI server has a rapid release cadence with frequent updates, the client library itself is updated less often, focusing on stability and compatibility with common TGI server versions.
Common errors
- `httpx.ConnectError: [Errno 111] Connection refused`
  - Cause: The Python client could not establish a connection to the Text Generation Inference server at the specified `base_url`. This typically means the server is not running, is listening on a different host/port, or a firewall is blocking the connection.
  - Fix: Ensure your TGI server is running and reachable from the machine running the client. Verify the `base_url` (e.g., `http://127.0.0.1:8080`) matches the server's actual address and port.
- `text_generation.InferenceAPIError: {'error': '...', 'error_type': '...'}`
  - Cause: The TGI server received your request but encountered an issue processing it, returning an HTTP error status (e.g., 400, 404, 500). The `error` and `error_type` fields in the message usually provide specifics.
  - Fix: Examine the message inside the `InferenceAPIError`. Common causes include an invalid model ID, insufficient GPU memory, unsupported generation parameters, or failures during model loading. Check the server logs for more detailed diagnostics.
- `TypeError: Client.__init__() missing 1 required positional argument: 'base_url'`
  - Cause: The `Client` class was instantiated without the `base_url` argument, which is mandatory.
  - Fix: Always provide the URL of your Text Generation Inference server when creating a `Client` instance, e.g., `client = Client('http://your-tgi-server:8080')`.
- `AttributeError: module 'text_generation' has no attribute 'Client'`
  - Cause: An outdated `text-generation` installation, a conflicting package name in your Python environment, or an incorrect import path.
  - Fix: Upgrade the library (`pip install --upgrade text-generation`), verify your import statement is `from text_generation import Client`, and check for any local file named `text_generation.py` that might be shadowing the installed package.
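Connection errors are often transient, for example while a TGI server is still loading a model. A small retry wrapper can smooth these over. This is a hypothetical helper, not part of the library; it works with any object exposing a `generate()` method:

```python
import time

# Hypothetical helper (not part of text-generation): retry transient
# failures, e.g. connection refused while the server is still starting.
def generate_with_retry(client, prompt, retries=3, delay=2.0, **kwargs):
    """Call client.generate(), retrying on any exception up to `retries` times."""
    for attempt in range(retries):
        try:
            return client.generate(prompt, **kwargs)
        except Exception:
            # On the final attempt, re-raise so the caller sees the real error
            if attempt == retries - 1:
                raise
            time.sleep(delay)
```

With a `Client` instance, `generate_with_retry(client, "Hello", max_new_tokens=20)` behaves like `client.generate()` but tolerates brief outages.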
Warnings
- breaking: The `text-generation` client is tightly coupled to the `text-generation-inference` (TGI) server. Breaking changes in the TGI server's API (e.g., changes to request/response schemas or new required parameters) may require a client update, even if the client's own version has not changed much.
- gotcha: An incorrect or unreachable `base_url` for the `Client` leads to connection errors. The default `http://127.0.0.1:8080` only applies to a local server running with default settings.
- gotcha: Distinguishing `client.generate()` from `client.generate_stream()` is crucial. `generate()` returns a single `Response` object with the full generated text after completion; `generate_stream()` returns an iterable that yields `StreamResponse` objects as tokens are generated.
- gotcha: Server-side errors are encapsulated in `InferenceAPIError`. These often contain specific details from the TGI server, such as model-loading failures, invalid parameters, or GPU memory exhaustion.
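To turn a streamed generation back into a single string, the chunks can be accumulated manually. The helper below is a hypothetical sketch, not part of the library; it assumes each yielded chunk carries a `.token` with `.text` and `.special` attributes, as `generate_stream()` produces:

```python
# Hypothetical helper: join the text of non-special tokens from a
# generate_stream()-style iterable into one string.
def collect_stream(stream):
    parts = []
    for chunk in stream:
        # Skip special tokens (e.g. end-of-sequence markers)
        if not chunk.token.special:
            parts.append(chunk.token.text)
    return "".join(parts)
```

For example, `full_text = collect_stream(client.generate_stream(prompt, max_new_tokens=50))` yields the same text a single `generate()` call would return.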
Install
- `pip install text-generation`
Imports
- `Client`: `from text_generation import Client`
- `AsyncClient`: `from text_generation import AsyncClient`
- `InferenceAPIError`: `from text_generation.errors import InferenceAPIError` (or `from text_generation import InferenceAPIError`)
Quickstart
```python
import os
from text_generation import Client, InferenceAPIError

# Ensure the TGI server is running and accessible at this URL.
# For example, a local server might be 'http://127.0.0.1:8080'
# or a deployed endpoint 'https://your-tgi-endpoint.huggingface.cloud'.
TGI_ENDPOINT = os.environ.get('TGI_ENDPOINT', 'http://127.0.0.1:8080')

try:
    client = Client(TGI_ENDPOINT)

    # Generate a single response
    response = client.generate(
        "What is the capital of France?",
        max_new_tokens=20,
        repetition_penalty=1.05,
    )
    print(f"Generated Text: {response.generated_text}")

    print("\n--- Streaming Example ---")
    # Stream tokens
    for response in client.generate_stream(
        "Write a short poem about a cat.",
        max_new_tokens=50,
    ):
        if not response.token.special:
            print(response.token.text, end="", flush=True)
    print("\n")
except InferenceAPIError as e:
    print(f"Error from TGI server: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
```