{"id":10289,"library":"text-generation","title":"Hugging Face Text Generation Python Client","description":"The `text-generation` library is the official Python client for interacting with the Hugging Face Text Generation Inference (TGI) backend, a highly optimized solution for deploying large language models. It provides synchronous and asynchronous APIs for text generation, including streaming capabilities. The current version is 0.7.0, and while the underlying TGI server has a rapid release cadence with frequent updates, the client library itself is updated less often, focusing on stability and compatibility with common TGI server versions.","status":"active","version":"0.7.0","language":"en","source_language":"en","source_url":"https://github.com/huggingface/text-generation-inference","tags":["huggingface","llm","text-generation","inference","ai","nlp","client"],"install":[{"cmd":"pip install text-generation","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Used for making synchronous and asynchronous HTTP requests to the TGI server.","package":"httpx"},{"reason":"Used for data validation and parsing of request and response models.","package":"pydantic"}],"imports":[{"symbol":"Client","correct":"from text_generation import Client"},{"symbol":"AsyncClient","correct":"from text_generation import AsyncClient"},{"note":"The error class was directly exposed under the top-level package in recent versions for simpler import.","wrong":"from text_generation.errors import InferenceAPIError","symbol":"InferenceAPIError","correct":"from text_generation import InferenceAPIError"}],"quickstart":{"code":"import os\nfrom text_generation import Client, InferenceAPIError\n\n# Ensure the TGI server is running and accessible at this URL\n# For example, a local server might be 'http://127.0.0.1:8080'\n# or a deployed endpoint 'https://your-tgi-endpoint.huggingface.cloud'\nTGI_ENDPOINT = os.environ.get('TGI_ENDPOINT', 'http://127.0.0.1:8080')\n\ntry:\n    
client = Client(TGI_ENDPOINT)\n    \n    # Generate a single response\n    response = client.generate(\n        \"What is the capital of France?\", \n        max_new_tokens=20,\n        repetition_penalty=1.05\n    )\n    print(f\"Generated Text: {response.generated_text}\")\n\n    print(\"\\n--- Streaming Example ---\")\n    # Stream tokens as they are generated\n    for stream_response in client.generate_stream(\n        \"Write a short poem about a cat.\", \n        max_new_tokens=50\n    ):\n        if not stream_response.token.special:\n            print(stream_response.token.text, end=\"\", flush=True)\n    print(\"\\n\")\n\nexcept InferenceAPIError as e:\n    print(f\"Error from TGI server: {e}\")\nexcept Exception as e:\n    print(f\"An unexpected error occurred: {e}\")","lang":"python","description":"This quickstart demonstrates how to instantiate a `Client` for synchronous text generation. It includes examples of both single-response generation and token-by-token streaming. Remember to set the `TGI_ENDPOINT` environment variable or directly provide the correct URL for your running Text Generation Inference server."},"warnings":[{"fix":"Always ensure your `text-generation` client version is compatible with your deployed `text-generation-inference` server version. Refer to the TGI server's documentation for API stability notes and recommended client versions.","message":"The `text-generation` client is tightly coupled with the `text-generation-inference` (TGI) server. Breaking changes in the TGI server's API (e.g., changes to request/response schemas or new required parameters) may require updating the client library, even if your own client code has not changed.","severity":"breaking","affected_versions":"All versions, depends on TGI server version"},{"fix":"Verify that the `base_url` provided to `Client()` or `AsyncClient()` points to a running and network-accessible TGI endpoint. 
Double-check port numbers, hostnames, and any required authentication.","message":"An incorrect or inaccessible `base_url` for the `Client` will lead to connection errors. The default `http://127.0.0.1:8080` is only for a local server running with default settings.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use `generate()` for single, complete outputs when time-to-first-token latency is not critical, or for batching. Use `generate_stream()` for real-time applications where displaying tokens as they are produced enhances the user experience.","message":"Differentiating between `client.generate()` and `client.generate_stream()` is crucial. `generate()` returns a single `Response` object with the full generated text after completion. `generate_stream()` returns an iterable that yields `StreamResponse` objects as tokens are generated.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always wrap your client calls in a `try...except InferenceAPIError` block. Log the error message to understand the root cause of the server-side failure. Check your request parameters against the model's capabilities and server configuration.","message":"Server-side errors are encapsulated in `InferenceAPIError`. These errors often contain specific details from the TGI server, such as issues with model loading, invalid parameters, or GPU memory exhaustion.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Ensure your TGI server is running and accessible from where you are running the client. Verify the `base_url` (e.g., `http://127.0.0.1:8080`) matches the server's actual address and port.","cause":"The Python client could not establish a connection to the Text Generation Inference server at the specified `base_url`. 
This typically means the server is not running, is running on a different port/host, or a firewall is blocking the connection.","error":"httpx.ConnectError: [Errno 111] Connection refused"},{"fix":"Examine the error message within the `InferenceAPIError`. Common causes include: invalid model ID, insufficient GPU memory, unsupported generation parameters, or issues during model loading. Check server logs for more detailed diagnostics.","cause":"This error indicates that the TGI server received your request but encountered an issue processing it, returning an HTTP error status (e.g., 400, 404, 500). The `error` and `error_type` fields in the message usually provide specifics.","error":"text_generation.InferenceAPIError: {'error': '...', 'error_type': '...'}"},{"fix":"Always provide the URL of your Text Generation Inference server when creating a `Client` instance, e.g., `client = Client('http://your-tgi-server:8080')`.","cause":"You are attempting to instantiate the `Client` class without providing the `base_url` argument, which is mandatory.","error":"TypeError: Client.__init__() missing 1 required positional argument: 'base_url'"},{"fix":"Ensure you have the latest `text-generation` library installed (`pip install --upgrade text-generation`). Verify your import statement is `from text_generation import Client` and check for any local files named `text_generation.py` that might be shadowing the installed library.","cause":"This usually indicates an outdated `text-generation` library installation, a conflicting package name in your Python environment, or an incorrect import path.","error":"AttributeError: module 'text_generation' has no attribute 'Client'"}]}