{
  "id": 9415,
  "library": "xinference-client",
  "title": "Xinference Client",
  "description": "Xinference-client is the official Python client library for interacting with a Xinference server. It allows users to manage and deploy various AI models (LLMs, embedding models, speech models, etc.) and perform inference. The library is actively maintained with frequent minor releases, often bi-weekly or monthly, reflecting updates in the broader Xinference ecosystem.",
  "status": "active",
  "version": "2.5.0",
  "language": "en",
  "source_language": "en",
  "source_url": "https://github.com/xorbitsai/inference-client",
  "tags": ["inference", "llm", "embedding", "client", "ai", "xorbits"],
  "install": [
    {
      "cmd": "pip install xinference-client",
      "lang": "bash",
      "label": "Install latest version"
    }
  ],
  "dependencies": [
    {
      "reason": "Numerical operations, often used in model outputs (e.g., embeddings).",
      "package": "numpy"
    },
    {
      "reason": "HTTP client for communication with the Xinference server.",
      "package": "requests"
    },
    {
      "reason": "Used for version comparisons and managing package metadata.",
      "package": "packaging"
    },
    {
      "reason": "Efficient binary serialization format used for data transfer.",
      "package": "msgpack"
    }
  ],
  "imports": [
    {
      "note": "The `xinference-client` PyPI package exposes its main Client class under the `xinference.client` module, not directly under `xinference_client`.",
      "wrong": "from xinference_client import Client",
      "symbol": "Client",
      "correct": "from xinference.client import Client"
    }
  ],
  "quickstart": {
    "code": "import os\nfrom xinference.client import Client\n\n# Ensure a Xinference server is running, e.g., via the 'xinference-local' command.\n# Ensure an LLM (e.g., 'llama-2-chat') is launched on it.\nXINFERENCE_SERVER_URL = os.environ.get(\"XINFERENCE_SERVER_URL\", \"http://127.0.0.1:9997\")\n# Use a model name known to be available on your Xinference server\nLLM_MODEL_NAME = os.environ.get(\"XINFERENCE_LLM_MODEL\", \"llama-2-chat\")\n\ntry:\n    client = Client(base_url=XINFERENCE_SERVER_URL)\n\n    # Verify connection and model availability\n    available_models = client.list_models()\n    if LLM_MODEL_NAME not in available_models:\n        raise ValueError(\n            f\"Model '{LLM_MODEL_NAME}' not found on the Xinference server. \"\n            \"Please launch it first using `client.launch_model()` or select an available model from: \"\n            f\"{list(available_models.keys())}\"\n        )\n\n    print(f\"Successfully connected to Xinference server at {XINFERENCE_SERVER_URL}\")\n    print(f\"Using model: {LLM_MODEL_NAME}\")\n\n    # Perform a chat completion using the v2 OpenAI-compatible API\n    messages = [{\"role\": \"user\", \"content\": \"Hello, what is Xinference and what can it do?\"}]\n    response = client.chat.completions.create(\n        model=LLM_MODEL_NAME,\n        messages=messages,\n        max_tokens=100\n    )\n    print(\"\\nChat Completion Response:\")\n    print(response.choices[0].message.content)\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(\"Please ensure your Xinference server is running, accessible, and the specified LLM is launched.\")\n",
    "lang": "python",
    "description": "This quickstart demonstrates how to connect to a running Xinference server, verify model availability, and perform a chat completion using the `Client` object. It assumes a Xinference server is running and a Large Language Model (LLM) is already launched on it."
  },
  "warnings": [
    {
      "fix": "If you encounter `AttributeError: 'Client' object has no attribute 'chat'`, try `client.v1.chat.completions.create`. If using `client.v1` fails, ensure your Xinference server is updated to a compatible version (>=1.0.0 for the v2 API) and that your client library is also up-to-date.",
      "message": "The chat completions API within `xinference-client` underwent a significant change around v2.0.0 (or Xinference server v1.0.0). Newer versions align with OpenAI's API (`client.chat.completions.create`), while older versions, or code written for backward compatibility, may require `client.v1.chat.completions.create`.",
      "severity": "breaking",
      "affected_versions": ">=2.0.0"
    },
    {
      "fix": "Use `client.launch_model(model_name=..., model_type=...)` to load a model onto the server. Verify launched models with `client.list_models()` before attempting inference.",
      "message": "Xinference models must be explicitly launched on the Xinference server before they can be used by the client. Simply having the server running is not enough; models need to be loaded into memory.",
      "severity": "gotcha",
      "affected_versions": "All"
    },
    {
      "fix": "Ensure your Xinference server is running. By default, it runs on `http://127.0.0.1:9997`. Double-check the `base_url` parameter, especially if running on a different host or port.",
      "message": "The Xinference client connects to a running Xinference server via HTTP. If the server is not running or the `base_url` provided to the `Client` constructor is incorrect, all client operations will fail with connection errors.",
      "severity": "gotcha",
      "affected_versions": "All"
    }
  ],
  "env_vars": null,
  "last_verified": "2026-04-16T00:00:00.000Z",
  "next_check": "2026-07-15T00:00:00.000Z",
  "problems": [
    {
      "fix": "Change your import from `from xinference_client import Client` to `from xinference.client import Client`.",
      "cause": "Incorrect import statement. The `xinference-client` PyPI package provides its modules under the `xinference` namespace.",
      "error": "ModuleNotFoundError: No module named 'xinference_client'"
    },
    {
      "fix": "Start your Xinference server (e.g., by running `xinference-local` in your terminal). Verify that the `base_url` provided to `xinference.client.Client` matches the server's address and port.",
      "cause": "The Xinference server is not running or is not accessible at the specified URL and port.",
      "error": "requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9997): Max retries exceeded with url: /v1/models (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at ...>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))"
    },
    {
      "fix": "Use `client.list_models()` to see available models. If the desired model is not listed, launch it using `client.launch_model(model_name='your-model', model_type='LLM')` (or 'embedding', 'rerank', etc.) before attempting to use it for inference.",
      "cause": "The specified model has not been launched or is not available on the Xinference server.",
      "error": "ValueError: Model 'llama-2-chat' not found on the Xinference server."
    },
    {
      "fix": "If your Xinference server is older (pre-v1.0.0), you might need to use the `client.v1` interface if available, or update both your Xinference server and the `xinference-client` library to the latest versions for the most up-to-date API.",
      "cause": "This error typically occurs when using an older version of `xinference-client` that predates the OpenAI-compatible v2 API, or when interacting with an older Xinference server. The `client.chat` attribute was introduced for the v2 API.",
      "error": "AttributeError: 'Client' object has no attribute 'chat'"
    }
  ]
}