FastChat

FastChat (fschat) is an open platform for training, serving, and evaluating large language model chatbots. Currently at version 0.2.36, it supports models like Vicuna, Llama, and many others, with features including a web UI, OpenAI-compatible API, and integration with vLLM, SGLang, and MLX. Released under the Apache 2.0 license.

pip install fschat
error ModuleNotFoundError: No module named 'fastchat'
cause Package not installed or installed under different name (fschat, not fastchat).
fix
Install with pip install fschat. The pip package is named fschat, but the import name is fastchat.
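A quick sanity check after installing (the version string will vary with your install):

import fastchat
print(fastchat.__version__)  # prints e.g. '0.2.36' once the fschat package is installed
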
error RuntimeError: The server cannot be started because no controller is running.
cause The controller process is not started or is not reachable.
fix
Start controller first: python -m fastchat.serve.controller
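To confirm the controller is up before launching workers, you can query its model registry (a minimal sketch, assuming the default controller address; /list_models is a POST route on the controller):

import requests

# Ask the running controller which models its workers have registered.
resp = requests.post("http://localhost:21001/list_models")
print(resp.json())  # e.g. {"models": ["vicuna-7b-v1.5"]}
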
error OSError: [Errno 98] Address already in use
cause Ports for controller (21001), model worker (21002), or web server (7860) are already occupied.
fix
Kill existing processes or use different ports via --port arguments.
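For example, to run the whole stack on non-default ports (model path and port numbers here are illustrative):

python -m fastchat.serve.controller --port 21005
python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5 --port 21006 --worker-address http://localhost:21006 --controller-address http://localhost:21005
python -m fastchat.serve.gradio_web_server --port 7861 --controller-url http://localhost:21005
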
error ValueError: The conversation template 'vicuna' is not found.
cause The model template identifier is incorrect or model not registered.
fix
Use a valid template name (e.g., 'vicuna', 'llama-2', 'mistral') or register a custom template via fastchat.conversation.register_conv_template.
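A minimal registration sketch; the template name, roles, and separators below are illustrative, not from a shipped model:

from fastchat.conversation import (
    Conversation,
    SeparatorStyle,
    get_conv_template,
    register_conv_template,
)

# Register a hypothetical template so get_conv_template("my-custom") resolves.
register_conv_template(
    Conversation(
        name="my-custom",
        system_message="You are a helpful assistant.",
        roles=("USER", "ASSISTANT"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n",
    )
)

print(get_conv_template("my-custom").get_prompt())
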
error AssertionError: The model name is not in the list of models.
cause The model worker was started with a model name that doesn't match the request.
fix
Ensure --model-names argument in model worker matches the model name used by the client.
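For example, if the worker advertises an explicit name (the model path is illustrative):

python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5 --model-names vicuna-7b-v1.5

then clients must request model="vicuna-7b-v1.5", character for character.
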
gotcha FastChat's controller uses HTTP requests; all components (controller, model worker, web server) must be started separately. Failure to start the controller first leads to connection errors.
fix Start controller: python -m fastchat.serve.controller, then model worker, then web server.
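The canonical three-terminal startup sequence (substitute your own model path):

python -m fastchat.serve.controller
python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
python -m fastchat.serve.gradio_web_server
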
gotcha The model worker requires significant GPU memory; default settings may OOM on smaller GPUs. Adjust --num-gpus or --load-8bit accordingly.
fix Use --load-8bit for 8-bit quantization or --device cpu for CPU inference (slow).
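For example, to fit a 7B model on a smaller GPU:

python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5 --load-8bit
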
deprecated The old fastchat.serve.gradio_web_server is deprecated in favor of fastchat.serve.gradio_web_server_multi (for multiple models) or the new web UI variants.
fix Use python -m fastchat.serve.gradio_web_server_multi for multiple models.
gotcha When using the OpenAI-compatible API, environment variable OPENAI_API_BASE must be set to the FastChat API server URL (e.g., http://localhost:8000/v1). Otherwise clients will try to reach the real OpenAI API.
fix Set OPENAI_API_BASE=http://localhost:8000/v1 in your environment.
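A minimal client sketch using the pre-1.0 openai Python SDK (the interface FastChat's docs target); it assumes the API server was started with python -m fastchat.serve.openai_api_server --host localhost --port 8000 and serves vicuna-7b-v1.5:

import openai

openai.api_key = "EMPTY"  # FastChat's API server does not validate the key by default
openai.api_base = "http://localhost:8000/v1"  # same effect as OPENAI_API_BASE

completion = openai.ChatCompletion.create(
    model="vicuna-7b-v1.5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
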
breaking In version 0.2.30, the default model worker registration changed; old controller may not recognize new worker without --force-reload.
fix Use --force-reload flag on model worker if registration fails.
pip install "fschat[model_worker,webui]"

Generate a conversation prompt using the Vicuna template.

from fastchat.model import get_conversation_template

conversation = get_conversation_template("vicuna")
conversation.append_message(conversation.roles[0], "Hello!")
conversation.append_message(conversation.roles[1], None)  # assistant placeholder
prompt = conversation.get_prompt()

# Model workers are normally launched as separate processes, e.g.:
#   python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
# rather than by constructing fastchat.serve.model_worker.ModelWorker directly.
print(prompt)
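
With the vicuna template, the printed prompt is the template's system message followed by "USER: Hello! ASSISTANT:"; the exact wording of the system message depends on the FastChat version.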