FastChat

FastChat (fschat) is an open platform for training, serving, and evaluating large language model chatbots. Currently at version 0.2.36, it supports models like Vicuna, Llama, and many others, with features including a web UI, OpenAI-compatible API, and integration with vLLM, SGLang, and MLX. Released under the Apache 2.0 license.

pip install fschat
error ModuleNotFoundError: No module named 'fastchat'
cause Package not installed or installed under different name (fschat, not fastchat).
fix
Install with pip install fschat. The pip package is named fschat, but the import name is fastchat.
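A quick sanity check after installing (the version string will vary with your install):

import fastchat
print(fastchat.__version__)  # prints e.g. '0.2.36' once the fschat package is installed
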
error RuntimeError: The server cannot be started because no controller is running.
cause The controller process is not started or is not reachable.
fix
Start controller first: python -m fastchat.serve.controller
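To confirm the controller is up before launching workers, you can query its model registry (a minimal sketch, assuming the default controller address; /list_models is a POST route on the controller):

import requests

# Ask the running controller which models its workers have registered.
resp = requests.post("http://localhost:21001/list_models")
print(resp.json())  # e.g. {"models": ["vicuna-7b-v1.5"]}
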
error OSError: [Errno 98] Address already in use
cause Ports for controller (21001), model worker (21002), or web server (7860) are already occupied.
fix
Kill existing processes or use different ports via --port arguments.
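For example, to run the whole stack on non-default ports (model path and port numbers here are illustrative):

python -m fastchat.serve.controller --port 21005
python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5 --port 21006 --worker-address http://localhost:21006 --controller-address http://localhost:21005
python -m fastchat.serve.gradio_web_server --port 7861 --controller-url http://localhost:21005
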
error ValueError: The conversation template 'vicuna' is not found.
cause The model template identifier is incorrect or model not registered.
fix
Use a valid template name (e.g., 'vicuna', 'llama-2', 'mistral') or register a custom template via fastchat.conversation.register_conv_template.
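A minimal registration sketch; the template name, roles, and separators below are illustrative, not from a shipped model:

from fastchat.conversation import (
    Conversation,
    SeparatorStyle,
    get_conv_template,
    register_conv_template,
)

# Register a hypothetical template so get_conv_template("my-custom") resolves.
register_conv_template(
    Conversation(
        name="my-custom",
        system_message="You are a helpful assistant.",
        roles=("USER", "ASSISTANT"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n",
    )
)

print(get_conv_template("my-custom").get_prompt())
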
error AssertionError: The model name is not in the list of models.
cause The model worker was started with a model name that doesn't match the request.
fix
Ensure --model-names argument in model worker matches the model name used by the client.
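For example, if the worker advertises an explicit name (the model path is illustrative):

python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5 --model-names vicuna-7b-v1.5

then clients must request model="vicuna-7b-v1.5", character for character.
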
gotcha FastChat's controller uses HTTP requests; all components (controller, model worker, web server) must be started separately. Failure to start the controller first leads to connection errors.
fix Start controller: python -m fastchat.serve.controller, then model worker, then web server.
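The canonical three-terminal startup sequence (substitute your own model path):

python -m fastchat.serve.controller
python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
python -m fastchat.serve.gradio_web_server
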
gotcha The model worker requires significant GPU memory; default settings may OOM on smaller GPUs. Adjust --num-gpus or --load-8bit accordingly.
fix Use --load-8bit for 8-bit quantization or --device cpu for CPU inference (slow).
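For example, to fit a 7B model on a smaller GPU:

python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5 --load-8bit
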
deprecated The old fastchat.serve.gradio_web_server is deprecated in favor of fastchat.serve.gradio_web_server_multi (for multiple models) or the new web UI variants.
fix Use python -m fastchat.serve.gradio_web_server_multi for multiple models.
gotcha When using the OpenAI-compatible API, environment variable OPENAI_API_BASE must be set to the FastChat API server URL (e.g., http://localhost:8000/v1). Otherwise clients will try to reach the real OpenAI API.
fix Set OPENAI_API_BASE=http://localhost:8000/v1 in your environment.
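A minimal client sketch using the pre-1.0 openai Python SDK (the interface FastChat's docs target); it assumes the API server was started with python -m fastchat.serve.openai_api_server --host localhost --port 8000 and serves vicuna-7b-v1.5:

import openai

openai.api_key = "EMPTY"  # FastChat's API server does not validate the key by default
openai.api_base = "http://localhost:8000/v1"  # same effect as OPENAI_API_BASE

completion = openai.ChatCompletion.create(
    model="vicuna-7b-v1.5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
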
breaking In version 0.2.30, the default model worker registration changed; old controller may not recognize new worker without --force-reload.
fix Use --force-reload flag on model worker if registration fails.
pip install "fschat[model_worker,webui]"

Generate a conversation prompt using the Vicuna template.

from fastchat.model import get_conversation_template

conversation = get_conversation_template("vicuna")
conversation.append_message(conversation.roles[0], "Hello!")
conversation.append_message(conversation.roles[1], None)  # assistant placeholder
prompt = conversation.get_prompt()

# Model workers are normally launched as separate processes, e.g.:
#   python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
# rather than by constructing fastchat.serve.model_worker.ModelWorker directly.
print(prompt)
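
With the vicuna template, the printed prompt is the template's system message followed by "USER: Hello! ASSISTANT:"; the exact wording of the system message depends on the FastChat version.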