LMDeploy
LMDeploy is a toolkit for compressing, deploying, and serving large language models (LLMs). It supports efficient inference with quantization, continuous batching, and multiple backends (e.g., PyTorch, TurboMind). The current version is 0.12.3; releases are frequent, tracking dependency updates and new model support.
Install: pip install lmdeploy

Common errors

error: ModuleNotFoundError: No module named 'lmdeploy.turbomind'
cause: In recent versions, `turbomind` is not a separately importable module; its classes have moved to the top-level `lmdeploy` namespace.
fix: Use `from lmdeploy import TurbomindEngineConfig` instead.

error: ImportError: cannot import name 'pipeline' from 'lmdeploy.serve'
cause: The `pipeline` function is not in `lmdeploy.serve`; it lives in the top-level `lmdeploy` module.
fix: Use `from lmdeploy import pipeline`.

error: ValueError: Unsupported model format 'xxxx'
cause: The `model_format` argument of `TurbomindEngineConfig` expects one of the supported formats (e.g., 'hf', 'awq', 'w4a16', 'w8a8'); any other string raises this error.
fix: Check the model format and pass a valid value. For Hugging Face models, use `model_format='hf'`.

Warnings
breaking: The `TurbomindEngineConfig` import path changed. Before 0.12.0 it was `from lmdeploy.turbomind import TurbomindEngineConfig`; it is now `from lmdeploy import TurbomindEngineConfig`.
fix: Update imports to `from lmdeploy import TurbomindEngineConfig`.

deprecated: Creating the Turbomind engine directly is deprecated; use `TurbomindEngineConfig` with model_format='hf' or 'awq' instead.
fix: Switch to the pipeline with `TurbomindEngineConfig`.

gotcha: When using `pipeline`, the model must be in Hugging Face (HF) format or quantized with LMDeploy's own tooling. Passing a model without the correct format may cause a silent fallback or errors.
fix: Explicitly set `model_format` in `TurbomindEngineConfig` (e.g., `model_format='hf'`), or pass `--model-format` on the CLI.
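The gotcha above can be caught early with a small pre-flight check before building the engine config. A minimal sketch; the `SUPPORTED_FORMATS` set mirrors the formats listed above and is an assumption for illustration, not a constant exported by lmdeploy:

```python
# Hypothetical pre-flight check for model_format; the accepted strings
# mirror the formats mentioned above and are NOT imported from lmdeploy.
SUPPORTED_FORMATS = {'hf', 'awq', 'w4a16', 'w8a8'}

def check_model_format(fmt: str) -> str:
    """Fail fast with a clear message instead of deep inside engine setup."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported model format {fmt!r}; "
            f"expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    return fmt
```

Call it on user-supplied input, e.g. `check_model_format('hf')`, before passing the value to `TurbomindEngineConfig`.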
Imports

- pipeline
  wrong:   from lmdeploy.serve import pipeline
  correct: from lmdeploy import pipeline
- TurbomindEngineConfig
  wrong:   from lmdeploy.turbomind import TurbomindEngineConfig
  correct: from lmdeploy import TurbomindEngineConfig
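Code that must run on both pre- and post-0.12.0 installs can wrap the import in a try/except shim that prefers the current path and falls back to the old one. A minimal sketch:

```python
def import_engine_config():
    """Return TurbomindEngineConfig, whichever import path works."""
    try:
        from lmdeploy import TurbomindEngineConfig  # 0.12.0 and later
    except ImportError:
        from lmdeploy.turbomind import TurbomindEngineConfig  # older releases
    return TurbomindEngineConfig
```

Usage: `TurbomindEngineConfig = import_engine_config()` at module top, then use the class as normal.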
Quickstart
from lmdeploy import pipeline, TurbomindEngineConfig
# Configure the TurboMind engine: HF-format weights, single GPU (tp=1).
backend_config = TurbomindEngineConfig(model_format='hf', tp=1)
pipe = pipeline('internlm/internlm2_5-1_8b', backend_config=backend_config)
response = pipe('Hello, how are you?')
print(response.text)