LMDeploy

0.12.3 · verified Fri May 01

LMDeploy is a toolkit for compressing, deploying, and serving large language models (LLMs). It supports efficient inference with quantization, continuous batching, and multiple backends (e.g., TurboMind, PyTorch). The current version is 0.12.3, with frequent releases tracking new model support and upstream dependencies.

pip install lmdeploy
error ModuleNotFoundError: No module named 'lmdeploy.turbomind'
cause In recent versions, `turbomind` is not a separately importable module; its classes have been moved to the top-level `lmdeploy` namespace.
fix
Use `from lmdeploy import TurbomindEngineConfig` instead.
error ImportError: cannot import name 'pipeline' from 'lmdeploy.serve'
cause The `pipeline` function is not in `lmdeploy.serve`; it is in the top-level `lmdeploy` module.
fix
Use `from lmdeploy import pipeline`.
error ValueError: Unsupported model format 'xxxx'
cause The `model_format` argument of `TurbomindEngineConfig` expects one of the supported formats (e.g., 'hf', 'awq', 'w4a16', 'w8a8'); an unrecognized string raises this error.
fix
Check the model format and use a valid one. For Hugging Face models, use `model_format='hf'`.
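As a quick pre-flight guard, the accepted strings listed above can be validated before building the engine config. Note that `SUPPORTED_FORMATS` and `check_model_format` below are hypothetical helpers for illustration, not part of the LMDeploy API:

```python
# Hypothetical pre-flight check (not an LMDeploy API): validate model_format
# against the formats listed above before constructing TurbomindEngineConfig.
SUPPORTED_FORMATS = {'hf', 'awq', 'w4a16', 'w8a8'}

def check_model_format(fmt: str) -> str:
    """Return fmt unchanged, or raise the same ValueError the engine would."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported model format '{fmt}'")
    return fmt
```

Failing fast here surfaces a typo at config time rather than deep inside engine initialization.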
breaking The `TurbomindEngineConfig` import path changed. In versions before 0.12.0, it was `from lmdeploy.turbomind import TurbomindEngineConfig`. Now it is `from lmdeploy import TurbomindEngineConfig`.
fix Update imports to `from lmdeploy import TurbomindEngineConfig`.
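For code that must run on both sides of the 0.12.0 boundary, a guarded import like the sketch below can absorb the path change (the old path is the one described above; the final `None` fallback is only there so the sketch stays importable when lmdeploy itself is absent):

```python
# Try the new top-level import first (>= 0.12.0 per this doc), then the old
# lmdeploy.turbomind path; fall back to None if lmdeploy is not installed.
try:
    from lmdeploy import TurbomindEngineConfig
except ImportError:
    try:
        from lmdeploy.turbomind import TurbomindEngineConfig
    except ImportError:
        TurbomindEngineConfig = None  # lmdeploy unavailable in this environment
```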
deprecated The `turbomind` backend is deprecated; use `TurbomindEngineConfig` with model_format='hf' or 'awq' instead of direct Turbomind engine creation.
fix Switch to using the pipeline with `TurbomindEngineConfig`.
gotcha When using `pipeline`, the model must be in Hugging Face format (HF) or quantized with LMDeploy's format. Passing a model name without the correct format may cause silent fallback or errors.
fix Explicitly set `model_format` in `TurbomindEngineConfig` (e.g., `model_format='hf'`) or pass the `--model-format` argument when using the CLI.

Initialize a pipeline with a Hugging Face model and engine config, then generate a response.

from lmdeploy import pipeline, TurbomindEngineConfig

# HF-format weights, tensor parallelism of 1 (single GPU).
engine_config = TurbomindEngineConfig(model_format='hf', tp=1)

# The pipeline takes the engine settings via `backend_config`.
pipe = pipeline('internlm/internlm2_5-1_8b', backend_config=engine_config)
response = pipe('Hello, how are you?')
print(response.text)