{"id":23988,"library":"lmdeploy","title":"LMDeploy","description":"LMDeploy is a toolkit for compressing, deploying, and serving large language models (LLMs). It supports efficient inference through quantization and continuous batching, and ships two inference backends (TurboMind and PyTorch). The current version is 0.12.3; releases are frequent and track new model support and dependency updates.","status":"active","version":"0.12.3","language":"python","source_language":"en","source_url":"https://github.com/InternLM/lmdeploy","tags":["llm","inference","deployment","quantization","serving"],"install":[{"cmd":"pip install lmdeploy","lang":"bash","label":"Install from PyPI"}],"dependencies":[],"imports":[{"note":"The top-level pipeline function is the correct entry point; the 'serve' submodule is for server components.","wrong":"from lmdeploy.serve import pipeline","symbol":"pipeline","correct":"from lmdeploy import pipeline"},{"note":"TurbomindEngineConfig is exported from the main lmdeploy namespace in recent versions.","wrong":"from lmdeploy.turbomind import TurbomindEngineConfig","symbol":"TurbomindEngineConfig","correct":"from lmdeploy import TurbomindEngineConfig"}],"quickstart":{"code":"from lmdeploy import pipeline, TurbomindEngineConfig\n\n# Load Hugging Face weights (model_format='hf') with no tensor parallelism (tp=1).\nengine_config = TurbomindEngineConfig(model_format='hf', tp=1)\n# The engine config is passed through the `backend_config` keyword.\npipe = pipeline('internlm/internlm2_5-1_8b', backend_config=engine_config)\n\n# A single string prompt returns one response object; its text is in `.text`.\nresponse = pipe('Hello, how are you?')\nprint(response.text)","lang":"python","description":"Initialize a pipeline with a Hugging Face model and a TurboMind engine config, then generate a response."},"warnings":[{"fix":"Update imports to `from lmdeploy import TurbomindEngineConfig`.","message":"The `TurbomindEngineConfig` import path changed. In versions before 0.12.0, it was `from lmdeploy.turbomind import TurbomindEngineConfig`. Now it is `from lmdeploy import TurbomindEngineConfig`.","severity":"breaking","affected_versions":">=0.12.0 (change), <0.12.0 (old import)"},{"fix":"Switch to using the pipeline with `TurbomindEngineConfig`.","message":"Creating the TurboMind engine directly is deprecated; build it through `pipeline` with a `TurbomindEngineConfig` (for example `model_format='hf'` or `model_format='awq'`) instead.","severity":"deprecated","affected_versions":">=0.10.0"},{"fix":"Explicitly set `model_format` in `TurbomindEngineConfig` (e.g., `model_format='hf'`) or use the `--model-format` argument when using the CLI.","message":"When using `pipeline`, the model must be in Hugging Face (HF) format or in a quantized format that LMDeploy supports. Passing a model whose format does not match `model_format` may cause a silent fallback or an error.","severity":"gotcha","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-05-01T00:00:00.000Z","next_check":"2026-07-30T00:00:00.000Z","problems":[{"fix":"Use `from lmdeploy import TurbomindEngineConfig` instead.","cause":"In recent versions, `turbomind` is not a separately importable module; its classes have moved to the `lmdeploy` namespace.","error":"ModuleNotFoundError: No module named 'lmdeploy.turbomind'"},{"fix":"Use `from lmdeploy import pipeline`.","cause":"The `pipeline` function is not in `lmdeploy.serve`; it is in the top-level `lmdeploy` module.","error":"ImportError: cannot import name 'pipeline' from 'lmdeploy.serve'"},{"fix":"Check the model format and use a valid one. For Hugging Face models, use `model_format='hf'`.","cause":"The `model_format` argument of `TurbomindEngineConfig` accepts only the supported format names (e.g., 'hf' or 'awq'). Any other string raises this error.","error":"ValueError: Unsupported model format 'xxxx'"}],"ecosystem":"pypi","meta_description":null,"install_score":null,"install_tag":null,"quickstart_score":null,"quickstart_tag":null}