vLLM-Omni

v0.20.0 · verified Sat May 09

vLLM-Omni is a framework for efficient inference with omni-modality models, built on top of vLLM. It supports speech, image, video, audio, and multimodal generation, and stays aligned with upstream vLLM releases. The current version is 0.20.0, under active development with a monthly release cadence.

pip install vllm-omni
error ModuleNotFoundError: No module named 'vllm_omni'
cause The package is published on PyPI as vllm-omni (with a hyphen), but its functionality is imported through vllm; there is no vllm_omni module to import.
fix
Install with pip install vllm-omni and import from vllm (e.g., from vllm import LLM).
error ImportError: cannot import name 'LLM' from 'vllm_omni'
cause vllm_omni does not export LLM; the core classes are exposed through vllm.
fix
Use from vllm import LLM (install vllm if needed).
error ValueError: The model is not supported by vLLM-Omni
cause Model type not yet implemented or requires trust_remote_code=True.
fix
Check model list in docs, or add trust_remote_code=True and ensure model is in supported list.
breaking vLLM-Omni versions must exactly match the upstream vLLM version they are built against. Using mismatched versions (e.g., vllm-omni 0.20.0 with vllm 0.19.0) will cause import errors or silent failures.
fix Install the exact matching vLLM version, typically vllm==<same_version>. E.g., pip install vllm==0.20.0 vllm-omni==0.20.0
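A quick way to guard against mismatched pins is to check them before installing. This is a minimal sketch; pinned_version is a hypothetical helper, not part of either package:

```python
# Minimal sketch: fail fast if the vllm and vllm-omni pins diverge.
# pinned_version is a hypothetical helper, not part of either package.
def pinned_version(requirement: str) -> str:
    """Extract the version from an exact pin like 'vllm==0.20.0'."""
    _, _, version = requirement.partition("==")
    return version

pins = ["vllm==0.20.0", "vllm-omni==0.20.0"]
versions = {pinned_version(p) for p in pins}
assert len(versions) == 1, f"mismatched pins: {pins}"
```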
gotcha Do not import from vllm_omni directly. All core classes (LLM, SamplingParams, etc.) are re-exported from vllm. Importing from vllm_omni will raise ImportError.
fix Use import from vllm (e.g., from vllm import LLM) instead of from vllm_omni.
deprecated The old entrypoint vllm.entrypoints.openai.api_server is deprecated. Use the vllm serve CLI, or vllm.entrypoints.openai.run_batch for batch inference.
fix Use `vllm serve` CLI or the new async engine API.
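For reference, the replacement entrypoints look like the following; the model name, port, and file names are illustrative, not prescribed by vLLM-Omni:

```shell
# Deprecated: python -m vllm.entrypoints.openai.api_server --model ...
# Current: the vllm serve CLI
vllm serve Qwen/Qwen2-VL-7B-Instruct --trust-remote-code --port 8000

# Batch inference via the run_batch entrypoint
python -m vllm.entrypoints.openai.run_batch \
    -i requests.jsonl -o results.jsonl --model Qwen/Qwen2-VL-7B-Instruct
```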

Basic inference with a multimodal model using vLLM's LLM interface. Ensure vllm is installed (pip install vllm).

from vllm import LLM, SamplingParams

# Load a multimodal model (e.g., Qwen2-VL)
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", trust_remote_code=True)

# Build an OpenAI-style chat message with image and text parts
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        {"type": "text", "text": "Describe this image."},
    ],
}]
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

# Use llm.chat for chat-format multimodal messages; llm.generate expects
# plain prompts, not chat-style message dicts.
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)