vLLM-Omni
vLLM-Omni is a framework for efficient inference with omni-modality models, built on top of vLLM. It supports speech, image, video, audio, and mixed multimodal generation, and its releases are aligned with upstream vLLM. The current version is 0.20.0, under active development with a monthly release cadence.
pip install vllm-omni

Common errors
error: ModuleNotFoundError: No module named 'vllm_omni'
cause: The PyPI package is named vllm-omni (with a hyphen), but it does not install an importable vllm_omni module; its classes are exposed through vllm.
fix: Install with pip install vllm-omni and import from vllm (e.g., from vllm import LLM).
error: ImportError: cannot import name 'LLM' from 'vllm_omni'
cause: vllm_omni does not export LLM; core classes are re-exported through vllm.
fix: Use from vllm import LLM (install vllm if needed); see the sketch below.
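Both errors above reduce to the same import pattern; a minimal sketch of the correct form:

# Install by the PyPI name, then import core classes from vllm:
#   pip install vllm-omni
from vllm import LLM, SamplingParams   # correct
# from vllm_omni import LLM            # wrong: raises ImportError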
error: ValueError: The model is not supported by vLLM-Omni
cause: The model type is not yet implemented, or the model ships custom code and requires trust_remote_code=True.
fix: Check the supported-model list in the docs; if the model is listed but ships remote code, pass trust_remote_code=True.
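A minimal sketch of the trust_remote_code fix (the model name here is illustrative):

from vllm import LLM

# trust_remote_code=True may be required for models that ship
# custom modeling code alongside their weights.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", trust_remote_code=True)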
Warnings
breaking: vLLM-Omni versions must exactly match the upstream vLLM version they are built against. Mismatched versions (e.g., vllm-omni 0.20.0 with vllm 0.19.0) cause import errors or silent failures.
fix: Install the exact matching vLLM version, typically vllm==<same_version>, e.g. pip install vllm==0.20.0 vllm-omni==0.20.0. A runtime check is sketched below.
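A minimal sketch of that runtime check, using the standard-library importlib.metadata to read installed package versions by their PyPI names:

from importlib.metadata import version

# Compare the installed vllm and vllm-omni versions; they must match exactly.
vllm_version = version("vllm")
omni_version = version("vllm-omni")
if vllm_version != omni_version:
    raise RuntimeError(
        f"Version mismatch: vllm=={vllm_version}, vllm-omni=={omni_version}; "
        "install matching releases, e.g. pip install vllm==0.20.0 vllm-omni==0.20.0"
    )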
gotcha: Do not import from vllm_omni directly. All core classes (LLM, SamplingParams, etc.) are re-exported from vllm, and importing them from vllm_omni raises ImportError.
fix: Import from vllm (e.g., from vllm import LLM) instead of from vllm_omni.
deprecated: The old entrypoint vllm.entrypoints.openai.api_server is deprecated. Use the vllm serve command line, or vllm.entrypoints.openai.run_batch for batch inference.
fix: Use the `vllm serve` CLI or the new async engine API; a client sketch follows.
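A minimal client sketch, assuming a server started with `vllm serve Qwen/Qwen2-VL-7B-Instruct` on the default port (8000). The server exposes an OpenAI-compatible API, so the stock openai client works against it:

from openai import OpenAI

# Point the OpenAI client at the local vLLM server; the API key is
# unused by default, but the client requires some value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[{"role": "user", "content": "Describe vLLM in one sentence."}],
)
print(resp.choices[0].message.content)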
Imports
- LLM
  wrong: from vllm_omni import LLM
  correct: from vllm import LLM
- SamplingParams
  wrong: from vllm_omni import SamplingParams
  correct: from vllm import SamplingParams
- AsyncLLMEngine
  from vllm.engine.async_llm_engine import AsyncLLMEngine (a usage sketch follows)
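A minimal async sketch, assuming vLLM's AsyncLLMEngine API, where engine.generate is an async generator yielding cumulative RequestOutput objects for a request:

import asyncio
from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

async def main():
    # Build the engine from engine args; the model name is illustrative.
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="Qwen/Qwen2-VL-7B-Instruct", trust_remote_code=True)
    )
    params = SamplingParams(temperature=0.7, max_tokens=64)
    final = None
    # Each yielded item is the cumulative output for this request so far.
    async for output in engine.generate("Describe this framework.", params, request_id="req-0"):
        final = output
    print(final.outputs[0].text)

asyncio.run(main())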
Quickstart
from vllm import LLM, SamplingParams

# Load a multimodal model (e.g., Qwen2-VL); trust_remote_code=True is
# required for models that ship custom code.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", trust_remote_code=True)

# Build an OpenAI-style chat message mixing an image and text.
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        {"type": "text", "text": "Describe this image."},
    ],
}]

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

# Chat-style role/content messages go through llm.chat();
# llm.generate() expects plain prompts rather than message dicts.
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)