Qwen Vision Language Model Utilities
qwen-vl-utils is a Python utility library that simplifies working with Qwen Vision Language (Qwen-VL) models in PyTorch environments. It provides helpers for loading Qwen-VL models and tokenizers, preprocessing images, and running inference. It is currently at version 0.0.14 and under active development, so expect a rapid release cadence with frequent changes.
Warnings
- breaking As a library in early development (version 0.0.x), `qwen-vl-utils` is subject to frequent and undocumented breaking changes in its API, function signatures, and internal behaviors. Backward compatibility is not guaranteed between minor or even patch releases.
- gotcha `qwen-vl-utils` itself does not include PyTorch with CUDA. For GPU acceleration, users MUST install a CUDA-enabled version of PyTorch separately, matching their CUDA toolkit version. Without it, operations will fall back to CPU, leading to significantly slower performance.
- gotcha The utility functions in `qwen-vl-utils` target specific versions and architectures of Qwen-VL models. A mismatch between the `qwen-vl-utils` library version and the loaded Qwen-VL model checkpoint can cause errors at load time or during inference.
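Given the 0.0.x release cadence noted above, it is prudent to pin the installed version and check it at runtime. A minimal stdlib sketch (the distribution name `qwen-vl-utils` is taken from the install command below; the check degrades gracefully when the package is absent):

```python
from importlib.metadata import PackageNotFoundError, version

# Query the installed qwen-vl-utils version, if any.
try:
    installed = version("qwen-vl-utils")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("qwen-vl-utils is not installed.")
elif not installed.startswith("0.0."):
    print(f"Untested version line: {installed}; expect API changes.")
else:
    print(f"qwen-vl-utils {installed} detected.")
```

Pinning `qwen-vl-utils==0.0.14` in a requirements file gives the same protection at install time.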
Install
- pip
pip install qwen-vl-utils
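Because `qwen-vl-utils` does not pull in a CUDA-enabled PyTorch (see Warnings), GPU users typically install one first. One common route, assuming CUDA 12.1 — adjust the wheel index URL to match your CUDA toolkit version:

```shell
# Install a CUDA 12.1 build of PyTorch before installing qwen-vl-utils.
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install qwen-vl-utils
```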
Imports
- load_model_and_tokenizer
from qwen_vl_utils.model import load_model_and_tokenizer
- gen_inference
from qwen_vl_utils.utils import gen_inference
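Since symbols may move between 0.0.x releases, guarding the imports keeps downstream code importable even when the library is missing or its module layout has changed. A defensive sketch using the module paths listed above:

```python
# Guarded imports: module paths may shift between 0.0.x releases.
try:
    from qwen_vl_utils.model import load_model_and_tokenizer
    from qwen_vl_utils.utils import gen_inference
    HAS_QWEN_VL_UTILS = True
except ImportError:
    load_model_and_tokenizer = None
    gen_inference = None
    HAS_QWEN_VL_UTILS = False

print(f"qwen-vl-utils importable: {HAS_QWEN_VL_UTILS}")
```

Callers can then branch on `HAS_QWEN_VL_UTILS` instead of crashing at import time.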
Quickstart
import os
import torch  # needed for the device check and optional dtype selection
from qwen_vl_utils.model import load_model_and_tokenizer
from qwen_vl_utils.utils import gen_inference

# NOTE: This example requires a Qwen-VL model checkpoint.
# 1. Download a model, e.g. 'Qwen/Qwen-VL-Chat' from Hugging Face.
# 2. Set the environment variable QWEN_VL_MODEL_PATH to its local path,
#    e.g. export QWEN_VL_MODEL_PATH="/path/to/Qwen-VL-Chat"
model_path = os.environ.get('QWEN_VL_MODEL_PATH', '')

if not model_path:
    print("WARNING: Please set the QWEN_VL_MODEL_PATH environment variable to your model's local path.")
    print("Skipping model loading and inference for this quickstart.")
else:
    try:
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Using device: {device}")

        # Load model and tokenizer.
        # For Qwen-VL-Chat models, torch.bfloat16 may reduce memory use and improve
        # performance if the hardware supports it:
        # model, tokenizer = load_model_and_tokenizer(model_path, device=device, torch_dtype=torch.bfloat16)
        model, tokenizer = load_model_and_tokenizer(model_path, device=device)
        print("Model and tokenizer loaded successfully.")

        # Example query and image.
        query = "What objects are in this image?"
        # Replace with a real image path (local or URL) for actual inference;
        # this placeholder URL is for illustration only.
        image_input = "https://img.alicdn.com/imgextra/i3/O1CN01fQxAAx1hN0g3bM0d8_!!6000000004245-2-tps-1000-1000.png"

        # Generate inference.
        print(f"Generating inference for query: '{query}' with image: {image_input}")
        response = gen_inference(model, tokenizer, query, image_input)

        print("\n--- Qwen-VL Inference Result ---")
        print(response)
        print("----------------------------------")
    except Exception as e:
        print(f"\nAn error occurred during quickstart execution: {e}")
        print("Ensure the model path is correct, PyTorch with CUDA is installed (if using a GPU), and all dependencies are met.")
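The environment-variable handling in the quickstart can be factored into a small helper that fails with clearer messages. This is a hypothetical, stdlib-only sketch — `resolve_model_path` is our name, not part of the `qwen-vl-utils` API:

```python
import os

def resolve_model_path(env_var: str = "QWEN_VL_MODEL_PATH"):
    """Return a usable local checkpoint directory, or None with a hint printed."""
    path = os.environ.get(env_var, "").strip()
    if not path:
        print(f"Set {env_var} to the local path of your Qwen-VL checkpoint.")
        return None
    if not os.path.isdir(path):
        print(f"{env_var}={path!r} is not an existing directory.")
        return None
    return path

# Example: guard the quickstart with the helper.
model_path = resolve_model_path()
```

If `resolve_model_path()` returns a path, it points at an existing directory, so model loading can proceed without re-checking.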