Qwen Vision Language Model Utilities

0.0.14 · active · verified Fri Apr 10

qwen-vl-utils is a Python utility library that simplifies working with Qwen Vision Language (Qwen-VL) models in PyTorch. It provides helpers for loading Qwen-VL models and tokenizers, preprocessing images, and running inference. Currently at version 0.0.14, it is under active development, so expect frequent releases and possible API changes between versions.

Warnings

Install
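
Assuming the standard PyPI distribution of the package, installation is a single pip command (PyTorch must already be installed in the environment):

```shell
pip install qwen-vl-utils
```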

Imports

Quickstart

This quickstart loads a Qwen-VL model and its tokenizer, then performs visual question answering on a given image and text query. It includes safeguards for a missing model path and for device selection (CPU/CUDA).

import os
import torch # Required for device check and dtype
from qwen_vl_utils.model import load_model_and_tokenizer
from qwen_vl_utils.utils import gen_inference

# NOTE: This example requires a Qwen-VL model checkpoint.
# 1. Download a model, e.g., 'Qwen/Qwen-VL-Chat' from Hugging Face.
# 2. Set the environment variable QWEN_VL_MODEL_PATH to its local path.
#    e.g., export QWEN_VL_MODEL_PATH="/path/to/Qwen-VL-Chat"

model_path = os.environ.get('QWEN_VL_MODEL_PATH', '')

if not model_path:
    print("WARNING: Please set the QWEN_VL_MODEL_PATH environment variable with your model's local path.")
    print("Skipping model loading and inference for quickstart.")
else:
    try:
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Using device: {device}")

        # Load model and tokenizer
        # For Qwen-VL-Chat models, torch.bfloat16 might be preferred for memory/performance if supported.
        # model, tokenizer = load_model_and_tokenizer(model_path, device=device, torch_dtype=torch.bfloat16)
        model, tokenizer = load_model_and_tokenizer(model_path, device=device)
        print("Model and tokenizer loaded successfully.")

        # Example query and image
        query = "What objects are in this image?"
        # Replace with a real image path (local or URL) for actual inference.
        # For this example, we'll use a placeholder URL. Real execution requires a valid image.
        image_input = "https://img.alicdn.com/imgextra/i3/O1CN01fQxAAx1hN0g3bM0d8_!!6000000004245-2-tps-1000-1000.png"

        # Generate inference
        print(f"Generating inference for query: '{query}' with image: {image_input}")
        response = gen_inference(model, tokenizer, query, image_input)

        print("\n--- Qwen-VL Inference Result ---")
        print(response)
        print("----------------------------------")

    except Exception as e:
        print(f"\nAn error occurred during quickstart execution: {e}")
        print("Please ensure your model path is correct, PyTorch with CUDA is installed (if using GPU), and all dependencies are met.")
