MLX-VLM

0.4.4 · active · verified Thu Apr 16

MLX-VLM is a Python package for efficient inference and fine-tuning of Vision Language Models (VLMs) and Omni Models (VLMs with audio and video support) on Apple Silicon, built on the MLX framework. It provides access to a range of state-of-the-art multimodal models, with new models and optimizations added in frequent releases. The current version is 0.4.4.

Install
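MLX-VLM is published on PyPI; a standard install on an Apple Silicon Mac is:

```shell
pip install mlx-vlm
```

The package pulls in MLX, which requires Apple Silicon hardware.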

Imports
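The core entry points used in the quickstart below are `load` and `generate`, together with the chat-template and config helpers (importing these requires the package to be installed, on Apple Silicon):

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
```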

Quickstart

This quickstart demonstrates how to load a pre-trained Vision Language Model (VLM) from Hugging Face using `mlx-vlm` and perform image-to-text inference. It creates a dummy image, formats a text prompt together with the image, and generates a descriptive response.

import os
from pathlib import Path

from PIL import Image

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Create a dummy image so the quickstart is runnable as-is
dummy_image_path = Path("example_image.png")
if not dummy_image_path.exists():
    Image.new("RGB", (100, 50), color="blue").save(dummy_image_path)

# Use an environment variable for the model path, or default to a small quantized VLM
model_id = os.environ.get("MLX_VLM_MODEL", "mlx-community/Qwen2-VL-2B-Instruct-4bit")

try:
    print(f"Loading model: {model_id}...")
    model, processor = load(model_id)
    config = load_config(model_id)
    print("Model loaded.")

    # Prepare inputs: wrap the prompt in the model's chat template
    prompt = "Describe this image in detail."
    images = [str(dummy_image_path)]
    formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(images))
    print(f"Prompt: {prompt}")

    # Generate a response (recent versions return a GenerationResult; older ones a plain string)
    output = generate(model, processor, formatted_prompt, images, max_tokens=50, verbose=False)
    print("Generated response:")
    print(output)

except Exception as e:
    print(f"An error occurred: {e}")
    print("\nTroubleshooting Tips:")
    print("  1. Ensure you are on an Apple Silicon Mac (MLX requires it).")
    print("  2. Ensure the package is installed: `pip install mlx-vlm`.")
    print("  3. Check that the model_id exists on Hugging Face and is supported by mlx-vlm.")

finally:
    # Clean up the dummy image
    if dummy_image_path.exists():
        dummy_image_path.unlink()
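For one-off generations, the same inference can be run from the command line through the package's generate entry point. Flag names below follow recent mlx-vlm releases and may vary between versions; the model id and image path are placeholders to substitute with your own:

```shell
# One-off image-to-text generation without writing any Python.
python -m mlx_vlm.generate \
  --model mlx-community/Qwen2-VL-2B-Instruct-4bit \
  --prompt "Describe this image in detail." \
  --image example_image.png \
  --max-tokens 50
```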
