MLX-VLM
MLX-VLM is a Python package for efficient inference and fine-tuning of Vision Language Models (VLMs) and Omni Models (VLMs with audio and video support) on Apple Silicon using the MLX framework. It supports a wide range of state-of-the-art multimodal models, with new models and optimizations landing in frequent releases. The current version is 0.4.4.
Common errors
- ModuleNotFoundError: No module named 'mlx'
  Cause: The core MLX framework, which mlx-vlm depends on, is not installed or is not accessible in the current Python environment.
  Fix: Ensure `mlx-vlm` (which installs `mlx` as a dependency) is installed correctly with `pip install mlx-vlm`.
- RuntimeError: The MLX backend is not available. Please ensure you are running on an Apple Silicon Mac.
  Cause: Attempting to run `mlx-vlm` on a non-Apple Silicon machine, or MLX is not correctly configured for the environment.
  Fix: MLX-VLM is designed for Apple Silicon; run your code on a compatible Mac.
- AttributeError: module 'torchvision.transforms' has no attribute 'InterpolationMode'
  Cause: `mlx-vlm` was installed without the `[vision]` or `[omni]` extras, which install `torch` and `torchvision` (or update them to compatible versions).
  Fix: Install `mlx-vlm` with the relevant extras: `pip install 'mlx-vlm[vision]'` for most VLM models or `pip install 'mlx-vlm[omni]'` for omni models.
- ValueError: Unknown model type... or KeyError: 'vision_config'
  Cause: The model ID passed to the loader is incorrect, points to a model not yet supported by `mlx-vlm`, or the model's configuration file (`config.json`) is missing critical vision-related parameters.
  Fix: Double-check the model ID on Hugging Face to confirm it is valid and a VLM. Refer to the `mlx-vlm` GitHub repository for the list of officially supported models and their expected IDs.
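The first three errors above are environment problems that can be detected before any model is loaded. Below is a minimal preflight sketch; it is not part of mlx-vlm itself, and the helper name `preflight` is hypothetical:

```python
import importlib.util
import platform

def preflight() -> list:
    """Return human-readable problems matching the common errors above;
    an empty list means the environment looks OK."""
    problems = []
    # ModuleNotFoundError: No module named 'mlx'
    if importlib.util.find_spec("mlx") is None:
        problems.append("mlx is not installed: run `pip install mlx-vlm`")
    # RuntimeError: the MLX backend requires an Apple Silicon Mac
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        problems.append("not running on an Apple Silicon Mac")
    # AttributeError from torchvision: the [vision]/[omni] extras are missing
    if importlib.util.find_spec("torchvision") is None:
        problems.append("torchvision missing: run `pip install 'mlx-vlm[vision]'`")
    return problems

for problem in preflight():
    print(f"WARNING: {problem}")
```

Running this before the quickstart turns a cryptic traceback into an actionable message.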
Warnings
- Gotcha: MLX-VLM runs exclusively on Apple Silicon (macOS) and leverages the MLX framework. It will not function on other platforms such as Linux or Windows, or with NVIDIA/AMD GPUs.
- Gotcha: Many VLM models require installation extras (e.g., `pip install 'mlx-vlm[vision]'` or `'mlx-vlm[omni]'`), which pull in dependencies like `torch` and `torchvision`. Failing to install the correct extras can lead to `ModuleNotFoundError` or other runtime errors during model loading or processing.
- Gotcha: `mlx-vlm` is under very active and rapid development. APIs, particularly for model loading, processing, and inference parameters, can change quickly between minor versions, so upgrading may require code adjustments.
Install
- pip install mlx-vlm
- pip install 'mlx-vlm[vision]'
- pip install 'mlx-vlm[omni]'
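To confirm which version actually landed in the active environment, the standard-library `importlib.metadata` can query the installed distribution; the helper name `installed_version` below is illustrative, not an mlx-vlm API:

```python
from importlib import metadata
from typing import Optional

def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("mlx-vlm") or "mlx-vlm is not installed")
```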
Imports
- load, generate
from mlx_vlm import load, generate
- apply_chat_template
from mlx_vlm.prompt_utils import apply_chat_template
- load_config
from mlx_vlm.utils import load_config
Quickstart
import os
from pathlib import Path

from PIL import Image

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Create a dummy image so the quickstart is runnable end to end
dummy_image_path = Path("example_image.png")
if not dummy_image_path.exists():
    Image.new("RGB", (100, 50), color="blue").save(dummy_image_path)

# Use an environment variable for the model, or default to a small quantized VLM
model_id = os.environ.get("MLX_VLM_MODEL", "mlx-community/Qwen2-VL-2B-Instruct-4bit")

try:
    print(f"Loading model: {model_id}...")
    # Install with 'mlx-vlm[vision]' if the model needs torchvision
    model, processor = load(model_id)
    config = load_config(model_id)
    print("Model loaded.")

    # Wrap the prompt in the model's chat template, declaring one image
    prompt = "Describe this image in detail."
    formatted_prompt = apply_chat_template(processor, config, prompt, num_images=1)
    print(f"Prompt: {prompt}")

    # Generate a response conditioned on the image
    response = generate(model, processor, formatted_prompt, [str(dummy_image_path)],
                        max_tokens=50, verbose=False)
    print("Generated response:")
    print(response)
except Exception as e:
    print(f"An error occurred: {e}")
    print("\nTroubleshooting Tips:")
    print("  1. Ensure you are on an Apple Silicon Mac.")
    print("  2. Install with appropriate extras: `pip install 'mlx-vlm[vision]'` or `pip install 'mlx-vlm[omni]'`.")
    print("  3. Check that the model_id is correct and supported by mlx-vlm.")
finally:
    # Clean up the dummy image
    if dummy_image_path.exists():
        dummy_image_path.unlink()