{"id":8324,"library":"mlx-vlm","title":"MLX-VLM","description":"MLX-VLM is a Python package for efficient inference and fine-tuning of Vision Language Models (VLMs) and Omni Models (VLMs with audio and video support) on Apple Silicon using the MLX framework. It provides access to a range of state-of-the-art multimodal models, with new models and optimizations added in frequent releases. The current version is 0.4.4.","status":"active","version":"0.4.4","language":"en","source_language":"en","source_url":"https://github.com/Blaizzy/mlx-vlm","tags":["MLX","VLM","Vision Language Model","Apple Silicon","AI Inference","Multimodal AI"],"install":[{"cmd":"pip install mlx-vlm","lang":"bash","label":"Base installation"},{"cmd":"pip install 'mlx-vlm[vision]'","lang":"bash","label":"For most VLM models (includes Torch/Torchvision)"},{"cmd":"pip install 'mlx-vlm[omni]'","lang":"bash","label":"For Omni models with audio/video (includes Torch/Torchvision, requires ffmpeg)"}],"dependencies":[{"reason":"Core deep learning framework for Apple Silicon.","package":"mlx","optional":false},{"reason":"Required for [vision] and [omni] extras, used by certain model processors (e.g., torchvision).","package":"torch","optional":true},{"reason":"Required for [vision] and [omni] extras, used by certain model processors.","package":"torchvision","optional":true},{"reason":"Required system-wide for [omni] extra support for audio/video processing.","package":"ffmpeg","optional":true}],"imports":[{"symbol":"load","correct":"from mlx_vlm import load"},{"symbol":"generate","correct":"from mlx_vlm import generate"},{"symbol":"apply_chat_template","correct":"from mlx_vlm.prompt_utils import apply_chat_template"},{"symbol":"load_config","correct":"from mlx_vlm.utils import load_config"}],"quickstart":{"code":"import os\nfrom pathlib import Path\n\nfrom PIL import Image\n\nfrom mlx_vlm import load, generate\nfrom mlx_vlm.prompt_utils import apply_chat_template\nfrom mlx_vlm.utils import load_config\n\n# Create a dummy image so the quickstart is runnable without external files\ndummy_image_path = Path(\"example_image.png\")\nif not dummy_image_path.exists():\n    Image.new(\"RGB\", (224, 224), color=\"blue\").save(dummy_image_path)\n\n# Use an environment variable for the model path, or default to a small quantized VLM\nmodel_id = os.environ.get(\"MLX_VLM_MODEL\", \"mlx-community/Qwen2-VL-2B-Instruct-4bit\")\n\ntry:\n    print(f\"Loading model: {model_id}...\")\n    # Make sure to install with 'mlx-vlm[vision]' if the model needs the extra dependencies\n    # load() returns the model together with its processor\n    model, processor = load(model_id)\n    config = load_config(model_id)\n    print(\"Model loaded.\")\n\n    # Wrap the prompt in the model's chat template, reserving a slot for one image\n    images = [str(dummy_image_path)]\n    text_prompt = \"Describe this image in detail.\"\n    formatted_prompt = apply_chat_template(processor, config, text_prompt, num_images=len(images))\n    print(f\"Prompt: {text_prompt}\")\n\n    # Generate a response; depending on the installed version, generate() returns\n    # either a plain string or a result object with a .text attribute\n    result = generate(model, processor, formatted_prompt, images, max_tokens=50, verbose=False)\n    response = getattr(result, \"text\", result)\n    print(\"Generated response:\")\n    print(response)\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(\"\\nTroubleshooting Tips:\")\n    print(\"  1. Ensure you are on an Apple Silicon Mac.\")\n    print(\"  2. Install with appropriate extras: `pip install 'mlx-vlm[vision]'` or `pip install 'mlx-vlm[omni]'`.\")\n    print(\"  3. Check that the model_id is correct and supported by mlx-vlm.\")\n\nfinally:\n    # Clean up the dummy image\n    if dummy_image_path.exists():\n        dummy_image_path.unlink()\n","lang":"python","description":"This quickstart demonstrates how to load a pre-trained Vision Language Model (VLM) from Hugging Face with `mlx_vlm.load` and run image-to-text inference. It creates a dummy image, formats a text prompt with the model's chat template, and generates a descriptive response with `mlx_vlm.generate`."},"warnings":[{"fix":"Ensure you are running your code on an Apple Silicon Mac.","message":"MLX-VLM is exclusively designed for Apple Silicon (macOS) and leverages the MLX framework. 
It will not function on other platforms such as Linux, Windows, or with NVIDIA/AMD GPUs.","severity":"gotcha","affected_versions":"All"},{"fix":"Identify the specific model's requirements and install `mlx-vlm` with the appropriate extras, e.g., `pip install 'mlx-vlm[vision]'`.","message":"Many VLM models require additional installation extras (e.g., `pip install 'mlx-vlm[vision]'` or `'mlx-vlm[omni]'`). These extras bring in dependencies like `torch` and `torchvision`. Failing to install the correct extras can lead to `ModuleNotFoundError` or other runtime errors during model loading or processing.","severity":"gotcha","affected_versions":"All"},{"fix":"Regularly consult the official GitHub repository for release notes and changes. For production environments, pin your `mlx-vlm` version to a specific minor release to ensure stability, e.g., `mlx-vlm==0.4.*`.","message":"The `mlx-vlm` library is under very active and rapid development. APIs, particularly for model loading, processing, and inference parameters, can change quickly between minor versions. This may necessitate code adjustments when upgrading.","severity":"gotcha","affected_versions":"All 0.x.x versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure `mlx-vlm` (which usually brings `mlx` as a dependency) is installed correctly with `pip install mlx-vlm`.","cause":"The core MLX framework, which mlx-vlm depends on, was not installed or is not accessible in the current Python environment.","error":"ModuleNotFoundError: No module named 'mlx'"},{"fix":"MLX-VLM is designed for Apple Silicon. Run your code on a compatible Mac.","cause":"Attempting to run `mlx-vlm` on a non-Apple Silicon machine, or MLX is not correctly configured for the environment.","error":"RuntimeError: The MLX backend is not available. 
Please ensure you are running on an Apple Silicon Mac."},{"fix":"Install `mlx-vlm` with the relevant extras: `pip install 'mlx-vlm[vision]'` for most VLM models or `pip install 'mlx-vlm[omni]'` for omni models.","cause":"This typically occurs when `mlx-vlm` is installed without the necessary `[vision]` or `[omni]` extras, which would install `torch` and `torchvision` (or update them to compatible versions).","error":"AttributeError: module 'torchvision.transforms' has no attribute 'InterpolationMode'"},{"fix":"Double-check the `model_id` on Hugging Face to ensure it is valid and refers to a VLM. Refer to the `mlx-vlm` GitHub repository for a list of officially supported models and their expected `model_id`s.","cause":"The `model_id` passed to `load` is incorrect or points to a model not currently supported by `mlx-vlm`, or the model's configuration file (`config.json`) is missing critical vision-related parameters.","error":"ValueError: Unknown model type... or KeyError: 'vision_config'"}]}