{"id":9111,"library":"mineru-vl-utils","title":"MinerU Vision-Language Utilities","description":"mineru-vl-utils is a Python library providing utilities for interacting with MinerU Vision-Language models. It acts as a lightweight wrapper to simplify sending requests and handling responses from the MinerU VLM. The library is actively maintained, with frequent minor releases, and the current version is 0.2.3.","status":"active","version":"0.2.3","language":"en","source_language":"en","source_url":"https://github.com/opendatalab/mineru-vl-utils","tags":["vision","language","nlp","multimodal","utilities","document-parsing","ocr"],"install":[{"cmd":"pip install mineru-vl-utils","lang":"bash","label":"Base Installation"},{"cmd":"pip install \"mineru-vl-utils[transformers]\"","lang":"bash","label":"For Transformers backend"},{"cmd":"pip install \"mineru-vl-utils[vllm]\"","lang":"bash","label":"For VLLM engine backends"},{"cmd":"pip install \"mineru-vl-utils[mlx]\"","lang":"bash","label":"For MLX engine backend (Apple Silicon)"},{"cmd":"pip install \"mineru-vl-utils[lmdeploy]\"","lang":"bash","label":"For LmDeploy engine backend"}],"dependencies":[{"reason":"Image processing","package":"pillow","optional":false},{"reason":"HTTP client functionality","package":"httpx","optional":false},{"reason":"Data validation and settings management","package":"pydantic","optional":false},{"reason":"Optional backend for HuggingFace models","package":"transformers","optional":true},{"reason":"Optional backend for VLLM engine","package":"vllm","optional":true},{"reason":"Optional backend for MLX engine (Apple Silicon)","package":"mlx-vlm","optional":true},{"reason":"Optional backend for LmDeploy engine","package":"lmdeploy","optional":true}],"imports":[{"symbol":"MinerUClient","correct":"from mineru_vl_utils import MinerUClient"},{"note":"MinerULogitsProcessor is directly importable from the top-level package since v0.1.20+.","wrong":"from mineru_vl_utils.logits_processor import 
MinerULogitsProcessor","symbol":"MinerULogitsProcessor","correct":"from mineru_vl_utils import MinerULogitsProcessor"}],"quickstart":{"code":"import os\nfrom PIL import Image\nfrom mineru_vl_utils import MinerUClient\n\n# The http-client backend needs a MinerU server running at this URL.\n# For local testing, you might run one yourself (e.g., using vllm with a MinerU model).\nserver_url = os.environ.get('MINERU_SERVER_URL', 'http://127.0.0.1:8000')\n\n# Placeholder image for demonstration; replace with your own,\n# e.g. image = Image.open(\"path/to/your/image.jpg\")\nimage = Image.new('RGB', (60, 30), color='red')\n\ntry:\n    # Other backends (e.g., 'transformers', 'vllm-engine') require additional setup and dependencies.\n    client = MinerUClient(backend=\"http-client\", server_url=server_url)\n\n    # Assumption: two_step_extract accepts a PIL Image directly.\n    # Check the MinerU documentation for the exact method signature in your version.\n    print(f\"Sending a placeholder image to {server_url}...\")\n    result = client.two_step_extract(image)\n    print(\"Extracted result:\", result)\nexcept Exception as e:\n    print(f\"An error occurred during quickstart: {e}\")\n    print(\"Please ensure a MinerU VLM server is running at the specified server_url for the 'http-client' backend.\")","lang":"python","description":"This quickstart initializes the `MinerUClient` with the `http-client` backend, builds a placeholder PIL Image, and attempts a `two_step_extract` call against a running MinerU VLM server. Treat the exact method signature as an assumption and confirm it against the MinerU documentation. For direct model inference backends like `transformers` or `vllm-engine`, you would instead pass the pre-loaded model and processor during client initialization."},"warnings":[{"fix":"For production or performance-critical applications, consider using backends like `vllm-engine`, `vllm-async-engine`, `mlx-engine`, or the `http-client` with a dedicated server.","message":"The `transformers` backend is slow and generally unsuitable for production use. It is intended mainly for quick local testing and development.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure inputs are single images (e.g., PIL Image objects). For PDF/DOCX or complex document parsing, use the full `MinerU` toolkit.","message":"The `MinerUClient` from `mineru-vl-utils` is designed specifically for standalone image inputs. It does not natively support processing PDF, DOCX, or other multi-page document formats, nor does it handle cross-page or cross-document operations. 
For these advanced document parsing needs, refer to the main `MinerU` project/library.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Review any code that relies on implicit handling of unknown `ref_type`s or expects an error for them. Explicitly set `ref_type` if specific behavior is required.","message":"As of `mineru_vl_utils-0.2.3`, an unknown `ref_type` in layout processing defaults to `image`. This might subtly change how previously unhandled reference types are interpreted.","severity":"breaking","affected_versions":">=0.2.3"},{"fix":"Ensure your `vllm` installation is at least version `0.10.1` when using `MinerULogitsProcessor`. Upgrade with `pip install \"vllm>=0.10.1\"` (quoted so the shell does not treat `>` as a redirect).","message":"Using `MinerULogitsProcessor` with the `vllm` backend requires `vllm>=0.10.1`. Older versions of `vllm` may lead to compatibility issues or missing features.","severity":"gotcha","affected_versions":"<0.10.1 of vllm"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install the library with the required extras: `pip install \"mineru-vl-utils[transformers]\"` (or `[vllm]`, `[mlx]`, `[lmdeploy]` as needed).","cause":"The optional dependencies for a specific backend (e.g., `transformers`) were not installed.","error":"ModuleNotFoundError: No module named 'mineru_vl_utils.backend.transformers'"},{"fix":"Ensure a compatible MinerU VLM server is actively running and accessible at the `server_url` provided to `MinerUClient`. Verify the URL and port are correct.","cause":"The `http-client` backend was selected, but no MinerU VLM server is running at the specified `server_url`, or the URL is incorrect/inaccessible.","error":"httpx.ConnectError: [Errno 111] Connection refused"},{"fix":"Check the specific client method's documentation for the expected input type (e.g., `PIL.Image.Image` object, image file path as string, or raw image bytes) and convert your input accordingly. 
For instance, `Image.open(path)` returns a PIL object, while some APIs might expect `open(path, 'rb').read()` for bytes.","cause":"Attempting to pass a PIL Image object directly to a function that expects a file path or bytes, or vice-versa, without proper conversion.","error":"TypeError: 'Image.open(...)' object cannot be interpreted as a string path"}]}