MinerU Vision-Language Utilities
mineru-vl-utils is a Python library providing utilities for interacting with MinerU Vision-Language models. It acts as a lightweight wrapper to simplify sending requests and handling responses from the MinerU VLM. The library is actively maintained, with frequent minor releases, and the current version is 0.2.3.
Common errors
- ModuleNotFoundError: No module named 'mineru_vl_utils.backend.transformers'
  cause: The optional dependencies for a specific backend (e.g., `transformers`) were not installed.
  fix: Install the library with the required extras: `pip install "mineru-vl-utils[transformers]"` (or `[vllm]`, `[mlx]`, `[lmdeploy]` as needed).
- httpx.ConnectError: [Errno 111] Connection refused
  cause: The `http-client` backend was selected, but no MinerU VLM server is running at the specified `server_url`, or the URL is incorrect or unreachable.
  fix: Ensure a compatible MinerU VLM server is running and reachable at the `server_url` passed to `MinerUClient`. Verify that the host and port are correct.
- TypeError: 'Image.open(...)' object cannot be interpreted as a string path
  cause: A PIL Image object was passed to a function that expects a file path or bytes, or vice versa, without conversion.
  fix: Check the client method's documentation for the expected input type (a `PIL.Image.Image` object, an image file path as a string, or raw image bytes) and convert your input accordingly. For instance, `Image.open(path)` returns a PIL object, while some APIs might expect `open(path, 'rb').read()` for bytes.
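The input-type mismatch described in the last entry can be avoided by normalizing inputs before calling the client. The following is a minimal sketch and not part of `mineru-vl-utils`: the helper name `to_image_bytes` is hypothetical, and it recognizes PIL images by duck typing (any object with a `save(fp, format=...)` method), so the block itself runs without Pillow installed. Whether a given `MinerUClient` method accepts bytes, a path, or a PIL object still has to be checked against its documentation.

```python
import io
from pathlib import Path


def to_image_bytes(source, fmt="PNG"):
    """Return encoded image bytes for bytes, a file path, or a PIL-like image.

    Hypothetical helper for illustration; not part of mineru-vl-utils.
    """
    if isinstance(source, (bytes, bytearray)):
        # Already raw bytes: pass through unchanged.
        return bytes(source)
    if isinstance(source, (str, Path)):
        # A file path: read the file contents as-is.
        return Path(source).read_bytes()
    if hasattr(source, "save"):
        # PIL.Image.Image exposes save(fp, format=...); encode into memory.
        buffer = io.BytesIO()
        source.save(buffer, format=fmt)
        return buffer.getvalue()
    raise TypeError(f"Unsupported image input type: {type(source).__name__}")
```

Going the other way, `Image.open(io.BytesIO(data))` turns bytes back into a PIL object when a method expects one.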
Warnings
- gotcha The `transformers` backend is slow and generally unsuitable for production. It is intended mainly for quick local testing and development.
- gotcha The `MinerUClient` from `mineru-vl-utils` is designed specifically for standalone image inputs. It does not natively support processing PDF, DOCX, or other multi-page document formats, nor does it handle cross-page or cross-document operations. For these advanced document parsing needs, refer to the main `MinerU` project/library.
- breaking As of `mineru_vl_utils` 0.2.3, an unknown `ref_type` encountered during layout processing is treated as `image`. This may subtly change how previously unhandled reference types are interpreted.
- gotcha Using `MinerULogitsProcessor` with the `vllm` backend requires `vllm>=0.10.1`. Older versions of `vllm` may cause compatibility issues or lack required features.
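The `vllm>=0.10.1` requirement can be checked at startup before importing `MinerULogitsProcessor`. The sketch below uses only the standard library; the helper names are hypothetical, and the parser is deliberately simple: it compares only the leading numeric release segment, so a pre-release such as `0.10.1rc1` is treated as `0.10.1`. Numeric comparison matters here because a plain string comparison would rank `0.9.2` above `0.10.1`.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Extract the leading numeric release segment, e.g. '0.10.1rc1' -> (0, 10, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("vllm")
    if found is None:
        print("vllm is not installed")
    elif meets_minimum(found, "0.10.1"):
        print(f"vllm {found} satisfies >=0.10.1")
    else:
        print(f"vllm {found} is older than 0.10.1")
```

If the `packaging` library is available, `packaging.version.Version` handles pre-release and post-release ordering more rigorously than this parser.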
Install
- pip install mineru-vl-utils
- pip install "mineru-vl-utils[transformers]"
- pip install "mineru-vl-utils[vllm]"
- pip install "mineru-vl-utils[mlx]"
- pip install "mineru-vl-utils[lmdeploy]"
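Because each backend lives behind an optional extra, it can help to check whether the backing package is importable before selecting that backend. The following is a rough sketch using `importlib.util.find_spec`; the mapping from extra name to top-level module is an assumption made for illustration (it is not read from the package metadata), and the helper names are hypothetical.

```python
from importlib.util import find_spec

# Assumed mapping from install extra to a top-level module it provides.
# These module names are illustrative guesses, not taken from the package metadata.
EXTRA_TO_MODULE = {
    "transformers": "transformers",
    "vllm": "vllm",
    "mlx": "mlx",
    "lmdeploy": "lmdeploy",
}


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if find_spec(name) is None]


def install_hint(extra):
    """Build the pip command for an extra, or return None if its module is present."""
    if not missing_modules([EXTRA_TO_MODULE[extra]]):
        return None
    return f'pip install "mineru-vl-utils[{extra}]"'
```

Calling `install_hint("vllm")` returns `None` when `vllm` is importable and the matching `pip install` command otherwise, which gives a clearer message than a late `ModuleNotFoundError`.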
Imports
- MinerUClient
from mineru_vl_utils import MinerUClient
- MinerULogitsProcessor
from mineru_vl_utils.logits_processor import MinerULogitsProcessor
from mineru_vl_utils import MinerULogitsProcessor
Quickstart
import os
from PIL import Image
from mineru_vl_utils import MinerUClient
# For http-client backend, ensure a MinerU server is running at the specified URL.
# For local testing, you might run a server (e.g., using vllm with a MinerU model).
# Replace with your actual server URL if different.
server_url = os.environ.get('MINERU_SERVER_URL', 'http://127.0.0.1:8000')
# Initialize the client with the http-client backend
# Other backends (e.g., 'transformers', 'vllm-engine') require additional setup and dependencies.
client = MinerUClient(backend="http-client", server_url=server_url)
# Create a dummy image for demonstration (replace with your actual image loading)
try:
    image = Image.new('RGB', (60, 30), color='red')
    # Example: image = Image.open("path/to/your/image.jpg")
    # image.save("temp_image.png")  # Save to a temp file if the client needs a path

    print(f"Attempting to send a dummy image to {server_url}...")

    # The exact method to call depends on the backend and on the API the MinerU model exposes.
    # Check the MinerU documentation for the signature of `two_step_extract` or a similar method.
    # For the http-client backend, interaction goes through server_url.
    # For direct model inference (transformers, vllm-engine), you pass the model/processor instead.
    # Example with a generic 'process' method, if available:
    # result = client.process(image)
    # print("Processed result:", result)

    print("MinerUClient initialized. To use it, you would call a method like client.two_step_extract(image)")
    print("or client.async_process(image) depending on the backend and model capabilities.")
except Exception as e:
    print(f"An error occurred during quickstart: {e}")
    print("Please ensure a MinerU VLM server is running at the specified server_url for the 'http-client' backend.")
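Since the `http-client` backend fails with a connection error when nothing is listening at `server_url`, a quick TCP pre-check can give a clearer message before the client is used. This is a minimal standard-library sketch, not part of `mineru-vl-utils`; the function name `server_reachable` is hypothetical. It only confirms that a port accepts connections, not that a MinerU model is actually being served there.

```python
import socket
from urllib.parse import urlsplit


def server_reachable(server_url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(server_url)
    if parts.hostname is None:
        raise ValueError(f"No host found in URL: {server_url!r}")
    # Fall back to the scheme's default port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `server_reachable(server_url)` before constructing `MinerUClient` lets a script exit early with a clear message when the server is down.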