MinerU Vision-Language Utilities

0.2.3 · active · verified Thu Apr 16

mineru-vl-utils is a Python library of utilities for working with MinerU Vision-Language models (VLMs). It is a lightweight wrapper that simplifies sending requests to a MinerU VLM and handling its responses. The library is actively maintained with frequent minor releases; the current version is 0.2.3.

Quickstart

This quickstart initializes `MinerUClient` with the `http-client` backend, prepares a placeholder PIL image, and outlines how the client talks to a running MinerU VLM server. For direct-inference backends such as `transformers` or `vllm-engine`, you instead pass the pre-loaded model and processor when constructing the client.

import os
from PIL import Image
from mineru_vl_utils import MinerUClient

# For http-client backend, ensure a MinerU server is running at the specified URL.
# For local testing, you might run a server (e.g., using vllm with a MinerU model).
# Replace with your actual server URL if different.
server_url = os.environ.get('MINERU_SERVER_URL', 'http://127.0.0.1:8000')

# Initialize the client with the http-client backend
# Other backends (e.g., 'transformers', 'vllm-engine') require additional setup and dependencies.
client = MinerUClient(backend="http-client", server_url=server_url)

# Create a placeholder image for demonstration; in practice, load your own,
# for example: image = Image.open("path/to/your/image.jpg")
image = Image.new('RGB', (60, 30), color='red')

try:
    print(f"Sending a placeholder image to {server_url}...")
    # two_step_extract is the method this quickstart refers to; check the MinerU
    # documentation for its exact signature and return type before relying on it.
    # Async variants may also exist, depending on the backend and model capabilities.
    result = client.two_step_extract(image)
    print("Extracted result:", result)
except Exception as e:
    print(f"An error occurred during quickstart: {e}")
    print("Please ensure a MinerU VLM server is running at the specified server_url for the 'http-client' backend.")
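Because the `http-client` backend depends on a server listening at `server_url`, it can help to check reachability before constructing the client. The following is a minimal sketch using only the Python standard library. The helper name `server_is_reachable` is hypothetical and not part of mineru-vl-utils, and the check only confirms that a TCP connection can be opened, not that a MinerU model is actually being served.

```python
import os
import socket
from urllib.parse import urlparse


def server_is_reachable(server_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for illustration; not part of mineru-vl-utils.
    """
    parsed = urlparse(server_url)
    if not parsed.hostname:
        return False
    try:
        # Fall back to the scheme's default port when the URL omits one.
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
    except ValueError:
        # urlparse raises ValueError for a malformed or out-of-range port.
        return False
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    url = os.environ.get("MINERU_SERVER_URL", "http://127.0.0.1:8000")
    print(f"{url} reachable: {server_is_reachable(url)}")
```

Running this check first turns an opaque client-side exception into a clear "server not running" message. It is deliberately a plain TCP probe rather than an HTTP request, since the server's health-check route is not specified here.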

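The quickstart mentions loading an image from bytes or a path rather than building one in memory. As a rough sketch, assuming Pillow is installed, the helpers below convert between raw bytes and a PIL image. The function names are hypothetical, and whether the client accepts bytes directly is not confirmed here, so the sketch always ends with a PIL `Image` object, which is what the quickstart uses.

```python
import io

from PIL import Image


def image_to_png_bytes(image: Image.Image) -> bytes:
    """Serialize a PIL image to PNG bytes held in memory (hypothetical helper)."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


def image_from_bytes(data: bytes) -> Image.Image:
    """Decode raw image bytes into an RGB PIL image (hypothetical helper)."""
    image = Image.open(io.BytesIO(data))
    # Image.open is lazy; convert() forces a full decode and normalizes the mode.
    return image.convert("RGB")


if __name__ == "__main__":
    original = Image.new("RGB", (60, 30), color="red")
    restored = image_from_bytes(image_to_png_bytes(original))
    print(restored.size, restored.mode)
```

Normalizing to RGB avoids surprises with palette or alpha-channel images, though a particular model may handle other modes without it.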