GROBID client Python
Version 0.1.4, verified Fri May 01. Authentication required: no. Language: Python.
Simple Python client for GROBID REST services. Current version: 0.1.4. Release cadence: irregular, with several fixes in 2024-2025.
pip install grobid-client-python
Common errors
error ModuleNotFoundError: No module named 'grobid_client'
cause The package is not installed in the active environment. The distribution is named 'grobid-client-python' (hyphens), while the import name is 'grobid_client' (underscore), so the two are easy to confuse.
fix
Install with 'pip install grobid-client-python', then import with: from grobid_client.grobid_client import GrobidClient
error AttributeError: module 'grobid_client' has no attribute 'GrobidClient'
cause Trying to import GrobidClient from the top-level grobid_client package without specifying the submodule.
fix
Use: from grobid_client.grobid_client import GrobidClient
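Both import errors above depend on which module path actually resolves in the current environment. The following is a minimal sketch, using only the standard library, of checking a path before importing it. The helper name is_importable is hypothetical and is not part of the client.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. 'grobid_client') is itself missing.
        return False


if __name__ == "__main__":
    # The submodule path is the one this page lists as correct.
    if is_importable("grobid_client.grobid_client"):
        print("import path resolves")
    else:
        print("not found: try 'pip install grobid-client-python'")
```

Note that find_spec on a dotted path imports the parent package as a side effect, which is usually acceptable for a diagnostic check like this one.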
error ConnectionError: HTTPConnectionPool(host='localhost', port=8070): Max retries exceeded
cause No GROBID server running at the default URL.
fix
Start GROBID server or provide a different grobid_server URL pointing to a running instance.
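One way to catch this error early is to probe the server before submitting any work. The sketch below assumes the server exposes GROBID's /api/isalive health endpoint under the configured base URL. The function name grobid_is_alive and the timeout value are illustrative choices, not part of the client.

```python
import urllib.error
import urllib.request


def grobid_is_alive(server_url: str = "http://localhost:8070", timeout: float = 5.0) -> bool:
    """Return True if the server answers its health endpoint with HTTP 200."""
    # Assumption: the health check lives at <server_url>/api/isalive.
    url = server_url.rstrip("/") + "/api/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    if not grobid_is_alive():
        print("No GROBID server reachable at http://localhost:8070")
```

Calling this once at startup gives a clear message instead of a retry traceback from deep inside the HTTP stack.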
error ValueError: The 'input' parameter must be a file path or a directory.
cause The input path does not exist, or it points to a file of an unsupported type (not PDF or XML).
fix
Ensure the input path exists and is a valid PDF or TEI XML file, or pass a directory containing such files.
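The check described in this fix can be done before calling the client. This is a rough sketch under the assumption that only .pdf and .xml files are acceptable, as the error description suggests. The helper collect_inputs and the ACCEPTED_SUFFIXES set are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import List

# Assumption: accepted types follow the error description (PDF or XML).
ACCEPTED_SUFFIXES = {".pdf", ".xml"}


def collect_inputs(input_path: str) -> List[Path]:
    """Return the PDF/XML files found at `input_path` (a file or a directory)."""
    path = Path(input_path)
    if not path.exists():
        raise FileNotFoundError(f"input path does not exist: {path}")
    if path.is_file():
        candidates = [path]
    else:
        # Walk the directory tree and keep regular files only.
        candidates = sorted(p for p in path.rglob("*") if p.is_file())
    return [p for p in candidates if p.suffix.lower() in ACCEPTED_SUFFIXES]
```

An empty result means the path exists but holds nothing the service could process, which is worth reporting before any request is sent.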
Warnings
gotcha The client expects a running GROBID server at the specified URL. Without it, all API calls will raise ConnectionError.
fix Ensure GROBID server is running at the configured grobid_server URL (default: http://localhost:8070).
breaking Version 0.1.0 changed the default output format to JSON and Markdown. The process() method now returns dicts and writes files differently. Old code expecting raw TEI XML may break.
fix Use generateIDs=True and specify output format parameters like output_format='tei' if needed.
gotcha The batch size default changed to 10 in v0.0.17 to avoid unexpected behaviors. Large batches may cause server timeouts.
fix Adjust batch_size parameter in client initialization (e.g., GrobidClient(batch_size=100) for larger throughput, but test for stability).
gotcha The client uses synchronous requests. Processing many PDFs can block the calling thread for a long time.
fix Consider using threading or asyncio wrappers if concurrent processing is needed.
Imports
- GrobidClient
wrong: from grobid_client import GrobidClient
correct: from grobid_client.grobid_client import GrobidClient
- GrobidClient
wrong: import grobid_client
correct: from grobid_client.grobid_client import GrobidClient
Quickstart
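from grobid_client.grobid_client import GrobidClient

# Connect to a GROBID server (here the default local URL); no config file is used.
client = GrobidClient(config_path=None, grobid_server='http://localhost:8070')
# Run the full-text service on the input and write the results under output/.
client.process("processFulltextDocument", "input.pdf", output="output/")
print("Done")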