{"id":8549,"library":"python-doctr","title":"Document Text Recognition (docTR)","description":"docTR (Document Text Recognition) is an open-source Python library that uses deep learning for Optical Character Recognition (OCR) on documents. It provides text detection and text recognition models for scanned documents, images, and PDFs. Maintained by Mindee, it supports multi-language recognition, handwriting, and GPU acceleration. The current version is 1.0.1.","status":"active","version":"1.0.1","language":"en","source_language":"en","source_url":"https://github.com/mindee/doctr","tags":["OCR","Deep Learning","Document Processing","Computer Vision","PyTorch","AI/ML"],"install":[{"cmd":"pip install python-doctr","lang":"bash","label":"Base installation"},{"cmd":"pip install \"python-doctr[viz,html,contrib]\"","lang":"bash","label":"Full installation with optional features (visualization, HTML, contrib modules)"},{"cmd":"pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118\npip install python-doctr","lang":"bash","label":"Installation with PyTorch (GPU CUDA 11.8 support, adjust for your CUDA version)"}],"dependencies":[{"reason":"Deep learning backend (default and only one since v1.0.0)","package":"torch","optional":false},{"reason":"Vision models and image transforms used alongside `torch` by the PyTorch backend","package":"torchvision","optional":false},{"reason":"Optional dependency for rendering web pages loaded with `DocumentFile.from_url` (part of the `html` extra)","package":"weasyprint","optional":true},{"reason":"PDF processing backend (required for DocumentFile.from_pdf)","package":"pypdfium2","optional":false}],"imports":[{"symbol":"DocumentFile","correct":"from doctr.io import DocumentFile"},{"symbol":"ocr_predictor","correct":"from doctr.models import ocr_predictor"},{"note":"from_hub for Hugging Face models is directly under doctr.models now, not a submodule 'pre_trained'.","wrong":"from doctr.models.pre_trained import 
from_hub","symbol":"from_hub","correct":"from doctr.models import from_hub"}],"quickstart":{"code":"import os\nfrom doctr.io import DocumentFile\nfrom doctr.models import ocr_predictor\n\n# For demonstration, create a dummy image file if it doesn't exist\n# In a real scenario, you'd have an actual image or PDF path\ndummy_image_path = 'sample.png'\nif not os.path.exists(dummy_image_path):\n    try:\n        from PIL import Image\n        # Create a simple image with text\n        img = Image.new('RGB', (200, 100), color = (255, 255, 255))\n        from PIL import ImageDraw, ImageFont\n        d = ImageDraw.Draw(img)\n        try:\n            # Try a common font, or fallback\n            font = ImageFont.truetype(\"arial.ttf\", 20)\n        except IOError:\n            font = ImageFont.load_default()\n        d.text((10,10), \"Hello docTR!\", fill=(0,0,0), font=font)\n        img.save(dummy_image_path)\n        print(f\"Created dummy image: {dummy_image_path}\")\n    except ImportError:\n        print(\"Pillow not installed, cannot create dummy image. 
Please provide a real image file.\")\n        print(\"Skipping quickstart example as no image is available.\")\n        dummy_image_path = None\n\nif dummy_image_path and os.path.exists(dummy_image_path):\n    # Load your document (image or PDF)\n    # For a PDF: doc = DocumentFile.from_pdf(\"path/to/your/document.pdf\")\n    # For multiple images: doc = DocumentFile.from_images([\"path/to/img1.jpg\", \"path/to/img2.png\"])\n    doc = DocumentFile.from_images(dummy_image_path)\n\n    # Load a pre-trained OCR model\n    # Since v1.0.0, PyTorch is the default and only backend.\n    model = ocr_predictor(pretrained=True)\n\n    # Analyze the document\n    result = model(doc)\n\n    # Print the extracted text content\n    # The result object contains detailed information about words, lines, blocks, and pages.\n    print(\"\\n--- OCR Result ---\")\n    for page in result.pages:\n        for block in page.blocks:\n            for line in block.lines:\n                print(\" \".join([word.value for word in line.words]))\n\n    # You can also export the full structured output as JSON\n    # print(result.export())\nelse:\n    print(\"Quickstart skipped due to missing image.\")\n","lang":"python","description":"This quickstart demonstrates how to load an image, initialize a pre-trained OCR model, and extract text using docTR's core functionality. It leverages `DocumentFile` to handle input and `ocr_predictor` for the end-to-end OCR pipeline."},"warnings":[{"fix":"Ensure you have PyTorch installed (`pip install torch torchvision`) and remove any TensorFlow-specific code or installations related to docTR. The base `pip install python-doctr` will now install with PyTorch support by default.","message":"docTR v1.0.0 removed TensorFlow as a supported backend. The library now exclusively uses PyTorch. 
The old `python-doctr[tf]` installation extra is no longer valid, and the training scripts have been updated.","severity":"breaking","affected_versions":">=1.0.0"},{"fix":"Install `weasyprint`'s system dependencies manually for your OS (e.g., `sudo apt-get install -y libglib2.0-0 libpango-1.0-0 libpangoft2-1.0-0` on Ubuntu/Debian). For specific `weasyprint` errors like `OSError: cannot load library 'gobject-2.0-0'`, refer to `weasyprint`'s documentation.","message":"Loading web pages with `DocumentFile.from_url` (via the `html` extra) relies on `weasyprint`, which has system-level dependencies (e.g., `libglib2.0-0`, `libpango-1.0-0` on Linux) that `pip` does not install. `DocumentFile.from_pdf` uses `pypdfium2` and is not affected.","severity":"gotcha","affected_versions":"All versions using `weasyprint`"},{"fix":"Follow PyTorch's official installation guide to install `torch` and `torchvision` builds for your CUDA version (e.g., `pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118`) *before* installing `python-doctr`.","message":"GPU acceleration requires `torch` and `torchvision` builds that match your CUDA version; `pip install python-doctr` does not choose a CUDA-specific build for you.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install the required system packages. For Debian/Ubuntu: `sudo apt-get install -y libglib2.0-0 libpango-1.0-0 libpangoft2-1.0-0`. 
Other Linux distributions, macOS, and Windows have different prerequisites for `weasyprint`.","cause":"Missing system-level dependencies for `weasyprint`, which docTR's `html` extra uses to render web pages for `DocumentFile.from_url`.","error":"OSError: cannot load library 'gobject-2.0-0'"},{"fix":"Ensure `python-doctr` is installed in your active environment: `pip install python-doctr`. If you use an IDE such as PyCharm, verify that the correct Python interpreter is selected for your project.","cause":"The `python-doctr` library is either not installed, or the Python interpreter in use does not have access to the installed package (e.g., wrong virtual environment).","error":"ModuleNotFoundError: No module named 'doctr.io'"},{"fix":"As a last resort, temporarily disable SSL verification for Git with `git config --global http.sslVerify false` *before* cloning. Re-enable it immediately afterwards with `git config --global http.sslVerify true`, since leaving verification off is a security risk.","cause":"Corporate proxies or misconfigured Git installations can block secure connections (SSL/TLS) when cloning repositories or fetching packages.","error":"git clone ... then pip install -e doctr/ fails due to SSL certificate verification issues."}]}