Protego
0.6.0 · verified Tue May 12 · auth: no · python · install: verified
Protego is a pure-Python robots.txt parser that supports modern conventions, such as those defined by Google. As of version 0.6.0, it supports Python 3.10 and newer, and support for new Python releases is added regularly. It is commonly used for web scraping and for checking crawl compliance.
pip install protego
Common errors
error ModuleNotFoundError: No module named 'protego' ↓
cause The `protego` library has not been installed in the current Python environment.
fix
Install the library using pip:
pip install protego
error ValueError: content is not a string ↓
cause `Protego.parse()` expects its input (`robotstxt_body`) to be a string, but it received another type (e.g., bytes). Protego 0.3.0 and later raise this error explicitly.
fix
Ensure that the `robotstxt_body` passed to `Protego.parse()` is decoded into a string (e.g., from UTF-8 bytes) before parsing. Example: `Protego.parse(response.text)` if using requests, or `Protego.parse(robotstxt_bytes.decode('utf-8'))`.
error AttributeError: module 'protego' has no attribute 'parse' ↓
cause This error occurs when `parse()` is called directly on the `protego` module (e.g., `protego.parse(...)`) instead of on the `Protego` class. `parse()` belongs to the `Protego` class and is called on the class itself, without creating an instance first.
fix
Import the `Protego` class and call `parse()` on the class: `from protego import Protego`, then `robots = Protego.parse(robots_txt_content)`.
Warnings
breaking Version 0.6.0 dropped official support for Python 3.9 and PyPy 3.10. Users on these Python versions should use Protego 0.5.x or upgrade their Python environment. ↓
fix Upgrade to Python 3.10+ or pin Protego to a version <0.6.0 (e.g., `protego<0.6.0`).
breaking Version 0.4.0 dropped official support for Python 3.8. ↓
fix Upgrade to Python 3.9+ or pin Protego to a version <0.4.0.
breaking Version 0.3.0 dropped support for Python 2.7, 3.5, 3.6, and 3.7. The `six` dependency was also removed in this version, making it Python 3 only. ↓
fix Upgrade to Python 3.8+ and remove `six` if it was only a Protego dependency. Pin Protego to a version <0.3.0 for older Python environments.
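The Python-version cutoffs in the breaking-change notes above can be condensed into a small lookup. The following is a minimal sketch, not part of Protego: `protego_requirement` is a hypothetical helper name, and the pip-style requirement strings it returns are built only from the version boundaries listed in this document.

```python
import sys
from typing import Tuple


def protego_requirement(python_version: Tuple[int, int]) -> str:
    """Return a pip requirement string for Protego on a given Python version.

    Hypothetical helper: the cutoffs mirror the breaking-change notes
    (0.6.0 dropped 3.9, 0.4.0 dropped 3.8, 0.3.0 dropped 3.7 and older).
    """
    if python_version >= (3, 10):
        return "protego"          # current releases support 3.10+
    if python_version >= (3, 9):
        return "protego<0.6.0"    # last series supporting Python 3.9
    if python_version >= (3, 8):
        return "protego<0.4.0"    # last series supporting Python 3.8
    return "protego<0.3.0"        # Python 2.7 and 3.5 to 3.7


if __name__ == "__main__":
    # Print the requirement that matches the running interpreter.
    print(protego_requirement(sys.version_info[:2]))
```

The sketch only encodes upper bounds. It does not check whether an older Protego release contains fixes you may need, such as the absolute-URL fix in 0.1.16.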
gotcha In Protego 0.3.0 and later, `Protego.parse()` will raise a `ValueError` if the `robotstxt_body` argument is not a string. ↓
fix Ensure that the `robotstxt_body` passed to `Protego.parse()` is always a string.
gotcha Version 0.1.16 fixed an issue where absolute URLs in `Allow` and `Disallow` directives were incorrectly parsed, ignoring their protocol and netloc. Older versions might misinterpret these directives, leading to incorrect access decisions. ↓
fix Upgrade to Protego 0.1.16 or newer to ensure correct interpretation of absolute URLs in `robots.txt` directives.
gotcha Version 0.5.0 restructured the internal code from a single `protego.py` file into multiple modules. While the public API `from protego import Protego` remains stable, direct imports of internal modules (if any were used) would have broken. ↓
fix Avoid importing internal modules; rely only on the documented public API (e.g., `from protego import Protego`).
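The string-input requirement for `Protego.parse()` noted among the warnings above can be handled with a small normalization step before parsing. This is a minimal sketch under stated assumptions: `robots_body_to_str` is a hypothetical helper (not part of Protego), UTF-8 is assumed as the default encoding, and undecodable bytes are replaced rather than raising an error.

```python
from typing import Union


def robots_body_to_str(
    body: Union[str, bytes, bytearray], encoding: str = "utf-8"
) -> str:
    """Return a robots.txt body as str, decoding raw bytes if necessary.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    if isinstance(body, str):
        # Already text: pass through unchanged.
        return body
    if isinstance(body, (bytes, bytearray)):
        # errors="replace" substitutes U+FFFD for invalid byte sequences,
        # so a single bad byte does not abort the whole parse.
        return body.decode(encoding, errors="replace")
    raise TypeError(f"expected str or bytes, got {type(body).__name__}")
```

With this in place, a raw HTTP body can be passed as `Protego.parse(robots_body_to_str(raw_body))`. Replacing invalid bytes is a leniency choice; use `errors="strict"` instead if malformed files should be rejected.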
Install compatibility verified last tested: 2026-05-12 v0.5.0 installed · v0.6.0 latest
python os / libc status wheel install import disk mem side effects
3.10 alpine (musl) wheel - 0.03s 17.8M 1.8M clean
3.10 alpine (musl) - - 0.03s 17.8M 1.8M -
3.10 slim (glibc) wheel 1.4s 0.02s 18M 1.8M clean
3.10 slim (glibc) - - 0.02s 18M 1.8M -
3.11 alpine (musl) wheel - 0.07s 19.7M 2.2M clean
3.11 alpine (musl) - - 0.08s 19.7M 2.2M -
3.11 slim (glibc) wheel 1.6s 0.06s 20M 2.2M clean
3.11 slim (glibc) - - 0.06s 20M 2.2M -
3.12 alpine (musl) wheel - 0.05s 11.6M 1.8M clean
3.12 alpine (musl) - - 0.05s 11.6M 1.8M -
3.12 slim (glibc) wheel 1.4s 0.05s 12M 1.8M clean
3.12 slim (glibc) - - 0.07s 12M 1.8M -
3.13 alpine (musl) wheel - 0.05s 11.3M 2.1M clean
3.13 alpine (musl) - - 0.05s 11.2M 2.1M -
3.13 slim (glibc) wheel 1.4s 0.05s 12M 1.9M clean
3.13 slim (glibc) - - 0.05s 12M 1.9M -
3.9 alpine (musl) wheel - 0.03s 17.3M 1.8M clean
3.9 alpine (musl) - - 0.04s 17.3M 1.8M -
3.9 slim (glibc) wheel 1.7s 0.03s 18M 1.8M clean
3.9 slim (glibc) - - 0.03s 18M 1.8M -
Imports
- Protego
from protego import Protego
Quickstart last tested: 2026-04-24
from protego import Protego
robotstxt_content = """
User-agent: *
Disallow: /admin/
Allow: /admin/login
Crawl-delay: 5
Sitemap: http://example.com/sitemap.xml
"""
rp = Protego.parse(robotstxt_content)
# Check if a URL can be fetched by a user agent
can_fetch_admin = rp.can_fetch("http://example.com/admin/settings", "mybot")
can_fetch_login = rp.can_fetch("http://example.com/admin/login", "mybot")
print(f"Can 'mybot' fetch /admin/settings? {can_fetch_admin}")
print(f"Can 'mybot' fetch /admin/login? {can_fetch_login}")
print(f"Crawl delay for 'mybot': {rp.crawl_delay('mybot')} seconds")
print(f"Sitemaps: {list(rp.sitemaps)}")
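Building on the quickstart, a crawler usually needs to filter a batch of candidate URLs rather than check them one at a time. The sketch below shows one way to do that. `allowed_urls` and the `RobotsRules` protocol are hypothetical names introduced for illustration; the only Protego call relied on is `can_fetch(url, user_agent)`, as used in the quickstart.

```python
from typing import Iterable, List, Protocol


class RobotsRules(Protocol):
    """Anything exposing can_fetch(url, user_agent), e.g. a parsed Protego object."""

    def can_fetch(self, url: str, user_agent: str) -> bool: ...


def allowed_urls(
    rules: RobotsRules, urls: Iterable[str], user_agent: str
) -> List[str]:
    """Return the URLs that the rules allow for user_agent, in input order."""
    return [url for url in urls if rules.can_fetch(url, user_agent)]
```

With the `rp` object from the quickstart, this could be called as `allowed_urls(rp, ["http://example.com/admin/settings", "http://example.com/admin/login"], "mybot")`. Typing the first argument as a protocol keeps the helper independent of Protego's internals, so it depends only on the public `can_fetch` method.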