{"id":9383,"library":"ultimate-sitemap-parser","title":"Ultimate Sitemap Parser","description":"Ultimate Sitemap Parser (USP) is a performant and robust Python library for parsing and crawling sitemaps. It supports all major sitemap formats (XML, Google News, plain text, RSS/Atom), handles nested sitemaps, is error-tolerant, and efficiently processes large hierarchies using a lazy-loading generator for pages. The library is actively maintained, with frequent releases; the current version is 1.8.0.","status":"active","version":"1.8.0","language":"en","source_language":"en","source_url":"https://github.com/GateNLP/ultimate-sitemap-parser","tags":["sitemap","parser","web crawling","SEO","XML","RSS","Atom","robots.txt"],"install":[{"cmd":"pip install ultimate-sitemap-parser","lang":"bash","label":"Install with pip"}],"dependencies":[{"reason":"Used for date parsing and manipulation within sitemap entries.","package":"python-dateutil","optional":false},{"reason":"Used as the default HTTP client for fetching sitemap content from URLs.","package":"requests","optional":false}],"imports":[{"note":"The top-level package for import is `usp`, not `ultimate_sitemap_parser`.","wrong":"from ultimate_sitemap_parser.tree import sitemap_tree_for_homepage","symbol":"sitemap_tree_for_homepage","correct":"from usp.tree import sitemap_tree_for_homepage"},{"note":"For parsing sitemaps from local string content.","symbol":"sitemap_from_str","correct":"from usp.tree import sitemap_from_str"}],"quickstart":{"code":"from usp.tree import sitemap_tree_for_homepage\n\n# Replace with the target website URL for sitemap discovery\ntarget_url = \"https://www.example.org/\"\n\ntry:\n    # Fetches sitemaps, discovers nested sitemaps, and builds a tree structure\n    tree = sitemap_tree_for_homepage(target_url)\n\n    print(f\"Successfully parsed sitemap for: {target_url}\")\n    print(\"Listing all discovered pages:\")\n\n    # Iterate through all pages found across the sitemap hierarchy\n    # Uses 
a generator for memory efficiency with large sitemaps\n    page_count = 0\n    for page in tree.all_pages():\n        print(page.url)\n        page_count += 1\n    print(f\"Found {page_count} pages.\")\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")","lang":"python","description":"This quickstart fetches and parses the sitemap for a given homepage URL, automatically discovering and traversing any linked sitemap index files. It then iterates through all pages found across the entire sitemap hierarchy, printing each page's URL. The `all_pages()` method uses a generator for memory-efficient processing of large sitemaps."},"warnings":[{"fix":"Ensure your environment uses Python 3.10 or newer before installing or upgrading `ultimate-sitemap-parser`.","message":"Python 3.8 is no longer supported as of version 1.3.0. The minimum required Python version for recent releases (including 1.8.0) is 3.10.","severity":"breaking","affected_versions":">=1.3.0"},{"fix":"Update your custom `AbstractWebClient` implementation to include the `url()` method: `def url(self) -> str: ...`.","message":"If you use custom web clients by subclassing `AbstractWebClient`, you must implement the new `url()` method as of version 1.3.0. This method should return the actual URL fetched after any redirects.","severity":"breaking","affected_versions":">=1.3.0"},{"fix":"Always validate sitemap files for correct XML syntax, proper `http://www.sitemaps.org/schemas/sitemap/0.9` namespace, and valid UTF-8 encoding. The library will return an `InvalidSitemap` object if a sitemap cannot be parsed.","message":"Malformed sitemap XML (e.g., incorrect namespace, missing tags, invalid URLs, improper encoding) can lead to `InvalidSitemap` objects or parsing failures, even though the library is error-tolerant.","severity":"gotcha","affected_versions":"All versions"},{"fix":"The `all_pages()` method uses a generator to lazily load pages, which helps with memory. 
For optimal performance with massive sites, ensure ample system resources or consider processing sitemap subsets if possible.","message":"While designed for efficiency, processing extremely large sitemaps (e.g., >50MB uncompressed or >50,000 URLs) can still be resource-intensive or hit memory limits.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Change your import statements from `import ultimate_sitemap_parser` or `from ultimate_sitemap_parser.tree import ...` to `import usp` or `from usp.tree import ...` respectively.","cause":"The top-level package for importing is `usp`, not `ultimate_sitemap_parser`.","error":"ModuleNotFoundError: No module named 'ultimate_sitemap_parser'"},{"fix":"Check the URL for correctness and accessibility. The `InvalidSitemap` object itself might contain information about the failure (e.g., HTTP status code or parsing error). Inspect the logs for details during the parsing process.","cause":"The `sitemap_tree_for_homepage` function or other parsing methods returned an `InvalidSitemap` object, indicating that the sitemap could not be fetched or parsed successfully.","error":"AttributeError: 'InvalidSitemap' object has no attribute 'all_pages'"},{"fix":"This usually indicates a malformed sitemap structure on the target website. Review the sitemap files for unintended circular dependencies. The library prevents infinite loops, but this exception signals a problematic sitemap design.","cause":"The library detected a circular reference within the sitemap hierarchy (a sitemap linking back to itself or an ancestor) or an excessively deep, potentially infinite, recursion.","error":"SitemapException: Maximum recursion depth exceeded (URL: ...)"}]}