Ultimate Sitemap Parser

1.8.0 · active · verified Thu Apr 16

Ultimate Sitemap Parser (USP) is a Python library for parsing and crawling sitemaps. It supports all major sitemap formats (XML, Google News, plain text, RSS/Atom), follows nested sitemaps, and tolerates malformed input. Pages are yielded lazily through a generator, so large sitemap hierarchies can be processed without holding every page in memory. The library is actively maintained, with frequent releases; the current version is 1.8.0.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart fetches and parses the sitemaps for a given homepage URL, automatically discovering and traversing any sitemap index files and the sitemaps they link to. It then iterates through the pages found across the entire sitemap hierarchy and prints each page's URL. `all_pages()` returns a generator, so pages are produced one at a time rather than collected into a list, which keeps memory use low on large sitemaps.

from usp.tree import sitemap_tree_for_homepage

# Replace with the target website URL for sitemap discovery
target_url = "https://www.example.org/"

try:
    # Fetches sitemaps, discovers nested sitemaps, and builds a tree structure
    tree = sitemap_tree_for_homepage(target_url)

    print(f"Successfully parsed sitemap for: {target_url}")
    print("Listing all discovered pages:")

    # Iterate through all pages found across the sitemap hierarchy
    # Uses a generator for memory efficiency with large sitemaps
    page_count = 0
    for page in tree.all_pages():
        print(page.url)
        page_count += 1
    print(f"Found {page_count} pages.")

except Exception as e:
    print(f"An error occurred: {e}")
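The same URL can be listed in more than one sitemap, so it may be worth deduplicating explicitly while keeping the iteration lazy. The following is a minimal sketch under stated assumptions, not part of the library: `unique_urls` is a hypothetical helper, and the only thing it assumes about a page object is the `url` attribute used in the quickstart. Stand-in objects replace the real pages so the sketch runs without network access.

```python
from itertools import islice
from types import SimpleNamespace
from typing import Iterable, Iterator


def unique_urls(pages: Iterable) -> Iterator[str]:
    """Yield each page URL once, in first-seen order (hypothetical helper)."""
    seen = set()
    for page in pages:
        # Assumes only that each page exposes a `url` attribute
        if page.url not in seen:
            seen.add(page.url)
            yield page.url


# Stand-in pages so the sketch runs offline; with the library you would
# pass tree.all_pages() from the quickstart instead.
sample_pages = [
    SimpleNamespace(url="https://www.example.org/"),
    SimpleNamespace(url="https://www.example.org/about"),
    SimpleNamespace(url="https://www.example.org/"),
]

# islice stops after at most 10 URLs without consuming the rest of the generator
first_unique = list(islice(unique_urls(sample_pages), 10))
print(first_unique)
```

Because `unique_urls` is itself a generator, wrapping `tree.all_pages()` in it keeps the lazy behavior; only the set of URLs seen so far is held in memory.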
