Pagefind Python API

1.5.0 · active · verified Sat Apr 11

The `pagefind` Python package wraps the Pagefind Rust binary, which builds a static, low-bandwidth full-text search index for websites. It can index HTML files and custom records, supports multilingual content, and is designed to scale to large static sites. The package exposes an asynchronous interface for programmatic indexing and is actively maintained; version 1.5.0 was released on April 6, 2026.

Warnings

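Install

Install with the `bin` extra, or make sure the pagefind binary is available on your PATH:

pip install 'pagefind[bin]'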

Imports

from pagefind.index import PagefindIndex, IndexConfig

Quickstart

This quickstart initializes Pagefind, adds HTML content directly from a string, and adds a custom record to the search index through the asynchronous Python API. The index files are written to `./pagefind_output`.

import asyncio
import json
import logging
import os
from pagefind.index import PagefindIndex, IndexConfig

logging.basicConfig(level=os.environ.get("LOG_LEVEL", "INFO"))
log = logging.getLogger(__name__)

html_content = (
    "<html>"
    " <body>"
    " <main>"
    " <h1>Example HTML</h1>"
    " <p>This is an example HTML page.</p>"
    " </main>"
    " </body>"
    "</html>"
)

async def main():
    config = IndexConfig(
        root_selector="main",
        logfile="index.log",
        output_path="./pagefind_output",  # custom output path for this example
        verbose=True,
    )

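    # On a clean exit from this block the index is written to output_path.
    async with PagefindIndex(config=config) as index:
        log.debug("Opened index")
        # Run both additions concurrently; gather returns results in argument order.
        new_file, new_record = await asyncio.gather(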
            index.add_html_file(
                content=html_content,
                url="https://example.com/some-page",
                source_path="example.html",
            ),
            index.add_custom_record(
                url="/elephants/",
                content="Some testing content regarding elephants",
                language="en",
                meta={"title": "Elephants"},
            ),
        )

        print(f"Indexed HTML file: {json.dumps(new_file, indent=2)}")
        print(f"Added custom record: {json.dumps(new_record, indent=2)}")

    print("Indexing complete. Output written to ./pagefind_output")

if __name__ == "__main__":
    # To run this, ensure you have installed with `pip install 'pagefind[bin]'`
    # or have the pagefind binary available in your PATH.
    asyncio.run(main())
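
The quickstart adds a single custom record. When records come from a list of pages in several languages, the same `add_custom_record` call can be driven from a loop. The following is a minimal sketch, not the library's prescribed pattern: the `PAGES` data, its dict shape, and the helpers `to_record_kwargs` and `add_records` are hypothetical names introduced for illustration, and it assumes `IndexConfig` accepts `output_path` on its own. Only the keyword arguments already shown in the quickstart are used.

```python
import asyncio
from typing import Any

# Hypothetical in-memory pages; in practice these might come from a CMS or database.
PAGES = [
    {"url": "/elephants/", "title": "Elephants",
     "body": "Some testing content regarding elephants", "lang": "en"},
    {"url": "/es/elefantes/", "title": "Elefantes",
     "body": "Contenido de prueba sobre elefantes", "lang": "es"},
]


def to_record_kwargs(page: dict[str, str]) -> dict[str, Any]:
    """Map a page dict (hypothetical shape) to the keyword arguments
    that the quickstart passes to add_custom_record."""
    return {
        "url": page["url"],
        "content": page["body"],
        "language": page["lang"],
        "meta": {"title": page["title"]},
    }


async def add_records(index: Any, pages: list[dict[str, str]]) -> list[Any]:
    """Add every page concurrently; gather returns results in input order."""
    return await asyncio.gather(
        *(index.add_custom_record(**to_record_kwargs(p)) for p in pages)
    )


async def main() -> None:
    # Imported lazily so the helpers above work without pagefind installed.
    from pagefind.index import IndexConfig, PagefindIndex

    # Assumption: output_path alone is a valid configuration.
    config = IndexConfig(output_path="./pagefind_output")
    async with PagefindIndex(config=config) as index:
        results = await add_records(index, PAGES)
        print(f"Added {len(results)} custom records")


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except ImportError:
        print("pagefind is not installed; try: pip install 'pagefind[bin]'")
```

Keeping the mapping in a pure function separates data preparation from indexing, so the record shape can be checked without invoking the Pagefind binary.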
