Pagefind Python API
Pagefind is a Python API wrapper for the Pagefind Rust binary, providing static, low-bandwidth full-text search capabilities for websites. It excels at indexing large static sites, supporting HTML files, custom records, and multilingual content. The library is actively maintained, with version 1.5.0 released on April 6, 2026, offering an asynchronous interface for programmatic indexing.
Warnings
- breaking Pagefind 1.0 (and subsequent versions) changed the default output directory from `_pagefind` to `pagefind`. Existing build scripts or configurations might need updating.
- breaking Pagefind 1.0 introduced CLI option renames: `source` was renamed to `site`, and `bundle-dir` was renamed to `output-subdir`. This primarily affects direct CLI usage but can impact Python scripts invoking the CLI.
- gotcha Installing `pagefind` via `pip install pagefind` only installs the Python wrapper. To get the necessary Pagefind Rust binary automatically, you must install with extras: `pip install 'pagefind[bin]'` for the standard binary, or `pip install 'pagefind[extended]'` for extended language support.
- gotcha The `PagefindIndex` object is an asynchronous context manager and must be used with `async with`. Failing to do so will prevent the index from being properly opened, written, and the backing service shut down.
- behavioral Pagefind v1.1.0 improved its core result ranking algorithm to align with BM25. This change will alter the ordering of search results compared to earlier versions, potentially providing better relevance by default.
Install
-
pip install 'pagefind[bin]' -
pip install 'pagefind[extended]' -
pip install pagefind
Imports
- PagefindIndex
from pagefind.index import PagefindIndex
- IndexConfig
from pagefind.index import IndexConfig
Quickstart
import asyncio
import json
import logging
import os
from pagefind.index import PagefindIndex, IndexConfig
logging.basicConfig(level=os.environ.get("LOG_LEVEL", "INFO"))
log = logging.getLogger(__name__)
html_content = (
"<html>"
" <body>"
" <main>"
" <h1>Example HTML</h1>"
" <p>This is an example HTML page.</p>"
" </main>"
" </body>"
"</html>"
)
async def main():
config = IndexConfig(
root_selector="main",
logfile="index.log",
output_path="./pagefind_output", # Using a custom output path for example
verbose=True
)
async with PagefindIndex(config=config) as index:
log.debug("Opened index")
new_file, new_record = await asyncio.gather(
index.add_html_file(
content=html_content,
url="https://example.com/some-page",
source_path="example.html",
),
index.add_custom_record(
url="/elephants/",
content="Some testing content regarding elephants",
language="en",
meta={"title": "Elephants"},
),
)
print(f"Indexed HTML file: {json.dumps(new_file, indent=2)}")
print(f"Added custom record: {json.dumps(new_record, indent=2)}")
print("Indexing complete. Output written to ./pagefind_output")
if __name__ == "__main__":
# To run this, ensure you have installed with `pip install 'pagefind[bin]'`
# or have the pagefind binary available in your PATH.
asyncio.run(main())