Whoosh

2.7.4 · active · verified Sun Apr 12

Whoosh is a fast, pure-Python library for full-text indexing, searching, and spell checking. Because it is pure Python, it adds search functionality to applications and websites without external compilers or binary dependencies. The library is highly customizable; its latest stable release is 2.7.4, maintained by the whoosh-community.

Warnings

Install

Imports

Quickstart

This quickstart defines a schema, creates or opens an index, adds documents to it, and runs a basic text search. It creates the index directory if it doesn't exist. Documents are added with `title`, `path`, and `content` fields, and a `QueryParser` searches the `content` field.

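import os
from whoosh.index import create_in, exists_in, open_dir
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser

# 1. Define schema: title and path are stored so they can be read back from hits;
#    content is indexed for searching but not stored
schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)

# 2. Create the index, or open it if one already exists in the directory
#    (checking exists_in avoids an error when the directory exists but holds no index)
indexdir = "indexdir"
os.makedirs(indexdir, exist_ok=True)
if exists_in(indexdir):
    ix = open_dir(indexdir)
else:
    ix = create_in(indexdir, schema)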

# 3. Add documents (re-running this script against an existing index adds them again)
writer = ix.writer()
writer.add_document(title="First document", path="/a",
                    content="This is the first document we've added!")
writer.add_document(title="Second document", path="/b",
                    content="The second one is even more interesting!")
writer.commit()

# 4. Search documents
with ix.searcher() as searcher:
    query_parser = QueryParser("content", ix.schema)
    query = query_parser.parse("first")
    results = searcher.search(query)
    for hit in results:
        print(f"Found: {hit['title']} at {hit['path']}")

# Clean up (optional: remove the index directory)
# import shutil
# shutil.rmtree(indexdir)
