Whoosh Reloaded

raw JSON →
2.7.5 verified Fri May 01 auth: no python

A fast, pure-Python full text indexing, search, and spell checking library. Fork of the original Whoosh with continued maintenance and compatibility fixes. Current version: 2.7.5. Released as needed.

pip install whoosh-reloaded
error ModuleNotFoundError: No module named 'whoosh'
cause Using import 'whoosh' when the installed version is old whoosh-reloaded (<2.7.4.4) or the package is not installed.
fix
Ensure you have whoosh-reloaded installed (pip install whoosh-reloaded) and import as 'from whoosh import ...'.
error whoosh.index.LockError
cause Trying to create or open an index that is already locked by another process.
fix
Close all other writers/searchers or delete the lock file (indexdir/.lock) if no other process is using it.
breaking In versions 2.7.4.0-2.7.4.4, the import path was 'whoosh_reloaded' instead of 'whoosh'. Upgrade to 2.7.5 to use 'whoosh' again.
fix Upgrade to whoosh-reloaded==2.7.5 or change imports from 'whoosh_reloaded' to 'whoosh'.
gotcha Whoosh is not thread-safe. The index object and searcher should not be shared across threads without locks.
fix Use threading.Lock around index/searcher access or instantiate per-thread.
gotcha The index directory must be locked for writing. If two processes try to write simultaneously, you'll get a LockError.
fix Use a single writer per index directory and avoid concurrent writes from multiple processes.

Create an index, add a document, and search.

import os
from whoosh import index
from whoosh.fields import Schema, TEXT
from whoosh.qparser import QueryParser

# Create schema
schema = Schema(title=TEXT(stored=True), content=TEXT)

# Create index directory
os.makedirs('indexdir', exist_ok=True)
ix = index.create_in('indexdir', schema)
writer = ix.writer()
writer.add_document(title=u'First document', content=u'This is the first document we add!')
writer.commit()

# Search
with ix.searcher() as searcher:
    query = QueryParser('content', ix.schema).parse('first')
    results = searcher.search(query)
    print(results[0]['title'])