{"id":4406,"library":"whoosh","title":"Whoosh","description":"Whoosh is a fast, pure-Python library for full-text indexing, searching, and spell checking. It allows developers to add search functionality to applications and websites without external compilers or binary dependencies. The library is highly customizable; the current stable release is 2.7.4, and the project is maintained by the whoosh-community organization on GitHub.","status":"active","version":"2.7.4","language":"en","source_language":"en","source_url":"https://github.com/whoosh-community/whoosh","tags":["search engine","full-text search","indexing","pure-python","information retrieval"],"install":[{"cmd":"pip install whoosh","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"symbol":"create_in","correct":"from whoosh.index import create_in"},{"symbol":"Schema","correct":"from whoosh.fields import Schema, TEXT, ID, STORED"},{"symbol":"QueryParser","correct":"from whoosh.qparser import QueryParser"},{"note":"`import whoosh.index` is not an error, but it requires fully qualified calls such as `whoosh.index.create_in(...)`. The more common styles are `from whoosh import index` (then `index.create_in(...)`, `index.open_dir(...)`) or importing specific functions directly with `from whoosh.index import create_in, open_dir`.","wrong":"import whoosh.index","symbol":"index","correct":"from whoosh import index"}],"quickstart":{"code":"import os\nfrom whoosh.index import create_in, open_dir, exists_in\nfrom whoosh.fields import Schema, TEXT, ID\nfrom whoosh.qparser import QueryParser\n\n# 1. Define schema\nschema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)\n\n# 2. Create the index directory if needed, then create a new index or open the existing one\nindexdir = \"indexdir\"\nos.makedirs(indexdir, exist_ok=True)\nif exists_in(indexdir):\n    ix = open_dir(indexdir)\nelse:\n    ix = create_in(indexdir, schema)\n\n# 3. 
Add documents\n# Note: re-running this script adds these documents again (see the warning about update_document)\nwriter = ix.writer()\nwriter.add_document(title=\"First document\", path=\"/a\",\n                    content=\"This is the first document we've added!\")\nwriter.add_document(title=\"Second document\", path=\"/b\",\n                    content=\"The second one is even more interesting!\")\nwriter.commit()\n\n# 4. Search documents\nwith ix.searcher() as searcher:\n    query_parser = QueryParser(\"content\", ix.schema)\n    query = query_parser.parse(\"first\")\n    results = searcher.search(query)\n    for hit in results:\n        print(f\"Found: {hit['title']} at {hit['path']}\")\n\n# Clean up (optional: remove the index directory)\n# import shutil\n# shutil.rmtree(indexdir)\n","lang":"python","description":"This quickstart demonstrates how to define a schema, create or open an index, add documents to the index, and then perform a basic text search. It includes handling the creation of the index directory if it doesn't exist. Documents are added with `title`, `path`, and `content` fields, and a `QueryParser` is used to search the 'content' field. The code uses f-strings, so it requires Python 3.6 or later."},"warnings":[{"fix":"Prefix string literals with 'u' in Python 2 (e.g., `u'your string'`). In Python 3, `str` is already Unicode, so `\"your string\"` is sufficient; decode any `bytes` values first (e.g., `data.decode(\"utf-8\")`).","message":"When adding documents, pass text fields as Unicode strings (`u\"my text\"` in Python 2, ordinary `str` in Python 3). Fields of type STORED (stored but not indexed) can hold any pickle-able object.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Define a unique field in your Schema (e.g., `path=ID(unique=True)`), then use `writer.update_document()` instead of `writer.add_document()` when you intend to replace or update an existing document. If no match is found for the unique field, `update_document` acts like `add_document`.","message":"Whoosh does not inherently enforce uniqueness for documents. 
Calling `add_document` repeatedly with the same data stores duplicate documents in the index (re-running an indexing script has the same effect). Use `update_document` with a `unique=True` field in your schema to replace existing documents instead.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Create the directory before calling `create_in()`, e.g. with `os.makedirs(indexdir, exist_ok=True)`, or with `os.mkdir(indexdir)` guarded by an `os.path.exists()` check (plain `os.mkdir` fails if the directory already exists). Use `whoosh.index.exists_in(indexdir)` to decide between `create_in()` and `open_dir()`, because calling `create_in()` on a directory that already holds an index clears that index.","message":"The `whoosh.index.create_in()` function requires the index directory to exist before it is called. If the directory does not exist, a `FileNotFoundError` is raised.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Manage the index through the high-level functions such as `whoosh.index.create_in()`, `whoosh.index.open_dir()`, and `whoosh.index.exists_in()`, and change its contents only through an index writer (`ix.writer()`).","message":"Editing index files by hand or relying on undocumented internal structures can corrupt the index or break after library updates. Use the public API for index management. Some older examples construct a `FileStorage` object directly instead of using the `index.create_in()` or `index.open_dir()` convenience functions; the convenience functions are the simpler entry point.","severity":"deprecated","affected_versions":"All versions (best practice, not a formal deprecation)"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}