{"id":2800,"library":"tantivy","title":"Tantivy Python Bindings","description":"Tantivy-py provides the official Python bindings for Tantivy, a high-performance full-text search engine library written in Rust and inspired by Apache Lucene. The current version is 0.25.1; the project is actively developed, with new minor versions typically released a few months apart.","status":"active","version":"0.25.1","language":"en","source_language":"en","source_url":"https://github.com/quickwit-oss/tantivy-py","tags":["search","full-text","rust-bindings","lucene-inspired","information-retrieval"],"install":[{"cmd":"pip install tantivy","lang":"bash","label":"Install with pip"}],"dependencies":[{"reason":"Required only when building from source, i.e. when no pre-compiled wheel is available for your platform. Tantivy-py is built with PyO3, so compiling it needs a Rust toolchain.","package":"Rust","optional":false}],"imports":[{"symbol":"tantivy","correct":"import tantivy"},{"note":"Core classes such as SchemaBuilder, Index and Document are exposed directly in the top-level 'tantivy' module.","wrong":"from tantivy.schema import SchemaBuilder","symbol":"SchemaBuilder","correct":"from tantivy import SchemaBuilder"}],"quickstart":{"code":"import tantivy\n\n# 1. Declare the schema\nschema_builder = tantivy.SchemaBuilder()\nschema_builder.add_text_field(\"title\", stored=True, tokenizer_name=\"default\")\nschema_builder.add_text_field(\"body\", stored=True, tokenizer_name=\"default\")\nschema_builder.add_integer_field(\"doc_id\", stored=True, indexed=True)\nschema = schema_builder.build()\n\n# 2. Create an in-memory index\n# For a persistent index, pass a directory path: index = tantivy.Index(schema, path=\"/tmp/my_index\")\nindex = tantivy.Index(schema)\n\n# 3. 
Get an index writer and add documents\nwriter = index.writer(50_000_000)  # 50 MB memory budget for indexing\nwriter.add_document(tantivy.Document(title=[\"The Old Man and the Sea\"], body=[\"He was an old man who fished alone in a skiff.\"], doc_id=[1]))\nwriter.add_document(tantivy.Document(title=[\"The Great Gatsby\"], body=[\"In my younger and more vulnerable years my father gave me some advice.\"], doc_id=[2]))\nwriter.commit()\n\n# 4. Reload so the committed documents become visible, then get a searcher\nindex.reload()\nsearcher = index.searcher()\n\n# 5. Parse and execute a query over the default fields\nquery = index.parse_query(\"old man\", [\"title\", \"body\"])\nhits = searcher.search(query, 10).hits  # list of (score, DocAddress) tuples\n\n# 6. Retrieve the stored fields of each hit\nprint(\"Search results:\")\nfor score, doc_address in hits:\n    retrieved_doc = searcher.doc(doc_address)\n    print(f\"  Score: {score:.2f}, Doc ID: {retrieved_doc['doc_id'][0]}, Title: {retrieved_doc['title'][0]}\")\n\n# A field the document does not contain yields an empty list\nmissing_field = retrieved_doc.get_all('non_existent_field')\nprint(f\"  Non-existent field for last doc: {missing_field}\")  # Expected: []\n","lang":"python","description":"This quickstart demonstrates how to define a schema, create an in-memory Tantivy index, add documents to it, and run a basic query. It also shows how to fetch the stored fields of each search hit."},"warnings":[{"fix":"Install Rust via `rustup` before attempting `pip install tantivy`.","message":"To install `tantivy` from source (if no pre-compiled wheel is available for your system), you must have Rust installed and configured. 
This is a common requirement for Python libraries with Rust bindings.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Check the Tantivy Rust changelog (or the tantivy-py changelog) for alternatives, or rework code that previously relied on index sorting.","message":"Version 0.25.0 introduced a breaking change by removing index sorting. Code that relied on this feature must adjust its indexing and search strategy.","severity":"breaking","affected_versions":">=0.25.0"},{"fix":"Update documents with a delete-then-reindex strategy: delete the old version by a term on a unique ID field, add the updated document, and commit.","message":"Tantivy treats indexed documents as immutable. To 'edit' a document, you must delete the existing document (by a term, such as a unique ID value, or by a query) and then index the updated version.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Route all writes to an index through a single `IndexWriter`, typically owned by one process or thread; in multi-process or multi-threaded setups, coordinate access with an external lock.","message":"Only one `IndexWriter` can be active at a time for a given index. The `IndexWriter` uses multiple indexing threads internally, but attempting to open a second writer while one is active fails.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always use `searcher.doc(doc_address)` to fetch the stored fields of a document after a search.","message":"`searcher.search()` returns a `SearchResult` whose `hits` attribute is a list of `(score, DocAddress)` tuples. 
The hits do not contain the documents themselves; to retrieve a document's stored fields, pass its `DocAddress` to the `Searcher`'s `doc()` method.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Declare the unique ID field as indexed, e.g. `schema_builder.add_integer_field(\"your_id_field\", stored=True, indexed=True)`; add `fast=True` only if you also need fast-field features such as ordering results by that field.","message":"Deleting documents by term (the usual approach for incremental indexing) only works on an indexed field. A unique integer ID declared with `indexed=True` is a convenient choice; `fast=True` is not required for deletion.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}