Luqum: Lucene Query Parser
Luqum (LUcene QUery Manipolator) is a Python library that parses Lucene Query DSL strings into an abstract syntax tree (AST) for inspection, analysis, and manipulation. It can also transform Lucene DSL queries into native Elasticsearch JSON DSL. The library is currently at version 1.0.0; releases with new features and maintenance updates typically appear every few months. [1, 7, 11]
Warnings
- gotcha Lucene queries with implicit operators (e.g., 'foo bar' instead of 'foo AND bar') are parsed as `UnknownOperation`. Users need to apply a transformer like `UnknownOperationResolver` to explicitly define the operator (e.g., AND, OR) for correct interpretation.
- gotcha When constructing or modifying ASTs programmatically (rather than parsing a string), the `head` and `tail` properties (representing non-meaningful text like spaces around elements) must be set manually if preserving the original query's formatting or position information is critical. These properties are computed automatically during parsing.
- gotcha The underlying PLY library, used by luqum for parsing, is not inherently thread-safe. For concurrent parsing operations in a multi-threaded environment, `luqum.thread.parse()` should be used instead of `luqum.parser.parser.parse()` to ensure thread-safe execution by cloning the lexer state.
- breaking Version 0.14.0 removed official support for Python 3.6, 3.7, 3.8, and 3.9. Users on these older Python versions should use `luqum < 0.14.0` or upgrade their Python environment.
- breaking In version 0.11.0, the `naming` module and its `auto_name` function were completely modified, leading to API incompatibility for any code using these features.
- breaking Prior to version 0.7.0, the `ElasticsearchQueryBuilder` transformed single-word matches into `match_phrase` queries. From 0.7.0 onwards, if the field is analyzed, it now uses a `match` query, which aligns more closely with Elasticsearch's `query_string` behavior. This might alter the resulting Elasticsearch query structure for some inputs.
Install
pip install luqum
Imports
- parser
from luqum.parser import parser
- ElasticsearchQueryBuilder
from luqum.elasticsearch import ElasticsearchQueryBuilder
- UnknownOperationResolver
from luqum.utils import UnknownOperationResolver
- parse
from luqum.thread import parse as thread_safe_parse
- AndOperation, Term, SearchField, Word
from luqum.tree import AndOperation, Term, SearchField, Word
Quickstart
from luqum.parser import parser
from luqum.elasticsearch import ElasticsearchQueryBuilder
from luqum.tree import AndOperation
from luqum.utils import UnknownOperationResolver
# 1. Parse a Lucene query string
query_string = '(title:"foo bar" AND body:"quick fox") OR title:fox'
tree = parser.parse(query_string)
print(f"Parsed AST: {repr(tree)}")
print(f"String representation: {str(tree)}\n")
# 2. Resolve unknown operations (e.g., implicit AND/OR)
# A query like 'foo bar' is parsed as UnknownOperation(Word('foo'), Word('bar'))
# Use a resolver to make the operator explicit, e.g., 'foo AND bar'
resolver = UnknownOperationResolver(resolve_to=AndOperation)
resolved_tree = resolver(parser.parse('foo bar'))
print(f"Resolved 'foo bar' to: {str(resolved_tree)}\n")
# 3. Transform to Elasticsearch Query DSL
# For complex schemas, pass nested_fields and object_fields arguments
es_builder = ElasticsearchQueryBuilder()
es_query = es_builder(tree)
print(f"Elasticsearch DSL:\n{es_query}")