quantulum3
quantulum3 is an actively maintained Python library (v0.10.0) for extracting quantities, measurements, and their units from unstructured text. It uses machine learning, specifically k-nearest neighbors on GloVe vector representations, to disambiguate between similar-looking units. The project is a Python 3 compatible fork of the original quantulum library and remains under active development with community contributions.
Common errors
- ModuleNotFoundError: No module named 'quantulum'
  cause Attempting to import the Python 2 version 'quantulum' in a Python 3 environment, or after installing 'quantulum3' but trying to import the old name.
  fix Install `quantulum3` using `pip install quantulum3` and import with `from quantulum3 import parser`.
- [] (empty list returned) when parsing a single unit word.
  cause The `parser.parse()` function is designed to extract quantities (number + unit). It typically requires a numerical value to identify a quantity, even if spelled out.
  fix If you need to get unit information for a standalone word, consider prepending a number, e.g., `parser.parse('1 meter')`, then extract the unit from the resulting Quantity object.
- FileNotFoundError: [Errno 2] No such file or directory: '.../quantulum3/common-4-letter-words.txt'
  cause This error, though largely fixed in newer versions, indicates that an internal data file expected by quantulum3 could not be located. This might stem from an incomplete installation or environment issues.
  fix Upgrade to the latest `quantulum3` version (`pip install --upgrade quantulum3`). If the problem persists, try reinstalling in a clean virtual environment.
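The workaround for standalone unit words can be sketched roughly as follows. `unit_name_for_word` is a hypothetical helper, not part of quantulum3. It assumes the library is installed and returns None when it is not, or when the parser finds no quantity.

```python
def unit_name_for_word(word):
    """Return the unit name quantulum3 assigns to `word`, or None.

    Hypothetical helper: prepends "1 " so the parser sees a full
    quantity (number + unit) instead of a bare unit word.
    """
    try:
        from quantulum3 import parser
    except ImportError:
        # quantulum3 is not installed in this environment.
        return None

    quants = parser.parse(f"1 {word}")
    # parse() returns a list of Quantity objects; it may be empty.
    return quants[0].unit.name if quants else None


print(unit_name_for_word("meter"))
```

The name that comes back is whatever quantulum3 uses internally for the unit, which may be spelled differently from the input word.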
Warnings
- breaking Users migrating from the older 'quantulum' library (Python 2) to Python 3 must install 'quantulum3'. Attempting to import 'quantulum' in a Python 3 environment will result in a ModuleNotFoundError or unexpected behavior.
- gotcha The unit disambiguation classifier, which improves accuracy for ambiguous units, requires additional dependencies (e.g., scikit-learn). These are not installed by default with `pip install quantulum3`.
- gotcha The parser often fails to extract standalone unit names without a preceding numerical value, returning an empty list. E.g., `parser.parse('meter')` will yield no results.
- gotcha Older versions of quantulum3 (prior to a fix in 2018) occasionally had FileNotFoundError issues related to internal data files like 'common-4-letter-words.txt', particularly in non-standard installation environments.
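One way to check for the conditions behind the first two warnings is to ask Python which packages are importable. The sketch below uses only the standard library. `check_environment` is a hypothetical helper, and `sklearn` is assumed here to be the import name of the scikit-learn dependency; the classifier extra may pull in other packages that this check does not cover.

```python
from importlib.util import find_spec


def check_environment():
    """Report which relevant top-level packages are importable.

    Hypothetical helper; it only looks packages up and does not
    import them.
    """
    return {
        # The Python 3 package this page describes.
        "quantulum3": find_spec("quantulum3") is not None,
        # The legacy Python 2 package name.
        "quantulum": find_spec("quantulum") is not None,
        # Assumed import name for the optional classifier dependency.
        "sklearn": find_spec("sklearn") is not None,
    }


for name, available in check_environment().items():
    print(f"{name}: {'found' if available else 'missing'}")
```

If `sklearn` is reported missing, installing the classifier extra listed under Install is the likely remedy.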
Install
- pip install quantulum3
- pip install quantulum3[classifier]
Imports
- parser
from quantulum import parser  # wrong: legacy Python 2 package name
from quantulum3 import parser  # correct
- Quantity
from quantulum3.classes import Quantity
Quickstart
from quantulum3 import parser
text = 'I want 2 liters of wine and 10 million dollars.'
quants = parser.parse(text)
for q in quants:
    print(f"Value: {q.value}, Unit: {q.unit.name}, Surface: '{q.surface}', Span: {q.span}")
# Example of inline parsing
inline_text = parser.inline_parse(text)
print(f"\nInline parsed text: {inline_text}")
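For downstream processing it can help to flatten the parsed objects into plain dictionaries. The following is a minimal sketch, assuming only the four attributes used in the quickstart (`value`, `unit.name`, `surface`, `span`). `quantities_to_rows` is a hypothetical helper, and the demo uses a hand-built stand-in object so that it runs without quantulum3 installed; in real use you would pass it the result of `parser.parse(text)`.

```python
from types import SimpleNamespace


def quantities_to_rows(quants):
    """Flatten Quantity-like objects into plain dictionaries."""
    return [
        {
            "value": q.value,
            "unit": q.unit.name,
            "surface": q.surface,
            "span": q.span,
        }
        for q in quants
    ]


# Hand-built stand-in that mimics the attributes of a parsed quantity.
# The values are illustrative, not actual library output.
stand_in = SimpleNamespace(
    value=2.0,
    unit=SimpleNamespace(name="litre"),
    surface="2 liters",
    span=(7, 15),
)

rows = quantities_to_rows([stand_in])
print(rows)
```

Plain dictionaries like these can be handed directly to `json.dumps` or `csv.DictWriter`.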