PyObjC NaturalLanguage Framework
PyObjC is a bridge between Python and Objective-C that lets Python code use Apple's Cocoa frameworks. The `pyobjc-framework-naturallanguage` package (version 12.1) provides Python wrappers for Apple's NaturalLanguage framework, giving Python applications on macOS access to its native natural language processing features, such as tokenization and language identification. PyObjC releases generally track macOS SDK updates.
Warnings
- breaking PyObjC 12.0 dropped support for Python 3.9, and PyObjC 11.0 previously dropped support for Python 3.8. Users upgrading must ensure their environment uses Python 3.10 or newer.
- gotcha This library wraps a macOS-specific framework and works only on macOS.
- breaking PyObjC 10.3 changed how `__init__` is handled alongside `__new__` in Python classes. PyObjC 10.3.1 partially reverted this, so `__init__` works again when a *user* implements `__new__`; classes that rely on the `__new__` provided by PyObjC still cannot use `__init__`.
- breaking PyObjC 11.1 changed how the bridge models Automatic Reference Counting (ARC) for Objective-C initializers (methods whose names start with `init`). They now behave as under ARC: they steal a reference to `self` and return a new reference. This may affect memory management, so review code that calls `alloc()` and an `init…()` method in separate steps rather than chained as `alloc().init…()`.
- gotcha Apple's `NLTokenizer` documentation says an instance should be used from only one thread or dispatch queue at a time. Sharing one tokenizer across Python threads without coordination can lead to unpredictable behavior or crashes.
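One way to respect the one-thread-at-a-time rule is to give each thread its own tokenizer instead of sharing one. The sketch below assumes a hypothetical `PerThread` helper (not part of PyObjC) built on `threading.local`; it lazily creates one object per thread from a factory you supply.

```python
import threading


class PerThread:
    """Hypothetical helper (not part of PyObjC): one lazily created object per thread."""

    def __init__(self, factory):
        self._factory = factory          # zero-argument callable that builds the object
        self._local = threading.local()  # attributes stored here are visible only to the current thread

    def get(self):
        obj = getattr(self._local, "obj", None)
        if obj is None:
            # First use on this thread: build an instance owned by this thread alone.
            obj = self._factory()
            self._local.obj = obj
        return obj


# Usage on macOS (assumes pyobjc-framework-naturallanguage is installed):
# import NaturalLanguage
# tokenizers = PerThread(
#     lambda: NaturalLanguage.NLTokenizer.alloc().initWithUnit_(NaturalLanguage.NLTokenUnitWord)
# )
# tokenizer = tokenizers.get()  # use only on the calling thread
```

Each thread pays the construction cost once; with many short-lived threads, a lock around a single shared tokenizer may be the cheaper option.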
Install
pip install pyobjc-framework-naturallanguage
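Given the Python-version and macOS-only requirements listed under Warnings, a quick pre-flight check can turn a confusing install or import failure into a clear message. This is a minimal sketch: `environment_problems` is a hypothetical helper, and the 3.10 floor comes from the PyObjC 12 requirement noted above.

```python
import sys

# Minimum Python for PyObjC 12.x (PyObjC 12.0 dropped Python 3.9).
MIN_PYTHON = (3, 10)


def environment_problems(version_info=sys.version_info, platform=sys.platform):
    """Hypothetical pre-flight check: return reasons the package will not work here."""
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) < MIN_PYTHON:
        problems.append(f"Python {major}.{minor} is too old; PyObjC 12 needs 3.10 or newer")
    if platform != "darwin":  # sys.platform is "darwin" on macOS
        problems.append(f"platform {platform!r} is not macOS; NaturalLanguage is unavailable")
    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print("warning:", problem)
```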
Imports
- NaturalLanguage
import NaturalLanguage
- NLTokenizer
from NaturalLanguage import NLTokenizer
- NLLanguageRecognizer
from NaturalLanguage import NLLanguageRecognizer
- NLTokenUnit
from NaturalLanguage import NLTokenUnit
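Because the module exists only on macOS, code that must also load elsewhere (for example, tests on a Linux CI runner) can guard the import and fail with a descriptive error only when the framework is actually needed. A minimal sketch; `require_natural_language` is a hypothetical helper name.

```python
import sys

try:
    # Only importable where pyobjc-framework-naturallanguage is installed (macOS).
    import NaturalLanguage
except ImportError:
    NaturalLanguage = None


def require_natural_language():
    """Return the NaturalLanguage module, or raise a descriptive error if it is missing."""
    if NaturalLanguage is None:
        raise RuntimeError(
            "NaturalLanguage is unavailable: install pyobjc-framework-naturallanguage "
            f"on macOS (current platform: {sys.platform})"
        )
    return NaturalLanguage
```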
Quickstart
import NaturalLanguage
from Foundation import NSMakeRange
text_to_analyze = "The quick brown fox jumps over the lazy dog. This is a second sentence."
# --- Tokenization Example ---
# Create an NLTokenizer instance for word units
tokenizer = NaturalLanguage.NLTokenizer.alloc().initWithUnit_(NaturalLanguage.NLTokenUnitWord)
# Set the string to be tokenized
tokenizer.setString_(text_to_analyze)
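tokens = []
# Enumerate tokens using a Python callable as the Objective-C block. The block
# receives (tokenRange, flags, stop); PyObjC passes the BOOL *stop output
# argument as a placeholder and uses the callable's return value to set it.
def token_block_handler(token_range, flags, stop):
    start = token_range.location
    length = token_range.length
    # NSRange offsets count UTF-16 code units; slicing a Python str with them
    # is only safe here because the sample text is plain ASCII.
    token_text = text_to_analyze[start : start + length]
    tokens.append(token_text)
    return False  # False leaves *stop unset, so enumeration continues
tokenizer.enumerateTokensInRange_usingBlock_(
    NSMakeRange(0, len(text_to_analyze)),
    token_block_handler
)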
print(f"Original text: '{text_to_analyze}'")
print(f"Tokens (words): {tokens}")
# --- Language Recognition Example ---
lang_recognizer = NaturalLanguage.NLLanguageRecognizer.alloc().init()
lang_recognizer.processString_(text_to_analyze)
dominant_language = lang_recognizer.dominantLanguage()
print(f"Dominant language: {dominant_language}")
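The quickstart slices the Python string with `NSRange` offsets, which is only correct when the text has no characters outside the Basic Multilingual Plane: `NSRange` counts UTF-16 code units, while Python's `str` indexes code points, so a single emoji shifts every later offset by one. The sketch below converts through UTF-16 instead and uses `NLTokenizer.tokensForRange_` to collect ranges without a block. It assumes each array element arrives as an `NSValue` wrapping an `NSRange`, as the Objective-C signature declares; the helper names are hypothetical.

```python
def utf16_length(text):
    """Length of `text` in UTF-16 code units, the unit NSString and NSRange use."""
    return len(text.encode("utf-16-le")) // 2


def substring_for_range(text, location, length):
    """Slice `text` with an NSRange-style (UTF-16) location and length."""
    data = text.encode("utf-16-le")  # 2 bytes per UTF-16 code unit
    return data[2 * location : 2 * (location + length)].decode("utf-16-le")


def word_tokens(text):
    """macOS only: word tokens for `text` via NLTokenizer.tokensForRange_."""
    import NaturalLanguage
    from Foundation import NSMakeRange

    tokenizer = NaturalLanguage.NLTokenizer.alloc().initWithUnit_(NaturalLanguage.NLTokenUnitWord)
    tokenizer.setString_(text)
    ranges = tokenizer.tokensForRange_(NSMakeRange(0, utf16_length(text)))
    tokens = []
    for value in ranges:  # assumed: each element is an NSValue wrapping an NSRange
        token_range = value.rangeValue()
        tokens.append(substring_for_range(text, token_range.location, token_range.length))
    return tokens
```

For example, in `"😀 fox"` the word `fox` starts at UTF-16 offset 3, but `"😀 fox"[3:6]` in Python yields `"ox"`; `substring_for_range("😀 fox", 3, 3)` returns `"fox"`.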
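`dominantLanguage()` returns a single guess. For short or mixed-language text, the ranked candidates from `NLLanguageRecognizer.languageHypothesesWithMaximum_` can be more informative. A sketch with hypothetical helper names: the ranking lives in a pure function so it can be checked without macOS, and `dict()` is assumed to convert the returned `NSDictionary`.

```python
def top_hypotheses(scores, n=3):
    """Rank a {language: probability} mapping, most likely language first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(str(language), float(probability)) for language, probability in ranked[:n]]


def language_hypotheses(text, maximum=3):
    """macOS only: up to `maximum` (language, probability) pairs for `text`."""
    import NaturalLanguage

    recognizer = NaturalLanguage.NLLanguageRecognizer.alloc().init()
    recognizer.processString_(text)
    # NSDictionary mapping NLLanguage codes (e.g. "en") to NSNumber probabilities.
    scores = recognizer.languageHypothesesWithMaximum_(maximum)
    return top_hypotheses(dict(scores), maximum)
```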