Natural Language Toolkit (NLTK)

3.9.4 · active · verified Sat Mar 28

NLTK (Natural Language Toolkit) is a leading open-source Python library for natural language processing (NLP). It provides easy-to-use interfaces to over 50 corpora and lexical resources, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. The current version is 3.9.4. NLTK typically ships a few minor releases per year, with larger updates for security fixes and Python compatibility as needed.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates basic tokenization and part-of-speech (POS) tagging with NLTK. It first checks for the tokenizer and tagger data the example depends on and downloads whatever is missing, so the example runs on a fresh install as long as network access is available.

import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

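# Download necessary NLTK data (run once).
# nltk.data.find raises LookupError when a resource is missing.
# NLTK 3.9+ loads 'punkt_tab' and 'averaged_perceptron_tagger_eng'
# in place of the older pickled 'punkt' and 'averaged_perceptron_tagger'.
try:
    nltk.data.find('tokenizers/punkt_tab')
except LookupError:
    nltk.download('punkt_tab')
try:
    nltk.data.find('taggers/averaged_perceptron_tagger_eng')
except LookupError:
    nltk.download('averaged_perceptron_tagger_eng')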

text = "NLTK is a powerful library for natural language processing."

# Tokenization
tokens = word_tokenize(text)
print(f"Tokens: {tokens}")

# Part-of-Speech Tagging
tagged_tokens = pos_tag(tokens)
print(f"POS Tagged: {tagged_tokens}")
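
The download checks in the quickstart repeat one try/except pattern per resource. A minimal sketch of a helper that factors this out is shown below. The names `ensure_resources` and `REQUIRED`, and the injectable `find` and `download` parameters, are hypothetical and not part of NLTK; the resource names assume NLTK 3.9 or later. Only `nltk.data.find`, `nltk.download`, and its `quiet` argument come from the library itself.

```python
# Hypothetical helper (not part of NLTK): download only the resources
# that nltk.data.find cannot locate.

# Maps the path passed to nltk.data.find to the package id passed to
# nltk.download. Names assume NLTK 3.9+.
REQUIRED = {
    "tokenizers/punkt_tab": "punkt_tab",
    "taggers/averaged_perceptron_tagger_eng": "averaged_perceptron_tagger_eng",
}


def ensure_resources(resources, find=None, download=None):
    """Return the list of package ids that had to be downloaded.

    `find` and `download` default to nltk.data.find and nltk.download;
    they are parameters only so the logic can be exercised without
    network access.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested with fakes

        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for path, package in resources.items():
        try:
            find(path)  # raises LookupError if the resource is absent
        except LookupError:
            download(package, quiet=True)
            fetched.append(package)
    return fetched
```

Calling `ensure_resources(REQUIRED)` once before `word_tokenize` and `pos_tag` would stand in for the two try/except blocks; the return value shows which packages were actually fetched on that run.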
