ftfy (Fixes Text For You)
ftfy (Fixes Text For You) is a Python library designed to clean up Unicode text that has gone wrong, often due to mojibake, encoding mismatches, or badly-decoded HTML. It intelligently detects and corrects common text issues after the fact, making your data more readable and usable. The current version is 6.3.1, and releases are somewhat irregular but actively maintained.
Warnings
- breaking ftfy 6.0.x introduced a completely rewritten heuristic for detecting and fixing mojibake. While generally an improvement, it might behave differently for certain edge cases compared to 5.x. Users migrating from 5.x should review output carefully.
- deprecated The function `ftfy.fixes.fix_encoding_and_explain` was deprecated and subsequently removed in ftfy 6.0.x. Its functionality, along with broader fixing capabilities, is now integrated into the top-level `ftfy.fix_and_explain`.
- breaking ftfy 6.x requires Python 3.9 or newer. Older versions (e.g., 5.x) supported older Python versions (e.g., Python 2.7, 3.4+).
- gotcha ftfy is designed to *fix* corrupted text, not to perform general text normalization, re-encode correctly decoded text, or handle natural language processing tasks. Applying it to already clean or correctly encoded text may not yield desired results or could introduce unintended changes, although it tries to be conservative.
Install
-
pip install ftfy
Imports
- fix_text
from ftfy import fix_text
- fix_and_explain
from ftfy import fix_and_explain
Quickstart
import ftfy
# Basic text fixing
text = "The quick brown fox jumped over the lazy dogs.—David"
fixed_text = ftfy.fix_text(text)
print(f"Original: {text}")
print(f"Fixed: {fixed_text}")
# Expected: The quick brown fox jumped over the lazy dogs.—David
# Fixing with explanation (useful for debugging)
text_with_mojibake = "It was the best of times, it was the worst of times.â\x80\x94Charles Dickens"
fixed, explanation = ftfy.fix_and_explain(text_with_mojibake)
print(f"\nOriginal: {text_with_mojibake}")
print(f"Fixed: {fixed}")
print(f"Explanation: {explanation}")
# Expected fixed: It was the best of times, it was the worst of times.—Charles Dickens