pylatexenc
pylatexenc is a simple Python library for parsing LaTeX code and for converting between LaTeX and Unicode text. The current stable release is 2.10; changes in the 2.x line are mostly bug fixes and minor enhancements to parsing and conversion.
Warnings
- breaking pylatexenc 2.0 introduced significant API changes from 1.x. Key changes: `keep_inline_math` in `LatexNodes2Text` is deprecated in favor of `math_mode`; `macro_dict` in `LatexWalker` is replaced by `latex_context`; and `pylatexenc.latexencode.utf8tolatex()` is superseded by `unicode_to_latex()`.
- gotcha The `latexwalker` module, while powerful for parsing LaTeX structure, is not a full LaTeX engine. It focuses on syntactic parsing and node representation, not typesetting or full macro expansion, which can lead to unexpected results if treated as a complete renderer.
- deprecated In `pylatexenc 3.0alpha`, many internal classes and functions, especially in `pylatexenc.latexwalker` and the node types, have been moved or renamed. For example, node `len=` arguments are replaced by `pos_end=`, and node classes have moved to the new `pylatexenc.latexnodes` module.
Install
pip install pylatexenc
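Because the APIs differ between 1.x, 2.x and 3.0alpha, it can help to check which release is installed. This sketch uses only the standard library's `importlib.metadata` (Python 3.8+); it is a convenience check, not part of pylatexenc.

```python
from importlib.metadata import PackageNotFoundError, version

# Look up the installed distribution's version string, e.g. "2.10".
try:
    installed = version("pylatexenc")
except PackageNotFoundError:
    installed = None  # pylatexenc is not installed in this environment

# Major version number (2 for "2.10", 3 for a 3.0 alpha), or None.
major = int(installed.split(".")[0]) if installed else None
print("pylatexenc", installed, "major version:", major)
```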
Imports
- LatexNodes2Text
from pylatexenc.latex2text import LatexNodes2Text
- unicode_to_latex
from pylatexenc.latexencode import unicode_to_latex
- LatexWalker
from pylatexenc.latexwalker import LatexWalker
Quickstart
from pylatexenc.latex2text import LatexNodes2Text
from pylatexenc.latexencode import unicode_to_latex
# LaTeX to Unicode text conversion
latex_input = r"""
\textbf{Hi there!} Here is \emph{an equation}:
\begin{equation}
\zeta = x + i y
\end{equation}
where $i$ is the imaginary unit.
"""
converter = LatexNodes2Text()
unicode_output = converter.latex_to_text(latex_input)
print(f"LaTeX to Unicode:\n{unicode_output}\n")
# Expected output (exact blank lines and indentation may vary):
# LaTeX to Unicode:
#
# Hi there! Here is an equation:
#
#     ζ = x + i y
#
# where i is the imaginary unit.
# Unicode text to LaTeX conversion
text_input = "À votre santé! The length of samples #3 & #4 is 3μm"
latex_encoded_output = unicode_to_latex(text_input)
print(f"Unicode to LaTeX:\n{latex_encoded_output}")
# Expected: Unicode to LaTeX:
# \`A votre sant\'e! The length of samples \#3 \& \#4 is 3\ensuremath{\mu}m