LaTeX Codec

3.0.1 · active · verified Sat Apr 11

latexcodec is a Python library that provides a lexer and codec for converting text between LaTeX markup and Unicode. It is intended for short segments of LaTeX, such as a paragraph or the entries of a BibTeX file, rather than for compiling full LaTeX documents. The current stable version is 3.0.1, and the project is actively maintained.

Warnings

Install

Install from PyPI with `pip install latexcodec`.

Imports

Import `latexcodec` once for its side effect of registering the codecs; the standard-library `codecs` module provides the `encode` and `decode` functions used with them.

Quickstart

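This quickstart decodes LaTeX strings to Unicode and encodes Unicode strings to LaTeX using the `latex` and `ulatex` codecs, which are registered when `latexcodec` is imported. The `latex` codec converts between bytes and `str`, while `ulatex` is the `str`-to-`str` variant used through `codecs.encode` and `codecs.decode`. The examples also show how to name an additional input encoding (for example `latex+latin1`) and how to handle characters that have no LaTeX translation when encoding.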

import codecs
import latexcodec # This registers the 'latex' and 'ulatex' codecs

# Decode LaTeX to Unicode
latex_text = r"I like b\"all{\oe}ns and M\"uller."
unicode_output = codecs.decode(latex_text, "ulatex")
print(f"Decoded LaTeX: {unicode_output}")

# Encode Unicode to LaTeX
unicode_input = "élève"
latex_output = codecs.encode(unicode_input, "ulatex")
print(f"Encoded Unicode: {latex_output}")

# Example with specific encoding (e.g., Latin-1)
latin1_latex_bytes = b"\xfe" # Represents 'þ' in Latin-1
decoded_latin1 = latin1_latex_bytes.decode("latex+latin1")
print(f"Decoded Latin1 LaTeX: {decoded_latin1}")

# Example with error handling during encoding for unrepresentable characters
unicode_with_unrepresentable = "A keyboard: ⌨"
# Using 'keep' error handler with 'ulatex' codec
encoded_kept = codecs.encode(unicode_with_unrepresentable, "ulatex", "keep")
print(f"Encoded with 'keep' error (ulatex): {encoded_kept}")

# Using 'ulatex+utf8': characters that UTF-8 can represent are kept as Unicode, so encoding should not fail
encoded_utf8 = codecs.encode(unicode_with_unrepresentable, "ulatex+utf8")
print(f"Encoded with ulatex+utf8: {encoded_utf8}")
