{"id":4076,"library":"latexcodec","title":"LaTeX Codec","description":"latexcodec is a Python library providing a lexer and codec for converting text between LaTeX markup and Unicode. It is particularly suited for handling short segments of LaTeX code, such as paragraphs or entries in a BibTeX file, rather than compiling full LaTeX documents. The current stable version is 3.0.1, and it maintains an active but measured release cadence.","status":"active","version":"3.0.1","language":"en","source_language":"en","source_url":"https://github.com/mcmtroffaes/latexcodec","tags":["latex","codec","encoding","unicode","text-processing"],"install":[{"cmd":"pip install latexcodec","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Required Python version","package":"python","version":">=3.9","optional":false}],"imports":[{"note":"Importing `latexcodec` automatically registers the 'latex' and 'ulatex' codecs with Python's standard `codecs` module. Direct functions are not typically exposed for end-user encoding/decoding.","wrong":"from latexcodec import decode_latex_str","symbol":"latexcodec","correct":"import latexcodec\nimport codecs\n# Use codecs.decode() and codecs.encode()"}],"quickstart":{"code":"import codecs\nimport latexcodec # This registers the 'latex' and 'ulatex' codecs\n\n# Decode LaTeX to Unicode\nlatex_text = r\"I like b\\\"all{\\oe}ns and M\\\"uller.\"\nunicode_output = codecs.decode(latex_text, \"ulatex\")\nprint(f\"Decoded LaTeX: {unicode_output}\")\n\n# Encode Unicode to LaTeX\nunicode_input = \"élève\"\nlatex_output = codecs.encode(unicode_input, \"ulatex\")\nprint(f\"Encoded Unicode: {latex_output}\")\n\n# Example with specific encoding (e.g., Latin-1)\nlatin1_latex_bytes = b\"\\xfe\" # Represents 'þ' in Latin-1\ndecoded_latin1 = latin1_latex_bytes.decode(\"latex+latin1\")\nprint(f\"Decoded Latin1 LaTeX: {decoded_latin1}\")\n\n# Example with error handling during encoding for unrepresentable characters\nunicode_with_unrepresentable = \"A keyboard: ⌨\"\n# Using 'keep' error handler with 'ulatex' codec\nencoded_kept = codecs.encode(unicode_with_unrepresentable, \"ulatex\", \"keep\")\nprint(f\"Encoded with 'keep' error (ulatex): {encoded_kept}\")\n\n# Using 'ulatex+utf8' for robust encoding of all Unicode characters\nencoded_utf8 = codecs.encode(unicode_with_unrepresentable, \"ulatex+utf8\")\nprint(f\"Encoded with ulatex+utf8: {encoded_utf8}\")","lang":"python","description":"This quickstart demonstrates how to decode LaTeX strings to Unicode and encode Unicode strings to LaTeX using the `latex` and `ulatex` codecs registered by importing `latexcodec`. It also shows how to specify additional encodings and handle unrepresentable characters during encoding."},"warnings":[{"fix":"Upgrade to `latexcodec>=3.0.1`.","message":"Versions prior to 3.0.1 are incompatible with Python 3.13+ due to the removal of `pkg_resources.open_text`. Users on Python 3.13 and newer must upgrade to `latexcodec` 3.0.1 or later.","severity":"breaking","affected_versions":"<3.0.1"},{"fix":"Evaluate migrating to `pylatexenc` for new projects or existing ones that require more robust LaTeX handling. (https://github.com/phfaist/pylatexenc)","message":"The maintainer strongly encourages users to consider `pylatexenc` as a superior alternative to `latexcodec` for LaTeX code processing.","severity":"deprecated","affected_versions":"All versions"},{"fix":"Ensure use cases align with the library's scope. For full document parsing or compilation, consider dedicated LaTeX parsers or compilers.","message":"This library is primarily designed for processing short fragments of LaTeX text (e.g., paragraphs, BibTeX entries) and is not intended to function as a full LaTeX compiler or for comprehensive document processing.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Be aware of this behavior and implement post-processing if fully flattened Unicode is required. Unrecognized commands may require custom translation tables.","message":"When decoding LaTeX, commands that do not directly represent characters (e.g., macros, formatting commands like `\\textbf`) or are unrecognized by the codec are passed through unchanged. This can result in a 'hybrid' Unicode string containing unexpanded LaTeX commands.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use `codecs.encode(unicode_str, 'ulatex+utf8')` or `codecs.encode(unicode_str, 'ulatex', 'keep')`.","message":"Encoding Unicode characters to LaTeX can fail if the characters cannot be represented by the default (ASCII) LaTeX encoding. For more robust encoding, use the `ulatex+utf8` codec or specify the `'keep'` error handler with `ulatex` to retain unencodable characters.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Account for these canonicalizations when comparing decoded output to original source or when expecting specific formatting.","message":"The decoding process canonicalizes certain LaTeX elements: comments are dropped, paragraphs are converted to double newlines, and spacing after LaTeX commands is standardized. This can lead to subtle differences in the decoded text's structure compared to the original LaTeX source.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}