Tree-sitter Language Pack
tree-sitter-language-pack is a Python library that provides pre-compiled Tree-sitter parsers for over 300 programming languages. It offers a unified `process()` API for parsing, code analysis (structure, comments, diagnostics), and AST-aware code chunking. The library is actively maintained, and its frequent releases typically add new language grammars and API enhancements.
Warnings
- gotcha Before v1.4.2, the installation documentation, particularly for the CLI (`ts-pack-cli`), referenced incorrect package names, which could cause failed installs or confusion.
- gotcha Versions before v1.3.3 had bugs in dynamic parser loading and in `c_symbol` overrides for certain languages (e.g., C#, VB, Embedded Template, Nushell), which could cause runtime errors.
- gotcha `get_language()` and `get_parser()` give direct access to `tree_sitter.Language` and `tree_sitter.Parser` objects, but `process()` with `ProcessConfig` is the recommended API: it is unified across all bindings and covers parsing, analysis, and chunking in one call.
- gotcha tree-sitter-language-pack bundles grammars to simplify Tree-sitter usage. If you combine it with other `tree-sitter` packages or manually compiled grammars, however, be aware that the core library and each grammar must have compatible versions and ABIs, and mismatches can cause conflicts or unexpected behavior.
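The version thresholds in the warnings above can be checked at runtime with the standard library alone. The sketch below reads installed distribution versions through `importlib.metadata`, using the PyPI names `tree-sitter-language-pack` and `tree-sitter`; `KNOWN_FIXES`, `version_tuple`, `installed_version` and `pending_fixes` are hypothetical helpers, not part of the library. Pre-release suffixes such as `rc1` are ignored, so the comparison is approximate.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple

# Hypothetical table: the first release that fixed each issue listed in the warnings.
KNOWN_FIXES = {
    (1, 3, 3): "dynamic parser loading and c_symbol overrides",
    (1, 4, 2): "CLI installation documentation",
}


def version_tuple(text: str) -> Tuple[int, ...]:
    """Turn "1.4.2" into (1, 4, 2); anything after the leading numeric part is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def installed_version(dist: str = "tree-sitter-language-pack") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def pending_fixes(installed: str) -> List[str]:
    """List the known issues whose fix release is newer than the installed version."""
    current = version_tuple(installed)
    return [desc for threshold, desc in sorted(KNOWN_FIXES.items()) if current < threshold]


if __name__ == "__main__":
    # Print both versions, since mismatches between them are a common source of trouble.
    for dist in ("tree-sitter", "tree-sitter-language-pack"):
        print(dist, installed_version(dist))
    pack = installed_version()
    if pack is not None:
        for note in pending_fixes(pack):
            print("Installed version predates the fix for:", note)
```

Running the script prints `None` for any distribution that is not installed.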
Install
- pip install tree-sitter-language-pack
- uv add tree-sitter-language-pack
Imports
- process
from tree_sitter_language_pack import process, ProcessConfig
- get_language
from tree_sitter_language_pack import get_language, get_parser
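Because `get_parser()` returns a regular `tree_sitter.Parser`, the usual py-tree-sitter calls apply to its output. The sketch below assumes `get_parser()` takes a language name string such as `"python"` (matching the `language="python"` used in the quickstart) and relies on py-tree-sitter's `Parser.parse()`, `Tree.root_node`, `Node.type` and `Node.children`. The helpers `count_node_types` and `parse_and_count` are hypothetical.

```python
from collections import Counter
from typing import Any


def count_node_types(node: Any) -> Counter:
    """Count node types in a syntax tree with an iterative depth-first walk.

    Works on any object exposing `.type` and `.children`, such as a
    tree_sitter.Node, so it can be exercised without a real parser.
    """
    counts: Counter = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        counts[current.type] += 1
        stack.extend(current.children)
    return counts


def parse_and_count(source: str, language: str = "python") -> Counter:
    """Parse `source` with the bundled grammar and count its node types."""
    # Imported lazily so count_node_types works even without the pack installed.
    from tree_sitter_language_pack import get_parser

    parser = get_parser(language)
    tree = parser.parse(source.encode("utf-8"))  # py-tree-sitter parses bytes
    return count_node_types(tree.root_node)
```

For example, `parse_and_count("def f():\n    pass\n")` should report one `function_definition` node, although the exact node type names depend on the grammar version.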
Quickstart
from tree_sitter_language_pack import process, ProcessConfig

source_code = """
def hello():
    # This is a comment
    print("Hello, World!")

class MyClass:
    def __init__(self):
        pass
"""

# Process source code for intelligence extraction (auto-downloads the language if needed)
result = process(source_code, ProcessConfig(language="python"))
print(f"Structure items found: {len(result.get('structure', []))}")
print(f"Diagnostics: {result.get('diagnostics', [])}")
print(f"Comments: {result.get('comments', [])}")

# Example with AST-aware chunking
chunked_result = process(
    source_code,
    ProcessConfig(language="python", chunk_max_size=50, comments=True),
)
print(f"Chunks found: {len(chunked_result.get('chunks', []))}")
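To run the quickstart over files on disk, each file needs a language name for `ProcessConfig`. The sketch below assumes, as the quickstart does, that `process()` returns a dict-like result with `structure`, `diagnostics`, `comments` and `chunks` keys. `EXTENSION_TO_LANGUAGE`, `language_for`, `summarize` and `process_file` are hypothetical; the map is a small illustrative subset, and its names must match identifiers the pack accepts.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical extension-to-language map (illustrative subset only).
EXTENSION_TO_LANGUAGE: Dict[str, str] = {
    ".py": "python",
    ".js": "javascript",
    ".rs": "rust",
}

# Result keys taken from the quickstart above.
RESULT_KEYS = ("structure", "diagnostics", "comments", "chunks")


def language_for(path: str) -> Optional[str]:
    """Pick a language name from the file extension (case-insensitive), or None."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())


def summarize(result: Dict[str, Any]) -> Dict[str, int]:
    """Count entries under each quickstart result key, treating missing keys as empty."""
    return {key: len(result.get(key, [])) for key in RESULT_KEYS}


def process_file(path: str) -> Optional[Dict[str, int]]:
    """Process one file and return per-key counts, or None for unmapped extensions."""
    from tree_sitter_language_pack import process, ProcessConfig

    language = language_for(path)
    if language is None:
        return None
    source = Path(path).read_text(encoding="utf-8")
    return summarize(process(source, ProcessConfig(language=language)))
```

Keeping `summarize` separate from the `process()` call makes the counting logic testable without the library installed.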