llama-cpp-python: Python Bindings for llama.cpp
Python bindings for the `llama.cpp` library, enabling efficient local inference of large language models (LLMs) on various hardware, including CPUs and GPUs (NVIDIA, Apple Metal, AMD ROCm). It provides both a high-level API for easy model interaction and a low-level API for direct C API access. The library is actively maintained with frequent updates, often mirroring upstream `llama.cpp` changes, and currently stands at version 0.3.20.
Warnings
- breaking Installation with GPU acceleration (CUDA, Metal, ROCm) often requires setting specific `CMAKE_ARGS` environment variables or using pre-built wheels from custom index URLs, along with correctly configured C++ compilers and GPU toolkits. Default `pip install` typically provides CPU-only support.
- gotcha On Apple Silicon (M1/M2) Macs, `llama-cpp-python` can default to building an x86 version if an ARM64 Python interpreter is not used. This results in significantly slower performance.
- breaking The library closely tracks upstream `llama.cpp` development, which can introduce breaking changes to the underlying C API that may propagate to the Python bindings. Notable past changes include the transition to GGUF model format and changes in KV cache management functions.
- gotcha LLM models require specific chat formats (e.g., 'llama-2', 'chatml') for proper conversational interaction. Using the wrong format can lead to 'weird' or unparsable responses, especially in chat completion APIs.
- gotcha To generate embeddings, you must explicitly enable them by passing `embedding=True` to the `Llama` constructor when initializing the model.
Install
- pip install llama-cpp-python
- pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
- CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
- CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
- CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install llama-cpp-python
- CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
Note: older releases used the `LLAMA_CUBLAS` / `LLAMA_METAL` / `LLAMA_HIPBLAS` / `LLAMA_OPENBLAS` flag spellings; current versions use the `GGML_*` names shown above.
Imports
- Llama
from llama_cpp import Llama
- LlamaGrammar
from llama_cpp import LlamaGrammar
- LlamaHFTokenizer
from llama_cpp import LlamaHFTokenizer
Quickstart
import os
from llama_cpp import Llama
# Ensure you have a GGUF model downloaded, e.g., to a 'models' directory.
# Example: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
model_path = os.environ.get('LLAMA_MODEL_PATH', './models/llama-2-7b-chat.Q4_K_M.gguf')
# Initialize the Llama model
# Set n_gpu_layers to a value > 0 for GPU acceleration (requires GPU install config)
llm = Llama(
    model_path=model_path,
    n_ctx=2048,      # Context window size
    n_gpu_layers=0,  # Set to > 0 for GPU, or -1 to offload all layers (requires a GPU build)
    verbose=False,   # Suppress llama.cpp verbose output
)
# Generate a completion
prompt = "Q: Name the planets in the solar system? A: "
output = llm(prompt, max_tokens=128, stop=["Q:", "\n"], echo=True)
print(output["choices"][0]["text"])