GGUF Python Library
This is a Python package for reading and writing binary files in the GGUF (GGML Universal File) format. It handles ML model files, including both metadata and tensors, for efficient inference with GGML-based frameworks such as llama.cpp. The current version is 0.18.0, released on February 27, 2026, and the project follows a regular release cadence, typically aligned with updates to the upstream llama.cpp project.
Warnings
- breaking The GGUF format evolved from GGML to fix backward-compatibility problems, chiefly around metadata: GGUF introduced proper file versioning and key-value lookup tables. Users migrating older GGML models, or mixing GGUF file versions with different library versions, should verify that the file version is one the installed library understands.
- gotcha The `gguf` Python package historically shipped a top-level `scripts/` directory, which could collide with any other installed package that also exposed a top-level `scripts` module. The resulting namespace conflict in the Python environment manifests as `ImportError`s.
- gotcha GGUF files embed extensive metadata, including chat templates and system instructions. Incorrect or mismatched templates between the GGUF file and the inference engine (e.g., `llama.cpp`, `vLLM`) can lead to poor model inference quality, such as gibberish, repeated outputs, or infinite generation loops.
- gotcha The `gguf` library is a utility for the GGUF format, which is actively developed within the `llama.cpp` project. New features or constants in the GGUF format (e.g., new quantization types such as MXFP4) require corresponding updates to the `gguf` Python package. Using an outdated `gguf` package with a newer GGUF model file may lead to parsing errors, missing metadata, or unrecognized quantization types.
- gotcha GGUF files can contain 'poisoned' or malicious chat templates and system instructions that can subtly alter model behavior at inference time without direct model retraining. This poses a supply chain security risk.
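As a concrete illustration of the versioning concern above: a GGUF file begins with a uint32 magic (the bytes `GGUF`) followed by a uint32 format version, so a file's version can be checked with nothing but the standard library before handing it to any parser. This is a minimal sketch; the file name `tiny.gguf` and the forged header are illustrative only.

```python
import struct

GGUF_MAGIC = 0x46554747  # b"GGUF" interpreted as a little-endian uint32


def read_gguf_version(path):
    """Read only the 8-byte prologue: uint32 magic + uint32 version."""
    with open(path, "rb") as f:
        magic, version = struct.unpack("<II", f.read(8))
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic=0x{magic:08x})")
    return version


# Demo: forge a minimal v3 header (magic, version, tensor count, KV count)
with open("tiny.gguf", "wb") as f:
    f.write(struct.pack("<IIQQ", GGUF_MAGIC, 3, 0, 0))

print(read_gguf_version("tiny.gguf"))  # prints 3
```

Checking this prologue first is a cheap guard against feeding a truncated or non-GGUF file to a full reader.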
Install
- pip install gguf
- pip install gguf[gui]
Imports
- GGUFWriter
from gguf import GGUFWriter
- GGUFReader
from gguf import GGUFReader
- GGUFValueType
from gguf.constants import GGUFValueType
- GGMLQuantizationType
from gguf.constants import GGMLQuantizationType
Quickstart
import numpy as np
from gguf import GGUFWriter, GGUFReader
# --- Writing a GGUF file ---
output_file = "example.gguf"
arch = "example_arch"
writer = GGUFWriter(output_file, arch)
writer.add_block_count(12)
writer.add_uint32("answer", 42)
writer.add_string("author", "AI Agent")
# Add a tensor
tensor_name = "my_example_tensor"
tensor_data = np.ones((3, 4), dtype=np.float32) * 7.0
writer.add_tensor(tensor_name, tensor_data)
# Finalize and write the file (header, then metadata, then tensor data)
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
print(f"Created GGUF file: {output_file}")
# --- Reading a GGUF file ---
# GGUFReader parses the header, metadata and tensor info on construction;
# tensor data is memory-mapped rather than copied into RAM.
reader = GGUFReader(output_file, 'r')
print(f"\nReading {output_file}:")
version_field = reader.fields['GGUF.version']
print(f"  GGUF Version: {version_field.parts[version_field.data[0]][0]}")
print("  Metadata:")
for key, field in reader.fields.items():
    print(f"    {key}: {field.types}")
print("  Tensors (names, shapes, quantization types):")
for tensor in reader.tensors:
    print(f"    - {tensor.name}: {list(tensor.shape)}, {tensor.tensor_type.name}")
# Each tensor's (possibly quantized) values are exposed as a NumPy view via
# tensor.data; no explicit close is needed, since the underlying memory map
# is released when the reader is garbage-collected.