GGUF Python Library
This Python package reads and writes binary files in the GGUF (GGML Universal File) format, which stores ML models (metadata plus tensors) for efficient inference with GGML-based frameworks such as llama.cpp. The current version is 0.18.0, released on February 27, 2026; the project follows a regular release cadence, typically aligned with updates to the upstream llama.cpp project.
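For orientation, every GGUF file begins with a fixed header: the 4-byte magic `GGUF`, a `uint32` format version, a `uint64` tensor count, and a `uint64` metadata key-value count (little-endian in the common v3 layout). A minimal stdlib-only sketch that validates this header, independent of the `gguf` package:

```python
import struct

def read_gguf_header(buf: bytes) -> tuple[int, int, int]:
    """Parse the fixed-size GGUF header (v2/v3 layout, little-endian)."""
    if buf[:4] != b"GGUF":
        raise ValueError("not a GGUF file: bad magic")
    # uint32 version, uint64 tensor count, uint64 metadata KV count
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", buf, 4)
    return version, tensor_count, kv_count

# Build a fake header for demonstration: version 3, 5 tensors, 12 KV pairs
header = b"GGUF" + struct.pack("<IQQ", 3, 5, 12)
print(read_gguf_header(header))  # (3, 5, 12)
```

This is only the fixed prefix; the variable-length metadata and tensor-info sections that follow are what the `gguf` package parses for you.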
Common errors
- `ModuleNotFoundError: No module named 'gguf'`
  - Cause: the `gguf` package is not installed in the active Python environment.
  - Fix: install it with `pip install gguf`, or `python -m pip install gguf` when using an embedded Python environment.
- `AttributeError: 'GGUFWriter' object has no attribute '...'` (e.g., `get_total_parameter_count` or `add_vocab_size`)
  - Cause: the code calls a method or attribute that does not exist in the installed version of the `gguf` library, usually because `gguf` is outdated or incompatible with the conversion script being used (e.g., from `llama.cpp`).
  - Fix: update the package with `pip install --upgrade gguf`. If the issue persists, make sure the `gguf` version matches the specific conversion script or framework you are using, or consult the `gguf` project's documentation for API changes.
- `KeyError: 'general.name'` or `KeyError: '__EOS_TOKEN__'`
  - Cause: the code accesses a metadata key (e.g., `general.name` or a token-related key such as `__EOS_TOKEN__`) that is missing or named differently in the GGUF file or model configuration being processed. This is common when converting or loading models from different sources, since metadata structures vary.
  - Fix: inspect the file's metadata with `gguf.GGUFReader` to see which keys are actually present and adjust your code to use those keys, or make sure the conversion process populates the expected metadata fields.
- `ValueError: Trying to set a tensor of shape torch.Size(...) in "..." (which has shape torch.Size(...)), this look incorrect.`
  - Cause: a tensor read from the GGUF file does not have the shape the model architecture expects (e.g., in the model's state dictionary). Typical causes are mismatched model architectures, unexpected quantization, or file corruption.
  - Fix: verify that the GGUF file is compatible with the architecture definition you are loading it into. When converting, confirm the conversion script handles tensor shapes and types correctly for the specific model; using a different version of the conversion tool or target framework (e.g., `transformers`) sometimes resolves the incompatibility.
Warnings
- breaking The GGUF format itself evolved from GGML to address backward compatibility issues, particularly regarding metadata. While the `gguf` Python library handles the GGUF format, users migrating older GGML models or interacting with different GGUF versions should be aware of the format's evolution and ensure compatibility between the file version and the library version, as GGUF introduced proper versioning and key-value lookup tables for metadata.
- gotcha The `gguf` Python package historically included a top-level `scripts/` directory, which could lead to `ImportError` issues if another installed package also used a top-level `scripts` module. This causes namespace conflicts in the Python environment.
- gotcha GGUF files embed extensive metadata, including chat templates and system instructions. Incorrect or mismatched templates between the GGUF file and the inference engine (e.g., `llama.cpp`, `vLLM`) can lead to poor model inference quality, such as gibberish, repeated outputs, or infinite generation loops.
- gotcha The `gguf` library is a utility for the GGUF format, which is actively developed by the `llama.cpp` project. New features or constants in the GGUF format (e.g., new quantization types like MXFP4) require corresponding updates to the `gguf` Python package. Using an outdated `gguf` package with a newer GGUF model file might lead to parsing errors, missing metadata, or unrecognised quantization types.
- gotcha GGUF files can contain 'poisoned' or malicious chat templates and system instructions that can subtly alter model behavior at inference time without direct model retraining. This poses a supply chain security risk.
- breaking The `GGUFWriter` class does not have a method named `write_kv_to_file`; key-value metadata is written by `write_kv_data_to_file`. Calling the non-existent name raises an `AttributeError`.
- breaking Installing `gguf` or its dependencies (e.g., `sentencepiece`) on minimal Linux distributions like Alpine may fail due to missing system build tools. Packages with C/C++ extensions require compilers and build system tools (like `cmake`, `pkg-config`, `build-base`) to be present in the environment during installation.
Install
- `pip install gguf`
- `pip install gguf[gui]`
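To confirm which version ended up in the active environment (relevant for the version-mismatch errors above), a small stdlib check:

```python
from importlib.metadata import PackageNotFoundError, version

try:
    print(version("gguf"))  # prints the installed version string
except PackageNotFoundError:
    print("gguf is not installed in this environment")
```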
Imports
- GGUFWriter: `from gguf import GGUFWriter`
- GGUFReader: `from gguf import GGUFReader`
- GGUFValueType: `from gguf.constants import GGUFValueType`
- GGMLQuantizationType: `from gguf.constants import GGMLQuantizationType`
Quickstart
```python
import numpy as np
from gguf import GGUFReader, GGUFWriter

# --- Writing a GGUF file ---
output_file = "example.gguf"
arch = "example_arch"

writer = GGUFWriter(output_file, arch)
writer.add_block_count(12)
writer.add_uint32("answer", 42)
writer.add_string("author", "AI Agent")

# Add a tensor
tensor_name = "my_example_tensor"
tensor_data = np.ones((3, 4), dtype=np.float32) * 7.0
writer.add_tensor(tensor_name, tensor_data)

# Finalize and write the file (note: the KV method is write_kv_data_to_file)
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
print(f"Created GGUF file: {output_file}")

# --- Reading a GGUF file ---
# GGUFReader parses the header, metadata, and tensor info on construction;
# there are no separate read_header()/read_kv() calls.
reader = GGUFReader(output_file)

print(f"\nReading {output_file}:")
print(" Metadata (includes GGUF.version and the counts):")
for key, field in reader.fields.items():
    print(f"  {key}: {field.contents()}")

print(" Tensors (names, shapes, types):")
for tensor in reader.tensors:
    print(f"  - {tensor.name}: {tensor.shape}, {tensor.tensor_type.name}")

# Tensor data is exposed via a memory map; access it directly
for tensor in reader.tensors:
    if tensor.name == tensor_name:
        print(f" Loaded '{tensor.name}' data:\n{tensor.data}")
```