GGUF Python Library
This is a Python package for reading and writing binary files in the GGUF (GGML Universal File) format. It handles ML model files, including both metadata and tensors, for efficient inference with GGML-based frameworks such as llama.cpp. The current version is 0.18.0, released on February 27, 2026, and the project follows a regular release cadence, typically aligned with updates to the upstream llama.cpp project.
Warnings
- breaking The GGUF format evolved from GGML to fix backward-compatibility problems, chiefly around metadata: GGUF introduced proper file versioning and key-value lookup tables. Users migrating older GGML models, or mixing GGUF file versions with different library versions, should verify that the file version is one the installed library understands.
- gotcha The `gguf` Python package historically shipped a top-level `scripts/` directory, which could collide with any other installed package that also exposed a top-level `scripts` module. The resulting namespace conflict in the Python environment manifests as `ImportError`s.
- gotcha GGUF files embed extensive metadata, including chat templates and system instructions. Incorrect or mismatched templates between the GGUF file and the inference engine (e.g., `llama.cpp`, `vLLM`) can lead to poor model inference quality, such as gibberish, repeated outputs, or infinite generation loops.
- gotcha The `gguf` library is a utility for the GGUF format, which is actively developed within the `llama.cpp` project. New features or constants in the GGUF format (e.g., new quantization types such as MXFP4) require corresponding updates to the `gguf` Python package. Using an outdated `gguf` package with a newer GGUF model file may lead to parsing errors, missing metadata, or unrecognized quantization types.
- gotcha GGUF files can contain 'poisoned' or malicious chat templates and system instructions that can subtly alter model behavior at inference time without direct model retraining. This poses a supply chain security risk.
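As a concrete illustration of the versioning concern above: a GGUF file begins with a uint32 magic (the bytes `GGUF`) followed by a uint32 format version, so a file's version can be checked with nothing but the standard library before handing it to any parser. This is a minimal sketch; the file name `tiny.gguf` and the forged header are illustrative only.

```python
import struct

GGUF_MAGIC = 0x46554747  # b"GGUF" interpreted as a little-endian uint32


def read_gguf_version(path):
    """Read only the 8-byte prologue: uint32 magic + uint32 version."""
    with open(path, "rb") as f:
        magic, version = struct.unpack("<II", f.read(8))
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic=0x{magic:08x})")
    return version


# Demo: forge a minimal v3 header (magic, version, tensor count, KV count)
with open("tiny.gguf", "wb") as f:
    f.write(struct.pack("<IIQQ", GGUF_MAGIC, 3, 0, 0))

print(read_gguf_version("tiny.gguf"))  # prints 3
```

Checking this prologue first is a cheap guard against feeding a truncated or non-GGUF file to a full reader.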
Install
- pip install gguf
- pip install gguf[gui]
Imports
- GGUFWriter
from gguf import GGUFWriter
- GGUFReader
from gguf import GGUFReader
- GGUFValueType
from gguf.constants import GGUFValueType
- GGMLQuantizationType
from gguf.constants import GGMLQuantizationType
Quickstart
import numpy as np
from gguf import GGUFWriter, GGUFReader
# --- Writing a GGUF file ---
output_file = "example.gguf"
arch = "example_arch"
writer = GGUFWriter(output_file, arch)
writer.add_block_count(12)
writer.add_uint32("answer", 42)
writer.add_string("author", "AI Agent")
# Add a tensor
tensor_name = "my_example_tensor"
tensor_data = np.ones((3, 4), dtype=np.float32) * 7.0
writer.add_tensor(tensor_name, tensor_data)
# Finalize and write the file (header, then metadata, then tensor data)
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
print(f"Created GGUF file: {output_file}")
# --- Reading a GGUF file ---
# GGUFReader parses the header, metadata and tensor info on construction;
# tensor data is memory-mapped rather than copied into RAM.
reader = GGUFReader(output_file, 'r')
print(f"\nReading {output_file}:")
version_field = reader.fields['GGUF.version']
print(f"  GGUF Version: {version_field.parts[version_field.data[0]][0]}")
print("  Metadata:")
for key, field in reader.fields.items():
    print(f"    {key}: {field.types}")
print("  Tensors (names, shapes, quantization types):")
for tensor in reader.tensors:
    print(f"    - {tensor.name}: {list(tensor.shape)}, {tensor.tensor_type.name}")
# Each tensor's (possibly quantized) values are exposed as a NumPy view via
# tensor.data; no explicit close is needed, since the underlying memory map
# is released when the reader is garbage-collected.