GGUF Python Library

0.18.0 · active · verified Thu Apr 09

This is a Python package for reading and writing binary files in the GGUF (GGML Universal File) format. It handles ML models, including their metadata and tensors, for efficient inference with GGML-based frameworks such as llama.cpp. The current version is 0.18.0, released on February 27, 2026. Releases are regular and often aligned with updates to the upstream llama.cpp project.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to create a simple GGUF file containing metadata and a tensor, and then how to read its header, key-value metadata, and tensor information using the `gguf` Python library. It highlights the core `GGUFWriter` and `GGUFReader` classes.

import numpy as np
from gguf import GGUFReader, GGUFWriter

# --- Writing a GGUF file ---
output_file = "example.gguf"
arch = "example_arch"

writer = GGUFWriter(output_file, arch)
writer.add_block_count(12)
writer.add_uint32("answer", 42)
writer.add_string("author", "AI Agent")

# Add a tensor
tensor_name = "my_example_tensor"
tensor_data = np.ones((3, 4), dtype=np.float32) * 7.0
writer.add_tensor(tensor_name, tensor_data)

# Finalize and write the file
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
print(f"Created GGUF file: {output_file}")

# --- Reading a GGUF file ---
# GGUFReader memory-maps the file and parses the header, metadata, and
# tensor info in its constructor, so no separate read_* calls are needed.
reader = GGUFReader(output_file, "r")

print(f"\nReading {output_file}:")
# The header values are exposed as ordinary fields, e.g. "GGUF.version".
print(f"  GGUF Version: {reader.fields['GGUF.version'].contents()}")
print("  Metadata:")
for key, field in reader.fields.items():
    # contents() decodes the raw field parts into a Python value.
    print(f"    {key}: {field.contents()}")

print("  Tensors (names and shapes):")
for tensor in reader.tensors:
    print(f"    - {tensor.name}: {tensor.shape.tolist()}, {tensor.tensor_type.name}")

# Example of accessing a specific tensor's data (a memory-mapped NumPy array)
# tensors_by_name = {t.name: t for t in reader.tensors}
# if tensor_name in tensors_by_name:
#     loaded_tensor = tensors_by_name[tensor_name]
#     print(f"  Loaded '{loaded_tensor.name}' data:\n{loaded_tensor.data}")
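
As a quick sanity check that does not depend on the `gguf` package, the fixed-size header at the start of a GGUF file can be decoded with the standard library alone. The sketch below assumes a little-endian file in format version 2 or later (4-byte magic `GGUF`, a uint32 version, then uint64 tensor and key-value counts). The helper name `parse_gguf_header` and the synthetic sample bytes are illustrative and not part of the library.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def parse_gguf_header(buf: bytes) -> dict:
    """Decode the fixed-size GGUF header (hypothetical helper, little-endian assumed)."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer too short for a GGUF header")
    if buf[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic")
    # "<IQQ": little-endian uint32 version, uint64 tensor count, uint64 KV count
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", buf, 4)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}


# Synthetic header for illustration: version 3, 1 tensor, 5 metadata entries.
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 1, 5)
header = parse_gguf_header(sample)
print(header)
```

To apply the same check to a real file, pass the first 24 bytes read in binary mode, for example `parse_gguf_header(open("example.gguf", "rb").read(24))`.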
