{"id":1491,"library":"gguf","title":"GGUF Python Library","description":"This is a Python package for writing binary files in the GGUF (GGML Universal File) format. It allows reading and writing of ML models, including metadata and tensors, for efficient inference with GGML-based frameworks like llama.cpp. The current version is 0.18.0, released on February 27, 2026, and the project has a regular release cadence, often aligned with updates from the upstream llama.cpp project.","status":"active","version":"0.18.0","language":"en","source_language":"en","source_url":"https://github.com/ggml-org/llama.cpp","tags":["ml","model-format","ggml","llama.cpp","quantization","ai","binary-format"],"install":[{"cmd":"pip install gguf","lang":"bash","label":"Basic Installation"},{"cmd":"pip install gguf[gui]","lang":"bash","label":"Installation with GUI tools"}],"dependencies":[{"reason":"Runtime requirement","package":"python","optional":false},{"reason":"Used for tensor data handling","package":"numpy","optional":false},{"reason":"Required for the optional 'gui' features like gguf_editor_gui.py","package":"PyQt6","optional":true}],"imports":[{"symbol":"GGUFWriter","correct":"from gguf import GGUFWriter"},{"symbol":"GGUFReader","correct":"from gguf import GGUFReader"},{"symbol":"GGUFValueType","correct":"from gguf.constants import GGUFValueType"},{"symbol":"GGMLQuantizationType","correct":"from gguf.constants import GGMLQuantizationType"}],"quickstart":{"code":"import numpy as np\nfrom gguf import GGUFWriter, GGUFReader, GGUFValueType\n\n# --- Writing a GGUF file ---\noutput_file = \"example.gguf\"\narch = \"example_arch\"\n\nwriter = GGUFWriter(output_file, arch)\nwriter.add_block_count(12)\nwriter.add_uint32(\"answer\", 42)\nwriter.add_string(\"author\", \"AI Agent\")\n\n# Add a tensor\ntensor_name = \"my_example_tensor\"\ntensor_data = np.ones((3, 4), dtype=np.float32) * 7.0\nwriter.add_tensor(tensor_name, tensor_data)\n\n# Finalize and write the 
file\nwriter.write_header_to_file()\nwriter.write_kv_to_file()\nwriter.write_tensors_to_file()\nwriter.close()\nprint(f\"Created GGUF file: {output_file}\")\n\n# --- Reading a GGUF file ---\nreader = GGUFReader(output_file, 'r')\nreader.read_header()\nreader.read_kv()\n\nprint(f\"\\nReading {output_file}:\")\nprint(f\"  GGUF Version: {reader.gguf_version}\")\nprint(\"  Metadata:\")\nfor key, value in reader.kv.items():\n    print(f\"    {key}: {value}\")\n\nprint(\"  Tensors (names and shapes):\")\nfor tensor in reader.tensors:\n    print(f\"    - {tensor.name}: {tensor.shape}, {tensor.ggml_type.name}\")\n\n# Example of loading a specific tensor (requires reading tensor data)\n# reader.read_tensors() # Uncomment this to load all tensor data into memory\n# if tensor_name in reader.tensors_by_name:\n#     loaded_tensor = reader.tensors_by_name[tensor_name]\n#     print(f\"  Loaded '{loaded_tensor.name}' data:\\n{loaded_tensor.data}\")\n\nreader.close()\n","lang":"python","description":"This quickstart demonstrates how to create a simple GGUF file containing metadata and a tensor, and then how to read its header, key-value metadata, and tensor information using the `gguf` Python library. It highlights the core `GGUFWriter` and `GGUFReader` classes."},"warnings":[{"fix":"Ensure that GGUF files are created with a compatible version and that the `gguf` Python library is up-to-date to handle the latest GGUF format features. For model conversions, use the recommended `llama.cpp` conversion scripts.","message":"The GGUF format itself evolved from GGML to address backward compatibility issues, particularly regarding metadata. 
While the `gguf` Python library handles the GGUF format, users migrating older GGML models or working across GGUF versions should be aware of the format's evolution and ensure the file version and library version are compatible, as GGUF introduced proper versioning and key-value lookup tables for metadata.","severity":"breaking","affected_versions":"<= 0.17.x (related to format evolution before stable GGUF)"},{"fix":"If you encounter `ImportError: cannot import name '...' from 'scripts'`, check whether `gguf` is installed alongside another package that also ships a top-level `scripts` module. Workarounds include isolating the packages in separate virtual environments, renaming a conflicting local module, or upgrading to a `gguf` release that no longer installs a top-level `scripts` package.","message":"The `gguf` Python package historically included a top-level `scripts/` directory, which can cause `ImportError` issues if another installed package also uses a top-level `scripts` module, since the two collide in the Python namespace.","severity":"gotcha","affected_versions":"Potentially all versions prior to a fix; reported in the 0.18.0 context."},{"fix":"Always ensure the chat template and `eos` (end-of-sequence) tokens configured in the GGUF file match those expected by the specific model and its inference engine. Consult the model's documentation for correct template usage.","message":"GGUF files embed extensive metadata, including chat templates and system instructions. 
Incorrect or mismatched templates between the GGUF file and the inference engine (e.g., `llama.cpp`, `vLLM`) can lead to poor model inference quality, such as gibberish, repeated outputs, or infinite generation loops.","severity":"gotcha","affected_versions":"All versions where GGUF files are used for inference."},{"fix":"Regularly update the `gguf` Python package to the latest version, especially when working with recently released GGUF models or `llama.cpp` builds, to ensure compatibility with the most current format features.","message":"The `gguf` library is a utility for the GGUF format, which is actively developed by the `llama.cpp` project. New features or constants in the GGUF format (e.g., new quantization types like MXFP4) require corresponding updates to the `gguf` Python package. Using an outdated `gguf` package with a newer GGUF model file might lead to parsing errors, missing metadata, or unrecognised quantization types.","severity":"gotcha","affected_versions":"Any version lagging behind the upstream `llama.cpp` GGUF format specification."},{"fix":"Exercise caution and verify the source and content of GGUF files, especially those from untrusted origins. Review embedded chat templates and system instructions before deploying models in sensitive applications.","message":"GGUF files can contain 'poisoned' or malicious chat templates and system instructions that can subtly alter model behavior at inference time without direct model retraining. This poses a supply chain security risk.","severity":"gotcha","affected_versions":"All versions, as this is a format-level vulnerability when consuming untrusted files."}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}