High-Performance Safetensors Model Loader
fastsafetensors is a Python library designed for high-performance loading of safetensors models, particularly optimized for GPU environments (CUDA, ROCm). It aims to load large models faster than the standard `safetensors` library. The current version is `0.2.2`, and the project maintains an active release cadence with frequent bug fixes and performance improvements.
Common errors
- `FileNotFoundError: [Errno 2] No such file or directory: 'non_existent_model.safetensors'`
  Cause: the path passed to `FastSafetensorsFile` does not point to an existing safetensors file.
  Fix: verify that the path is correct and that the safetensors file exists at that location.
- `KeyError: 'tensor_name_does_not_exist'`
  Cause: a tensor was requested by a name that is not present in the loaded safetensors file.
  Fix: call `fsf.get_tensors()` to list the available tensor names and their metadata, then use a correct name.
- `ModuleNotFoundError: No module named 'torch'`
  Cause: a tensor is being loaded into a PyTorch `torch.Tensor`, but PyTorch is not installed in the environment.
  Fix: install PyTorch with `pip install torch` (or, for CUDA builds, `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121`, adjusting the index URL for your CUDA version).
- `RuntimeError: No CUDA GPUs are available`
  Cause: tensors are being loaded or processed on a CUDA device, but no CUDA-enabled GPU is detected or properly configured.
  Fix: ensure your system has a CUDA-capable GPU, that the NVIDIA drivers and CUDA toolkit are installed, and that PyTorch (or the relevant framework) was installed with CUDA support. For AMD GPUs, ensure the ROCm platform is set up correctly.
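The file-not-found and missing-module errors above can be caught up front with a small stdlib preflight check. This is a hedged sketch: the `preflight` helper is illustrative and not part of fastsafetensors.

```python
import importlib.util
import os


def preflight(path, required_modules=("torch",)):
    """Illustrative helper: report common preconditions that are not met
    before attempting to load a safetensors file."""
    problems = []
    # Guards against FileNotFoundError: the path must name an existing file.
    if not os.path.isfile(path):
        problems.append(f"missing file: {path}")
    # Guards against ModuleNotFoundError: the target framework must be importable.
    for mod in required_modules:
        if importlib.util.find_spec(mod) is None:
            problems.append(f"missing module: {mod}")
    return problems


# An empty list means the obvious preconditions hold; otherwise each
# problem is listed so it can be reported before loading is attempted.
print(preflight("surely_missing_model.safetensors", required_modules=("os",)))
```

Running such a check before constructing the loader turns late, cryptic failures into an explicit, actionable report.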
Warnings
- gotcha `fastsafetensors` is designed for *loading* safetensors files efficiently, particularly on GPU. It does not provide functionality to *save* safetensors files. For saving, you should use the core `safetensors` library (e.g., `safetensors.torch.save_file`).
- gotcha `FastSafetensorsFile` implements lazy loading. Tensors are not fully loaded into memory when the file is opened or when `get_tensors()` is called. They are loaded only when accessed (e.g., `fsf['tensor_name']`). This design optimizes memory usage and startup time but might surprise users expecting eager loading.
- gotcha To leverage `fastsafetensors` for specific deep learning frameworks (PyTorch, TensorFlow, PaddlePaddle), those frameworks must be installed separately. `fastsafetensors` does not include them as direct dependencies but will convert loaded data into their respective tensor types if available.
- deprecated Versions 0.2.0 and earlier had known issues with CUDA device initialization and stream synchronization, which could lead to incorrect behavior or suboptimal performance in multi-GPU or complex asynchronous workloads.
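The lazy-loading behaviour described in the gotcha above can be pictured with a minimal pure-Python sketch. This illustrates the pattern only; `LazyStore` is hypothetical and not part of fastsafetensors' implementation.

```python
class LazyStore:
    """Minimal illustration of lazy loading: keys are known up front,
    but each value is materialized only on first access."""

    def __init__(self, loaders):
        self._loaders = loaders  # name -> zero-arg callable producing the value
        self._cache = {}
        self.load_count = 0      # counts how many real loads have happened

    def keys(self):
        # Listing names is cheap: no values are materialized here.
        return list(self._loaders)

    def __getitem__(self, name):
        if name not in self._cache:  # load on demand, then memoize
            self._cache[name] = self._loaders[name]()
            self.load_count += 1
        return self._cache[name]


store = LazyStore({"layer1.weight": lambda: [0.0] * 4, "layer1.bias": lambda: [1.0]})
print(store.keys())      # names are available without loading anything
print(store.load_count)  # 0: nothing has been loaded yet
print(store["layer1.weight"], store.load_count)  # first access loads exactly one value
```

The same mental model applies to `FastSafetensorsFile` as described above: opening the file and listing metadata is cheap, and the cost of reading tensor data is paid per tensor, on first access.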
Install
```shell
pip install fastsafetensors
```
Imports
- `FastSafetensorsFile`

  ```python
  from fastsafetensors import FastSafetensorsFile
  ```

- `LazyTensorFactory`

  ```python
  from fastsafetensors.lazy_tensor_factory import LazyTensorFactory
  ```
Quickstart
```python
import os

import torch
from safetensors.torch import save_file

from fastsafetensors import FastSafetensorsFile

# 1. Create a dummy safetensors file for demonstration
dummy_data = {
    "layer1.weight": torch.randn(128, 64),
    "layer1.bias": torch.zeros(128),
    "layer2.weight": torch.ones(64, 32),
}
dummy_file_path = "dummy_model.safetensors"
save_file(dummy_data, dummy_file_path)
print(f"Created dummy safetensors file: {dummy_file_path}\n")

# 2. Load the safetensors file using FastSafetensorsFile
try:
    fsf = FastSafetensorsFile(dummy_file_path)

    # 3. Inspect tensor metadata (does not load data into memory)
    print("Tensors available in the file (metadata only):")
    for name, metadata in fsf.get_tensors().items():
        print(f"  - {name}: {metadata}")

    # 4. Access a specific tensor (this triggers loading for that tensor)
    tensor_name = "layer1.weight"
    loaded_tensor = fsf[tensor_name]
    print(f"\nSuccessfully loaded '{tensor_name}':")
    print(f"  Type: {type(loaded_tensor)}")
    print(f"  Shape: {loaded_tensor.shape}")
    print(f"  First 5 elements:\n{loaded_tensor.flatten()[:5]}\n")

    # Access another tensor
    print(f"Accessing 'layer2.weight' (shape: {fsf['layer2.weight'].shape})\n")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # 5. Clean up the dummy file
    if os.path.exists(dummy_file_path):
        os.remove(dummy_file_path)
        print(f"Cleaned up dummy file: {dummy_file_path}")
```