Torch Memory Saver
Torch Memory Saver is a PyTorch library that reduces GPU memory usage by letting `torch` tensor memory be temporarily released (paused) and later restored (resumed). It lets developers manage VRAM more deliberately, especially for large models or for workflows that would otherwise exceed available GPU memory. The library is actively developed; its latest stable release, version 0.0.9, shipped in October 2025.
Warnings
- gotcha By default, calling `torch_memory_saver.pause()` discards the content of the tensors in the region to maximize memory savings. If you need to preserve the tensor content for later use, you must instantiate the memory region with `with torch_memory_saver.region(enable_cpu_backup=True):`.
- gotcha The library operates by hooking into CUDA's memory allocation (either via `LD_PRELOAD` or PyTorch's custom allocator). This low-level intervention might conflict with other libraries or debugging tools that also modify CUDA memory behavior, potentially leading to unexpected errors or instability.
- gotcha When utilizing PyTorch's CUDA Graph feature for performance optimization, you must replace `torch.cuda.graph(...)` with `torch_memory_saver.cuda_graph(...)`. This ensures compatibility with the memory saver and allows the release of intermediate tensor memory within the graph, preventing memory accumulation.
- gotcha While `torch-memory-saver` helps manage tensor memory, it does not resolve all general PyTorch memory issues. Developers should still follow best practices such as detaching tensors from the computation graph (`.detach()`) when they are not needed for gradients, using `torch.no_grad()` for inference, and explicitly deleting unused objects (`del var; gc.collect(); torch.cuda.empty_cache()`) to prevent other types of memory leaks.
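The general hygiene practices in the last gotcha are independent of torch-memory-saver and run on CPU-only machines. A minimal sketch (the model and shapes are illustrative):

```python
import gc
import torch

# Inference under torch.no_grad(): no autograd graph is recorded, so
# intermediate activations can be freed as soon as they go out of scope.
model = torch.nn.Linear(16, 4)
x = torch.randn(8, 16)

with torch.no_grad():
    pred = model(x)  # pred.requires_grad is False

# Detach when a value is still needed as data but not for gradients;
# this lets the autograd graph behind it be garbage-collected.
y = model(x)          # tracked, because model parameters require grad
snapshot = y.detach() # same storage, no graph reference

# Explicitly drop large objects; on GPU, also return cached blocks to the driver.
del y
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```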
Install
-
pip install torch-memory-saver
Imports
- torch_memory_saver
import torch_memory_saver
- region
from torch_memory_saver import region
- pause (module-level call, not an import)
torch_memory_saver.pause()
- resume (module-level call, not an import)
torch_memory_saver.resume()
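Put together, a pause/resume round trip that preserves tensor content might look like the sketch below. It combines the calls listed above with the `enable_cpu_backup=True` option from the Warnings section; the `pause_resume_demo` wrapper is illustrative, and the GPU path is skipped when CUDA is absent:

```python
import torch

def pause_resume_demo():
    """Round-trip a tensor through pause/resume, preserving its content."""
    if not torch.cuda.is_available():
        return "skipped"  # the library requires a GPU
    import torch_memory_saver
    # enable_cpu_backup=True copies the region to host RAM on pause,
    # so the tensor's content survives the round trip.
    with torch_memory_saver.region(enable_cpu_backup=True):
        t = torch.full((1024,), 7, dtype=torch.uint8, device="cuda")
    torch_memory_saver.pause()   # GPU pages released
    torch_memory_saver.resume()  # GPU pages re-mapped, content restored
    return "ok" if t[0].item() == 7 else "mismatch"

print(pause_resume_demo())
```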
Quickstart
import torch
import torch_memory_saver

if not torch.cuda.is_available():
    print("CUDA is not available. This library is designed for GPU memory saving.")
    raise SystemExit

print(f"Initial CUDA memory allocated: {torch.cuda.memory_allocated() / (1024**2):.2f} MB")

# 1. Tensors that should be pauseable must be created inside `region`.
with torch_memory_saver.region():
    # Create a large tensor (adjust the size to your GPU memory).
    pauseable_tensor = torch.full((1_000_000_000,), 100, dtype=torch.uint8, device="cuda")  # ~1 GB

print(f"Tensor created. Current CUDA memory allocated: {torch.cuda.memory_allocated() / (1024**2):.2f} MB")

# 2. Temporarily release the GPU memory backing tensors in the region.
# By default the content is discarded; use `region(enable_cpu_backup=True)` to preserve it.
torch_memory_saver.pause()
print(f"Memory paused. Current CUDA memory allocated: {torch.cuda.memory_allocated() / (1024**2):.2f} MB")
# At this point `nvidia-smi` shows reduced GPU memory usage for this process,
# and other memory-intensive operations can run here.

# 3. `resume` re-occupies CUDA memory for those tensors.
torch_memory_saver.resume()
print(f"Memory resumed. Current CUDA memory allocated: {torch.cuda.memory_allocated() / (1024**2):.2f} MB")
# With `enable_cpu_backup=True`, `pauseable_tensor` would now hold its original content:
# print(f"Tensor element value after resume (if backed up): {pauseable_tensor[0].item()}")

# Delete tensors and clear the cache when running multiple experiments in one script.
del pauseable_tensor
torch.cuda.empty_cache()
print(f"Final CUDA memory allocated: {torch.cuda.memory_allocated() / (1024**2):.2f} MB")
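The CUDA Graph gotcha from the Warnings section can be sketched as below. This assumes `torch_memory_saver.cuda_graph(...)` mirrors the signature of `torch.cuda.graph(...)` (a drop-in replacement, as the warning suggests); the `capture_demo` wrapper is illustrative, and the body is skipped without a GPU:

```python
import torch

def capture_demo():
    """Capture and replay a small CUDA graph through torch_memory_saver."""
    if not torch.cuda.is_available():
        return "skipped"
    import torch_memory_saver
    g = torch.cuda.CUDAGraph()
    static_in = torch.randn(64, device="cuda")
    # Drop-in replacement for `with torch.cuda.graph(g):`
    with torch_memory_saver.cuda_graph(g):
        static_out = static_in * 2
    static_in.copy_(torch.ones(64, device="cuda"))
    g.replay()  # re-runs the captured kernels on the updated input
    return "ok"

print(capture_demo())
```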